root@0fce263:~# balena ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6d544ea7a5a7 e6ad3a09401b "/usr/bin/entry.sh /…" 12 days ago Up 12 days solmon_2660289_1512247
97b3e5978d72 balena/rpi-supervisor:v10.8.0 "./entry.sh" 3 months ago Up 12 days (healthy) resin_supervisor
I don’t understand how the container “uptime” can be greater than the device “uptime”.
Then, looking at the journal logs I see the following snippet:
It does seem the device wasn’t alive between Sep-06 and Sep-18, otherwise, we should be seeing some supervisor/engine logs. What’s strange is that there are no logs of the device either booting (if it was turned off) or recovering (if it was stuck somehow).
I’ve reached out to the engine team to see if they have any ideas about why the container uptime might be higher than the device uptime.
One possible explanation is that the system booted with the last “good” time (Sep 06) before syncing and setting the correct time (Sep 18). If the containers are started before time synchronization occurs in the boot process then the start date of the containers will be Sep 06. Therefore, if you run balena ps after the time has been updated the system will think they have been running for 12 days.
That’s an interesting idea. However, we just saw this same problem in the field where the device went offline for 8 hours and the logs were completely missing, even boot logs. The device had been working properly and the time was right before it went offline.
Hi there, is it possible these devices are losing network connectivity intermittently and continuing to run offline? Also, what type of hardware is this running on? Is it possible the device is being put to sleep or quiesced like a virtual machine?
@ab77 I don’t think the devices are being put to sleep in any manner that I know of. This is a pi0w device. Unfortunately we’re blind to what’s going on because the logs disappear, and as pointed above by @ntzovanis , weird that there is no bootup or recovery information in the logs
Just as a sanity check, if the device is not configured for persistent logging there wouldn’t be any logs from previous boots. Can you double check that you have persistent logging enabled for the device(s) in question?
Hi, the device did have persistent boots. Note that in the log provided in the post, the issue is missing logs between Sep 6 and Sep18. I suspect if there was a persistent logging issue, we wouldn’t see the Sep6 logs either.
Hi there – taking another look at the log you orginally posted, I notice that chronyd makes a rather large adjustment to the system clock:
Sep 06 11:50:01 0fce263 chronyd[776]: 2020-09-06T11:50:01Z Source 2606:4700:f1::123 replaced with 51.255.197.148
Sep 06 11:50:02 0fce263 6d544ea7a5a7[1458]: [157B blob data]
Sep 06 11:50:08 0fce263 chronyd[776]: 2020-09-06T11:50:08Z Selected source 51.255.197.148
Sep 18 14:50:08 0fce263 chronyd[776]: 2020-09-06T11:50:08Z System clock wrong by 1047600.109068 seconds, adjustment started
Sep 18 14:50:08 0fce263 chronyd[776]: 2020-09-18T14:50:08Z System clock was stepped by 1047600.109068 seconds
If my math is right, the discrepancy that chronyd notes (1,046,600 seconds) is just about 12 days and 3 hours – which in turn matches the jump in the logged dates (Sep 6, 11:50 -> Sep 18, 14:50). Given that, I’m inclined to agree with the original suggestion: that the system clock came up incorrect; balena engine recorded that incorrect start date; chronyd corrected the time; and now it looks like the containers have been up longer than the device. This seems even more likely if your device does not have an RTC module.
I would speculate that the incorrect time came from the chrony drift file; if this file were saved (which happens once per hour), then the device lost power for some time, then got power back and booted, it could explain the logs you’re seeing.
If I understand correctly, you mentioned another device where you saw similar behaviour. If you’re able to provide info on that device, we can try to determine what happened with it.