A while ago I had to power down one of my balenasense devices on my starter plan. I noticed that there was a “shut down” option in the device UI in balenaCloud, so I tried that and the device powered off successfully.
However, when I restarted the device to begin collecting data again, the device showed as “Online” but the applications were not starting and could not be restarted. I ran diagnostics and found that the Supervisor process was not running, and it isn’t clear that there is a way to start that process from the balena CLI, from the device’s terminal (connected to the Host OS), or via the console (keyboard + screen).
The device in question is a Raspberry Pi Zero W. Any thoughts or ideas? Is this a bug or a known issue?
What version of balenaOS are you running?
Regarding the supervisor, it is very uncommon for it to need a manual restart, as it usually recovers automatically. To obtain more information about the failure, please check its logfile with journalctl -b -a -f -u resin-supervisor from the hostOS.
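For reference, from a hostOS shell that boils down to something like the following (the unit is named resin-supervisor on this OS generation; newer releases may name it balena-supervisor):

systemctl status resin-supervisor          # is the unit active, failed, or stuck activating?
journalctl -b -a -u resin-supervisor       # full supervisor log for the current boot
journalctl -b -a -f -u resin-supervisor    # same, but keep following new entries live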
Unfortunately the only method to correct the situation was to completely destroy the application, re-flash the device’s storage, and rebuild/redeploy. If I get time this weekend, I’ll try to replicate the issue.
Let me please re-open this thread. I’ve faced exactly the same issue with two balenasense app devices (Raspberry Pi Zero W + balenaOS 2.48.0+rev1). After a power loss, the device (either of the two) shows only “Supervisor starting” in the logs and nothing else happens. I have reproduced this issue a few times, and the only way to correct it is to re-flash the device.
Please find the logfile output from the hostOS below:
root@b0ecd1d:~# journalctl -b -a -f -u resin-supervisor
-- Logs begin at Fri 2020-01-31 15:19:41 UTC. --
Jul 06 04:49:18 b0ecd1d resin-supervisor[21400]: activating
Jul 06 04:49:18 b0ecd1d systemd[1]: resin-supervisor.service: Control process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 06 04:49:18 b0ecd1d systemd[1]: resin-supervisor.service: Failed with result 'timeout'.
Jul 06 04:49:18 b0ecd1d systemd[1]: Failed to start Balena supervisor.
Jul 06 04:51:21 b0ecd1d systemd[1]: resin-supervisor.service: Start-pre operation timed out. Terminating.
Jul 06 04:51:21 b0ecd1d systemd[1]: resin-supervisor.service: Control process exited, code=killed, status=15/TERM
Jul 06 04:51:22 b0ecd1d resin-supervisor[22302]: deactivating
Jul 06 04:51:22 b0ecd1d systemd[1]: resin-supervisor.service: Control process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 06 04:51:22 b0ecd1d systemd[1]: resin-supervisor.service: Failed with result 'timeout'.
Jul 06 04:51:22 b0ecd1d systemd[1]: Failed to start Balena supervisor.
Any ideas why this might happen? Can you please help me resolve this issue?
Hey Rahul, I’ve tried both scenarios - shutting down from the dashboard and simply pulling the power supply. Both result in the same situation described above.
You can manually interact with the supervisor from the hostOS with the following commands, for multicontainer applications on devices running OS > 2.9.0:
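The exact commands weren’t quoted in this post, but a rough sketch of the sequence usually suggested for resetting a wedged supervisor from a hostOS shell (assuming the legacy resin-supervisor unit and resin_supervisor container names used on this OS version) looks like this:

systemctl stop resin-supervisor      # stop the supervisor service
balena rm -f resin_supervisor        # remove the stuck supervisor container (balena is the Docker-compatible engine CLI on the hostOS)
update-resin-supervisor              # re-pull the supervisor image and recreate its container
systemctl start resin-supervisor     # start the service again; it then restores the application services

Removing the container forces the supervisor to rebuild its state, which is consistent with the application images being re-downloaded afterwards, as reported below.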
Thanks for the follow-up! So today, after another power outage, I had the same experience with both Pi Zero Ws (balenaOS 2.48.0+rev1, supervisor 10.8.0, multicontainer balena-sense app installed).
I followed your advice and it helped! Thanks a lot! After executing the suggested commands, both devices re-downloaded their Docker images, re-installed the services, and went live with all services working properly.
I have another device, a Pi 4 with Host OS 2.51.1+rev1, and it restarted normally after the power loss.
I have never seen a Pi Zero W restart successfully after a power loss, and this odd behavior is quite consistent (I have re-flashed my Pi Zero Ws about 5 times).
After fixing the issue with your commands, I rebooted one properly working Pi Zero from the dashboard and saw the same bad behavior again. Here is the output:
07.07.20 21:43:31 (-0700) Killing service 'influxdb sha256:744ee41fb7b015a4bf6c466217303c61a24f07c16a280c24fa096fae898ceb7a'
07.07.20 21:43:40 (-0700) Killed service 'influxdb sha256:744ee41fb7b015a4bf6c466217303c61a24f07c16a280c24fa096fae898ceb7a'
07.07.20 21:43:40 (-0700) Service exited 'influxdb sha256:744ee41fb7b015a4bf6c466217303c61a24f07c16a280c24fa096fae898ceb7a'
07.07.20 21:43:41 (-0700) Killing service 'sensor sha256:625dc78417a7843010310dc855ce3cd9731d23cf6f1dc2ae9e8e03e27ab77a56'
07.07.20 21:44:00 (-0700) Killed service 'sensor sha256:625dc78417a7843010310dc855ce3cd9731d23cf6f1dc2ae9e8e03e27ab77a56'
07.07.20 21:44:00 (-0700) Service exited 'sensor sha256:625dc78417a7843010310dc855ce3cd9731d23cf6f1dc2ae9e8e03e27ab77a56'
07.07.20 21:44:00 (-0700) Killing service 'grafana sha256:58bf40673b4b91be0d9b0fc254816a0a5162ea4619b6b561baf5110397a20fd1'
07.07.20 21:44:08 (-0700) Killed service 'grafana sha256:58bf40673b4b91be0d9b0fc254816a0a5162ea4619b6b561baf5110397a20fd1'
07.07.20 21:44:08 (-0700) Service exited 'grafana sha256:58bf40673b4b91be0d9b0fc254816a0a5162ea4619b6b561baf5110397a20fd1'
07.07.20 21:44:08 (-0700) Killing service 'mqtt sha256:a2b55301913b48c01c2420a59fdb3cc0eb6252edda9441c9421dd478f31eb8ea'
07.07.20 21:44:15 (-0700) Killed service 'mqtt sha256:a2b55301913b48c01c2420a59fdb3cc0eb6252edda9441c9421dd478f31eb8ea'
07.07.20 21:44:15 (-0700) Service exited 'mqtt sha256:a2b55301913b48c01c2420a59fdb3cc0eb6252edda9441c9421dd478f31eb8ea'
07.07.20 21:44:16 (-0700) Killing service 'telegraf sha256:876dc7af6bbafe57966938252d2ebb15e9b7692e243cff248f9052c5920203fc'
07.07.20 21:44:27 (-0700) Killed service 'telegraf sha256:876dc7af6bbafe57966938252d2ebb15e9b7692e243cff248f9052c5920203fc'
07.07.20 21:44:28 (-0700) Rebooting
07.07.20 21:44:28 (-0700) Service exited 'telegraf sha256:876dc7af6bbafe57966938252d2ebb15e9b7692e243cff248f9052c5920203fc'
07.07.20 21:50:07 (-0700) Supervisor starting
30 minutes have passed since then and nothing has changed. I will restart the supervisor following your earlier (working) advice.
Any ideas why this behavior might occur? How can I fix it? It’s a bit annoying to manually restart/reset the supervisor and re-download all the Docker images after every power loss / reset / shutdown.
It would help a lot if you could provide the diagnostics output, since there’s much more info there. You can access this feature by navigating to the device summary page and scrolling to the bottom to select “Diagnostics (Experimental)”. There, select “Device diagnostics” and click “Run diagnostics”. It should take a couple of minutes. Once finished, please download the file, remove any sensitive information, and share the output :). Thanks!
To be clear, the last line of the previously shared output, “(21:50:07) Supervisor starting”, is from after the reboot; the other lines are from before it.
Sure, happy to share the diagnostics. I just tried to do it, but unfortunately I can’t attach a txt file to the post, and I can’t paste it as pre-formatted text because a forum post is limited to 32k characters and the diagnostics output is about 1970k characters. How can I share the diagnostics output with you, or can you specify which exact parts of the diagnostics you want to look at?
I’m also running a Pi Zero W with the Balena Sense project and experiencing exactly the same problem. If I deploy a fresh image, everything works fine until the next restart, whereupon the supervisor hangs on starting.
Can you please post the supervisor logs from the device? The best way to do this would be to supply the device diagnostics file from the diagnostics tab on the dashboard. If your application logs contain anything sensitive, it’s better to send a personal message to a team member with a link to this thread, and we can attach it to the ticket.
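If the diagnostics route proves awkward, the supervisor’s own container logs can usually also be read directly from a hostOS shell via the balena engine (assuming the container is still named resin_supervisor on this supervisor version):

balena ps -a --filter name=supervisor     # confirm the supervisor container's name and state
balena logs --tail 200 resin_supervisor   # print the last 200 lines of the supervisor's logs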
Sorry for the delay. Unfortunately it won’t let me attach the log, as you can’t attach txt files (“file type not allowed”), and I can’t paste the log as it exceeds the character limit. Any other options?
One thing you can try is upgrading the OS on your device to the latest version in production (2.54.2+rev1 as I write this). This version includes support for ZRAM, which may help performance on your device. Can you give this a try and let us know how it works for you?
Thanks for your reply. I updated to 2.54.2+rev1 and the problem looks to be solved. I assume the ZRAM improvements have made the difference in start-up times. Thanks for your help.