We have been testing Balena on a fleet of approximately 20 Raspberry Pi 3B+ devices.
We have found that when the fleet is rebooted from the balenaCloud console, some devices occasionally fail to reboot. It is not always the same devices. When it happens, the devices that fail to restart show the following symptoms:
- The network lights on the Pi are off (i.e. it appears to have fully shut down).
- The red light on the Pi itself (PWR) is on; the green light is off.
- No HDMI output whatsoever (the monitor starts scanning through its other inputs).
After removing and re-applying power, the Pi boots normally.
Any suggestions as to what might cause this, and how to solve it?
Power supplies are the official Raspberry Pi ones (white with a raspberry on them) from RS Components. SD cards are all Transcend 32GB microSDHC TS32GUSDU1. The devices are connected via Ethernet. There are no other peripherals aside from an FHD HDMI screen (it is a digital signage application, and all changes are pushed via Ethernet).
Edit: After some more testing, we found that it is also occasionally possible to “hang” the Pis by using the balenaCloud “Restart” option. In these cases, the guest OS appears to shut down partially and then becomes unresponsive. Both Restart and Reboot then fail from the balenaCloud console, and the guest OS cannot be reached via the balenaCloud Terminal, although the host OS can. Issuing a “reboot” command in the host OS terminal takes the device offline according to balenaCloud, and it fails to restart. In this case, the network lights on the Pi remain active (flashing). After a hard power cycle, the Pi comes back online normally.
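In case it helps anyone reproduce this without going through the dashboard, here is a rough sketch of how a reboot could be requested from inside a container through the supervisor's local HTTP API. This is not something we run in production. It assumes the service has the `io.balena.features.supervisor-api` label so that `BALENA_SUPERVISOR_ADDRESS` and `BALENA_SUPERVISOR_API_KEY` are present in the environment; the function names are made up for this post.

```python
import json
import os
import urllib.request


def build_reboot_request(address: str, api_key: str, force: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to the supervisor's /v1/reboot endpoint."""
    url = f"{address.rstrip('/')}/v1/reboot?apikey={api_key}"
    # "force" asks the supervisor to ignore update locks; left off by default.
    body = json.dumps({"force": force}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_reboot(force: bool = False, timeout: float = 10.0) -> int:
    """Send the reboot request and return the HTTP status code.

    Assumption: both environment variables are injected by balena because the
    service carries the io.balena.features.supervisor-api label.
    """
    address = os.environ["BALENA_SUPERVISOR_ADDRESS"]
    api_key = os.environ["BALENA_SUPERVISOR_API_KEY"]
    req = build_reboot_request(address, api_key, force)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `request_reboot()` in a loop across the fleet, with a log line before each call, might make it easier to see whether the hang depends on how the reboot is triggered or only on the shutdown path itself.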
How do people deal with these issues in real-life production environments? Both failure modes make the system unusable for scenarios such as digital signage (our application), where the device can’t be power cycled by a user.