Device Offline After Update

We have multiple devices based on the Raspberry Pi Compute Module. A recent application upgrade (a docker-compose application deployed with balena push) over LTE broke several devices and they went offline. They all came back online normally after a power reset. The host OS version is balenaOS 2.51.1+rev1 and the Supervisor version is 11.4.10. It happened on both development and production hosts.

I would appreciate it if someone could help me investigate the problem. What could cause such behaviour, and how can I prevent it?

Thanks in advance
Mehdi

Hi, if you enable persistent logging, we’ll be able to see the logs after the reboot and probably find the reason. Also, you can run device health checks to see if it’s healthy. Please take a look at balena device debugging masterclass to learn more about debugging.

Thank you Karaxuna for your quick reply. Unfortunately persistent logging was disabled. I'm going to enable it and will report back with logs if the issue happens again.

Hi Mehdi, can you also give a bit more info about the hardware stack? When modems are involved, it is often the case that new device support or udev rule changes have landed in ModemManager between the two OS versions, and this causes the modem to not re-initialise correctly after the update. What I would suggest is to set up a .dev variant of the OS with persistent logging enabled on an older OS version and hook up the serial console, then trigger an OS upgrade and watch the kernel messages. When the device boots up after the upgrade, grab the full journal logs with journalctl --no-pager and add them to this thread, and we can help you try to figure out what is failing.
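
For reference, a rough sketch of how the log capture could look from a host OS terminal. The modem_lines helper and its keyword list are hypothetical, just a guess at what is likely to be relevant for a QMI/USB modem, so adjust them to your setup. The journalctl calls are left as comments because reading the previous boot only works when persistent logging was already enabled before the failure.

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and keep only the lines
# that mention the modem or the network stack. The keyword list is an
# assumption, not an official list of what to look for.
modem_lines() {
    grep -iE 'modemmanager|networkmanager|qmi|wwan|cdc-wdm|ttyUSB'
}

# On the device, after the failure has been reproduced:
#   journalctl --list-boots                          # boots the journal still has
#   journalctl --no-pager > current-boot.log         # full journal, as suggested above
#   journalctl --no-pager -b -1 > previous-boot.log  # previous boot (needs persistent logging)
#   journalctl --no-pager -b -1 | modem_lines        # narrowed to modem/network lines
```

The full, unfiltered journal is still the most useful thing to attach here. The filtered view is only meant to make it quicker to spot where the modem drops out.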

Hello Shaunmulligan,
I have SIMCOM7100E mPCIE EU version onboard. And btw, I did not run any OS upgrade, only application upgrade. I am trying to implement SMS Reboot to see whether in case of a connection failure I can reboot the device using text commands.

Hey there

Thanks for confirming the modem you are using.

However, without the logs we are still not able to help much. Maybe you can try to reproduce the error by pushing another app update to a development device and collecting logs as suggested above. Thanks