Access to Host-Level logs for Production devices

I’m wondering if there is a suggested way to get access to journalctl logs from devices that are out in the field.

Our use case is that we have devices in the field that are going down and staying down, which is most likely due to modem issues that are hard to reproduce during testing. I’m looking for some way to be able to do a post-mortem once the device is brought back online.

I will start using persistent logging going forward, but getting value out of that would require retrieving the device from the field and entering local mode. These are not dev images, so local mode isn’t really an option either.

Do you have any suggestions, or know of users who are getting access to system/host logs (ModemManager/NetworkManager are my interest) on production devices?

Additionally, I’m looking for a foolproof way to reinitialize modems that are in a bad state. I’ve tried using ModemManager’s D-Bus interface to reset modems, but it hasn’t worked in all cases.

I would be interested in this as well, as occasionally we have devices (Raspberry Pi 3) drop off the network and not reconnect until we power cycle the device. It would be great to be able to see if there’s a root cause that can be addressed, or to have another service monitor the network and restart networking after N minutes of not being able to reach an outside endpoint.

Hello,

I’m not aware of a way to easily access host-level logs from within the container. I know that you can run dmesg even within the web terminal. Is this something that you could try?

Unfortunately I can’t run dmesg as I’m not able to access the device from within the web terminal at the time it’s offline. To get it online I need to power cycle it, and then it comes up again. I send up the logs from the system with journalctl so I’ll try adding the --dmesg flag and see if there’s anything that comes in just before it drops offline that might indicate what’s going on.
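For reference, a minimal sketch of how the journalctl call described above could be assembled from Python. It assumes journalctl can actually read the host journal from wherever this runs, and that persistent logging is enabled if a previous boot is requested. The function names and defaults are illustrative, not anything from the posts above.

```python
import subprocess


def journalctl_command(boot_offset=0, lines=200, unit=None):
    """Build a journalctl argument list; nothing is executed here.

    boot_offset: 0 is the current boot, -1 the previous one (a previous
                 boot is only available with persistent logging).
    unit:        a systemd unit such as "ModemManager"; when omitted,
                 only kernel messages (--dmesg) are requested.
    """
    cmd = ["journalctl", "--no-pager", "-b", str(boot_offset), "-n", str(lines)]
    if unit is None:
        cmd.append("--dmesg")
    else:
        cmd += ["-u", unit]
    return cmd


def read_journal(cmd):
    """Run the command and return its stdout as text (empty on failure)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

After a power cycle, `read_journal(journalctl_command(boot_offset=-1))` would ask for the last kernel messages from the boot that went offline, and `journalctl_command(boot_offset=-1, unit="ModemManager")` would do the same for a single service.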

As an intermediate term fix, I’m thinking of setting up a service to try and ping the outside world and then restart the device if it fails too often (something like a check every 10 seconds and restart after 6 failures). I know I can control the restart with a curl command to the supervisor, but is there any way to try just power cycling the network connection without restarting the whole device?
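A rough sketch of that check-and-restart loop, using the numbers mentioned above (a check every 10 seconds, action after 6 consecutive failures). The endpoint in `can_reach` is an arbitrary assumption, and `on_failure` is a placeholder for whatever restart call is used, for example the request to the supervisor.

```python
import socket
import time


def can_reach(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    The default host and port are an assumption for illustration;
    use an endpoint the device is expected to reach.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watchdog(check, on_failure, max_failures=6, interval=10.0,
             sleep=time.sleep, max_checks=None):
    """Call check() every `interval` seconds.

    After `max_failures` consecutive failures, call on_failure() and
    start counting again. A single success resets the counter.
    max_checks limits the number of iterations (None runs forever).
    Returns the current consecutive-failure count when the loop ends.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                on_failure()
                failures = 0
        sleep(interval)
    return failures
```

Running `watchdog(can_reach, restart_device)` with a hypothetical `restart_device` function would loop indefinitely. Counting consecutive failures rather than total failures keeps a single dropped check from triggering a restart.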

Highly interested in this discussion.
@pguelpa, I don’t know if you have any update on the subject, but this is exactly what I’ve done. My Python application tries to reach a REST API on the internet and, if the connection fails for at least one hour, tries to restart something. It first removes and reloads the kernel module needed by the 3G modem, then restarts ModemManager. As a last resort, it reboots the board by talking to the Supervisor.
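The escalation described here could be sketched roughly as below. This is not the actual application code: `recover`, the step names, and the `is_online` check are all hypothetical, and each action stands in for whatever command is used on the device.

```python
def recover(steps, is_online):
    """Try recovery steps in order, from least to most disruptive.

    steps:     list of (name, action) pairs; each action is a callable
               taking no arguments.
    is_online: callable returning True once connectivity is back.

    Returns the name of the step after which the device was online,
    or None if no step helped.
    """
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # A failed step should not block the more drastic ones.
            print(f"recovery step {name!r} failed: {exc}")
            continue
        if is_online():
            return name
    return None
```

The steps would be listed in the order given above: reload the modem's kernel module, restart ModemManager, then reboot through the Supervisor. Stopping at the first step that restores connectivity avoids a full reboot when a lighter fix is enough.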