Issues with offline devices

Hi,
we still see devices going offline from time to time, even though the internet connection at the site where they are deployed is fine.
Our assumption: the telecom modems reboot, or are reconnected by the telco operator from time to time, which causes our devices (RPi3) to go “offline”. We would expect a device to try to get back online once it detects that it is offline (auto reboot or similar).
Is there any setting / mechanism we can use for this?
Right now we have to contact the customers where the device is running and ask them to power cycle it; after that the device comes back up.
Is anyone else also experiencing this issue?
Thanks
Fritz

One thing to clarify: the device is shown as offline in the balenaCloud dashboard, so the VPN connection it should establish is not being made.
Thanks

Hi @fritz ,

Do you have any other information or insight into what the cell modem is doing when everything gets disconnected? My first thought is whether something is happening with the modem, such that the device needs a new DHCP lease or some other mechanism to regain internet access after a disconnect.

NetworkManager is used under the hood inside balenaOS, and will try to reconnect to any available network connection when it can. If you are able to provide any logs from the device, it would be very helpful here. You can also enable persistent logging in the Device Configuration tab so that the logs persist across reboots.

Hi @nucleardreamer ,
unfortunately, there are no logs (yet).
Our application tries to detect when it’s offline and keeps reconnecting to our servers.
If this fails a configurable number of times, the application requests a device reboot from the supervisor.
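A minimal sketch of a retry-then-reboot watchdog of the kind described above. It is an illustration, not the poster's actual application. It assumes the container has the `io.balena.features.supervisor-api` label, so that `BALENA_SUPERVISOR_ADDRESS` and `BALENA_SUPERVISOR_API_KEY` are set, and it uses the Supervisor API's `POST /v1/reboot` endpoint. The names `OfflineWatchdog`, `can_reach`, `request_supervisor_reboot`, and `main`, as well as the host `example.com`, are hypothetical.

```python
import os
import socket
import time
import urllib.request


class OfflineWatchdog:
    """Counts consecutive failed connectivity checks and requests a reboot."""

    def __init__(self, check, reboot, max_failures=5):
        self.check = check                # callable returning True when online
        self.reboot = reboot              # callable that requests the reboot
        self.max_failures = max_failures  # the "configurable number of times"
        self.failures = 0

    def tick(self):
        """Run one check; return True if a reboot was requested."""
        if self.check():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            self.reboot()
            return True
        return False


def can_reach(host="example.com", port=443, timeout=5.0):
    """Hypothetical check: can a TCP connection to our server be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def request_supervisor_reboot():
    """Ask the balena Supervisor to reboot the device (POST /v1/reboot)."""
    address = os.environ["BALENA_SUPERVISOR_ADDRESS"]
    api_key = os.environ["BALENA_SUPERVISOR_API_KEY"]
    request = urllib.request.Request(
        f"{address}/v1/reboot?apikey={api_key}", data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def main(interval_seconds=60):
    """Check connectivity forever, one check per interval."""
    watchdog = OfflineWatchdog(can_reach, request_supervisor_reboot)
    while True:
        watchdog.tick()
        time.sleep(interval_seconds)
```

Calling `main()` from the application's entry point would start the loop. Keeping the check and the reboot as injected callables makes the counting logic testable without a device. A reboot requested this way still depends on the Supervisor being reachable and on the OS coming back up cleanly.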

Most of the time it works, because the device might get a new IP etc. from DHCP, but sometimes it doesn’t. It’s also not reliably reproducible. What I can tell is that the device then shows as “offline” (not even heartbeat) in the balenaCloud dashboard. Unplugging and replugging the device immediately fixes the issue. That’s why I’m sure that it has something to do with the underlying OS / Supervisor not being able to boot.

Are there any logs I could obtain from the supervisor / OS that might be of help to you?

Thanks

Hi,
attaching a video from an offline device. As you can see, the LAN port does not show any LEDs on: no link, no activity.
(Don’t be distracted by the flashing green/red LEDs inside the case, that’s one of our custom components. Our friendly customer who took this video on-site was not fully aware of what to put in the video.)
After a power cycle, the device gets online (the port LEDs become active again at 1:36 in the video).
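One way to capture the "no link" state in software, sketched under assumptions: on Linux, `/sys/class/net/<interface>/carrier` reads `1` when the physical link is up and `0` when it is down, and the read fails if the interface is missing or administratively down. The interface name `eth0` and the helper name `link_state` are assumptions for illustration. The result would need to be written to a log that survives a power cycle to be useful here.

```python
from pathlib import Path


def link_state(interface="eth0", sysfs_root="/sys/class/net"):
    """Return 'up', 'down', or 'unknown' for the interface's physical link."""
    carrier = Path(sysfs_root) / interface / "carrier"
    try:
        value = carrier.read_text().strip()
    except OSError:
        # Interface missing, or administratively down (the read fails).
        return "unknown"
    if value == "1":
        return "up"
    if value == "0":
        return "down"
    return "unknown"
```

Logging `link_state()` alongside each failed connectivity check would show whether the Ethernet link itself was down while the device was offline, which is what the dark port LEDs suggest.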

Let me know what else I could provide for you to analyze further.
Thanks

Hi @fritz, are there multiple devices in the same location? If so, do you know if they go offline at the same time or separately? I’m wondering if you can use one to SSH into another and check the logs when one goes down.

Hi @danthegoodman1 ,
no, we usually have 1 device / location.
Any other useful information I could extract from the devices? Any log setting I might update to have persisted logs from the supervisor / OS?
Thanks

@fritz So if I understand correctly, the device is now online? If so, running a device health check and a diagnostic check could show some issues. Both are found in the dashboard on the left, under the specific device. You will see a lot of output, so just look for anything related to networking.

You could also check whether it is an issue with the balena engine (systemctl status balena-engine and journalctl -u balena-engine, run from the host OS), the OpenVPN service (journalctl -u openvpn), or NetworkManager (journalctl -u NetworkManager; unit names are case-sensitive).

Maybe something in dmesg as well if it is a boot issue.
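Since the interesting logs are from the boot before the power cycle, the commands above would need to target the previous boot, which only works with persistent logging enabled. A small sketch that builds those command lines, assuming the unit names mentioned in this thread (they may differ by OS version, so check with systemctl list-units). The helper names `journal_command` and `previous_boot_commands` are hypothetical. Python is used only to assemble the strings; the printed commands are meant to be run in a host OS shell.

```python
import shlex

# Unit names as given in this thread; verify them on your OS version.
UNITS = ("balena-engine", "openvpn", "NetworkManager")


def journal_command(unit=None, boot_offset=-1, kernel=False):
    """Build a journalctl argv for one unit (or the kernel) and one boot."""
    cmd = ["journalctl", "--no-pager", f"--boot={boot_offset}"]
    if kernel:
        cmd.append("--dmesg")  # kernel messages, like dmesg for that boot
    if unit is not None:
        cmd += ["-u", unit]
    return cmd


def previous_boot_commands(units=UNITS):
    """Return shell command strings for the boot before the current one."""
    commands = [journal_command(unit) for unit in units]
    commands.append(journal_command(kernel=True))
    return [shlex.join(cmd) for cmd in commands]


for line in previous_boot_commands():
    print(line)
```

A boot offset of `-1` selects the boot before the current one, so these are the commands to run right after the customer has power cycled an offline device.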

Let me know how that goes

Hi,
nothing found during diagnostics:

Also, the journalctl output is “empty”, at least since the last reboot.
I’ve enabled persistent logging for this device now and “hope” that the issue happens on this device again, so we can analyze it…

Thanks

Alright, let us know what the logs say the next time it happens!