Device shows "Offline" after reboot but logs are streaming in

We’ve had a couple of instances of devices that were working fine and then, immediately after a reboot (not a restart), show as “Offline” on the balena status page, even though they still stream logs and our application container appears to be running. They remain “Offline” on the balena page, so we can’t restart or access them (“The terminal is disabled because your device is offline”).

We looked at a similar post, “Intel NUC - Device status is "offline" but logs are streamed”, and concluded that this wasn’t our issue, since other devices on the same network are working fine and the failing devices were fully functional immediately before the reboot.

These are remote production devices so it’d be great to reinstate the shell/restart functionality. Any thoughts or debugging tips?

The symptoms you describe point to the device not being connected to the balena VPN. This could be caused by a networking problem or by a problem on the device itself.

You concluded that the referenced thread is unrelated to your problem. Can you rule out connectivity problems?

There is a FAQ entry that explains which ports and hosts a device needs to reach in order to connect to balenaCloud:
https://www.balena.io/docs/faq/troubleshooting/faq/#what-network-ports-are-required-

Are you sure that these hosts and ports are accessible from the affected devices?
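
One way to check this from the affected network is a small reachability test like the sketch below. The hostnames `api.balena-cloud.com` and `vpn.balena-cloud.com` are assumptions that do not come from this thread, so confirm the current list against the FAQ above. The helper names `check_tcp` and `report` are made up for illustration, and the script assumes `bash` and `timeout` are available on the machine you run it from. A successful TCP connect only shows that the port is reachable, not that a VPN session can actually be established.

```shell
#!/usr/bin/env bash
# Sketch: test whether TCP connections to a list of hosts succeed.
# ASSUMPTION: these hostnames are placeholders, verify them against the FAQ.
HOSTS="${HOSTS:-api.balena-cloud.com vpn.balena-cloud.com}"
PORT="${PORT:-443}"

# Try to open a TCP connection using bash's /dev/tcp pseudo-device.
# Returns 0 if the connection opens within 5 seconds, non-zero otherwise.
check_tcp() {
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one OK/FAIL line per host in $HOSTS.
report() {
  local host
  for host in $HOSTS; do
    if check_tcp "$host" "$PORT"; then
      echo "OK $host:$PORT"
    else
      echo "FAIL $host:$PORT"
    fi
  done
}
```

Calling `report` after sourcing the script prints one line per host. Running it on both a working and an affected network segment should make any difference in reachability visible.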

We can try accessing the device via another device on the same network. To do this, I would need support access to the affected device as well as to a working device on the same network.

Regards, Thomas

Also, just to clarify: the log stream and the VPN are separate, so it is expected that you can still see logs while the VPN is down.

The VPN connects on port 443, so you might want to check your infrastructure firewall to make sure nothing is blocking it after the reboot (for example, a stale entry in a NAT state table). If that doesn’t help and you are running a development image, you can SSH into the device and look at the VPN logs:

$ ssh root@{device IP address} -p 22222
$ journalctl -u openvpn -n100 --no-pager

Hey guys, thanks for the quick and thorough response.

It seems the affected devices have spontaneously recovered full VPN functionality. Thus, unfortunately I don’t know any more about what might have caused this, but I can say that the recovery time was >12 hours.

What I will say is that we’ve seen several “in the field” network setups where the “core” functionality works and the VPN doesn’t. That made for a confusing debugging experience: we didn’t fully understand the core/VPN distinction, so we scrambled to figure out what was going on. Perhaps the network docs could clarify which balena services (logs vs. terminal vs. updates, etc.) operate on which ports, and what the VPN does vs. what the “core” does. Consider that one data point of user feedback.