One of my remote Intel NUCs has gotten stuck in offline mode, and at varying intervals (anywhere from 4 to 60 minutes) it goes into VPN-only mode for a few minutes.
While in VPN-only mode, I’ve tried the Balena Dashboard tools: reboot button, logging in to terminal, stopping the services, all of which fail to execute.
It is looking like a manual reboot is the only way out. In your experience, what types of error could be triggering this cycling between offline mode and VPN-only mode?
Thanks,
Sandy
Hi, welcome to the forums. VPN-only mode happens when the supervisor running on the device is not able to reach the balenaCloud API. The first thing to check is that the network the device is connected to complies with the balena network requirements, as specified in the Network Setup on balenaOS page of the balena documentation.
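To make that first check concrete, here is a minimal sketch in Python, meant to be run from a machine or container on the same network as the device. It only tests whether an outbound TCP connection can be opened. The function name can_connect is made up for illustration, and api.balena-cloud.com is an assumed hostname for the balenaCloud API; the network requirements page remains the authoritative list of hosts and ports. A successful connect does not prove that TLS or the API itself works, only that the port is not blocked.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (assumed hostname, not taken from this thread):
#   can_connect("api.balena-cloud.com", 443)
```

If this returns False from the device's network while it returns True from elsewhere, that would point toward something on the network path rather than the device itself.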
If the network is compliant, log into the hostOS shell and take a look at the supervisor logs with journalctl, for example:
journalctl -u balena-supervisor --no-pager
Alex,
Thanks for the speedy reply. I was unable to open the hostOS shell during VPN-only mode (it never advanced past “Connecting…”), and finally resorted to a manual reboot. After manually rebooting the NUC, I was able to examine the supervisor logs and network port assignments.
Sandy
Supervisor Logs
root@2321700:~# journalctl -u balena-supervisor --no-pager
-- Journal begins at Tue 2022-08-09 19:23:48 UTC, ends at Tue 2022-08-09 19:25:45 UTC. --
Aug 09 19:24:02 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud
Aug 09 19:24:12 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud
Aug 09 19:24:33 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud
Aug 09 19:24:44 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud
Aug 09 19:25:05 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud
Aug 09 19:25:16 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud
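For a longer capture, a small script can make outages easier to spot by measuring the time between consecutive "Reported current state to the cloud" entries. The sketch below is only illustrative: report_gaps is a made-up name, the year is an assumption because journalctl's default short format omits it, and month parsing with %b assumes an English locale. The excerpt above covers under two minutes after a reboot, so it shows only normal 10 to 21 second intervals.

```python
from datetime import datetime

REPORT_MARKER = "Reported current state to the cloud"


def report_gaps(lines, year=2022):
    """Return the seconds between consecutive supervisor state reports."""
    stamps = []
    for line in lines:
        if REPORT_MARKER not in line:
            continue
        # The default journalctl format starts with e.g. "Aug 09 19:24:02",
        # which is exactly 15 characters. The year is assumed, not parsed.
        stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# The six entries from the excerpt in this thread.
SAMPLE = [
    "Aug 09 19:24:02 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud",
    "Aug 09 19:24:12 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud",
    "Aug 09 19:24:33 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud",
    "Aug 09 19:24:44 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud",
    "Aug 09 19:25:05 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud",
    "Aug 09 19:25:16 2321700 balena-supervisor[2981]: [info] Reported current state to the cloud",
]

print(report_gaps(SAMPLE))  # prints [10, 21, 11, 21, 11]
```

A gap of several minutes in a longer log would line up with the periods where the device drops out of reach of the API.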
Hi,
Thanks for sharing an excerpt of the supervisor logs. From the logs, it is apparent that the device is able to reach balena’s API endpoint at least some of the time. We can see that there are two TCP connections established to port 443. One is by openvpn and the other is by a node daemon to IP address 18.232.229.138 (which is an AWS EC2 IP). This process is most likely the supervisor. Sharing journalctl logs for the supervisor or for all services will help us narrow down the issue effectively.
In my experience, such errors are caused by network appliances between the device and balena’s API servers. Do you happen to have a firewall or a transparent proxy or a Deep Packet Inspection appliance that could be causing this?
I had this happen because my ISP and router were not properly routing IPv6 traffic. Since my application does not need IPv6 at this time, I disabled it in NetworkManager.
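One rough way to check whether a similar IPv6 routing problem applies on a given network is to try the same host over IPv4 and IPv6 separately. This is just a sketch under assumptions: reachable_by_family is a made-up name, and which host and port to test is up to you. A result of None means no address of that family was resolved at all.

```python
import socket


def reachable_by_family(host, port, timeout=5.0):
    """Try a TCP connect over IPv4 and IPv6 separately.

    Returns a dict such as {"IPv4": True, "IPv6": False}; a value of None
    means the name did not resolve to any address of that family.
    """
    results = {}
    for name, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            results[name] = None
            continue
        ok = False
        for fam, socktype, proto, _canonname, sockaddr in infos:
            try:
                with socket.socket(fam, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                    ok = True
                    break
            except OSError:
                # This address failed; try the next one of the same family.
                continue
        results[name] = ok
    return results
```

If a host resolves to an IPv6 address but only the IPv4 connect succeeds, that would be consistent with the situation described above, though it does not by itself prove the router or ISP is at fault.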
Pranav,
Thanks for weighing in on this. We do not have a firewall, transparent proxy or deep packet inspection appliance. The NUC unit running BalenaOS usually experiences uninterrupted access to the Balena API endpoint via cell modem.
Today:
balenaCloud Terminal is unable to connect to either hostOS or any container (“Red Dot”)
Cannot access journalctl logs because the balenaCloud Terminal is unable to connect
balenaCloud Log shows the NUC unit is running its installed program
Attached:
Device Health Check
Device Diagnostics
Screenshot of container status & Terminal session
Host OS: balenaOS 2.98.33
Supervisor: 14.0.13
What else can I do to regain access to my system, and get you the necessary journalctl information?
The NUC unit running BalenaOS usually experiences uninterrupted access to the Balena API endpoint via cell modem.
How is the cell modem configured? The diagnostics you attached show mmcli not detecting any modem. There is an ethernet interface with an assigned IP address.
How is the device connected to the internet, via ethernet or via cellular?
Hi, so I understand that you use an external cellular router and the device connects to it via ethernet. And you are running balenaOS 2.98.33 which is fairly new.
I am wondering whether this is something that has been introduced in recent balenaOS versions. Do you have other devices in your fleet also connecting via the same type of cellular router on older balenaOS releases? If so, do they experience the problem?