I’ve been running into a strange issue using a balenaFin in local development mode. I seem to get randomly kicked off the device while SSH-ing into containers after a livepush, and the device disappears from our local network. However, when I look at the dashboard on Balena.io I can see the device is connected: I can open a shell to the host from the Balena.io dashboard, see that all containers are running using balena stats, and, most strangely, I can ping the public IP address, so it’s certainly connected.
I was concerned that the device might have switched to a different network, but the local IP address shown on the dashboard has not changed, so my assumption is that it is still connected to the same SSID.
When I try to reboot the device from the Balena.io dashboard, I get the loading screen and then, after 90 seconds, an error:
Request error: ESOCKETTIMEDOUT
When I refresh the page, I see
Online (Heartbeat only) (local mode), with a warning sign next to Status.
After another 90 seconds, the Status warning disappears from the dashboard and I can SSH into the device using the Balena.io dashboard again.
After this sequence of events, I can ping the device locally using its local IP and it responds. Has anyone else experienced this?
Below: ping output for the local IP and the public IP.
$ ping 192.168.0.16
PING 192.168.0.16 (192.168.0.16): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
ping: sendto: No route to host
Request timeout for icmp_seq 4
ping: sendto: Host is down
Request timeout for icmp_seq 5
ping: sendto: Host is down
Request timeout for icmp_seq 6
--- 192.168.0.16 ping statistics ---
8 packets transmitted, 0 packets received, 100.0% packet loss
$ ping <public IP>
PING <public IP> (<public IP>): 56 data bytes
64 bytes from <public IP>: icmp_seq=0 ttl=64 time=5.242 ms
64 bytes from <public IP>: icmp_seq=1 ttl=64 time=6.381 ms
64 bytes from <public IP>: icmp_seq=2 ttl=64 time=6.509 ms
64 bytes from <public IP>: icmp_seq=3 ttl=64 time=4.627 ms
64 bytes from <public IP>: icmp_seq=4 ttl=64 time=5.992 ms
64 bytes from <public IP>: icmp_seq=5 ttl=64 time=6.468 ms
64 bytes from <public IP>: icmp_seq=6 ttl=64 time=5.385 ms
64 bytes from <public IP>: icmp_seq=7 ttl=64 time=5.865 ms
--- <public IP> ping statistics ---
8 packets transmitted, 8 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 4.627/5.809/6.509/0.631 ms
Hey Alexander,
I tried to replicate this by:
- putting my fin into local mode
- pushing a fleet to the device
- SSHing into various service containers
- changing something in the fleet code, which caused a livepush update
- SSHing into the services
Everything worked for me as I expected.
This does sound like something is happening to the networking configuration of the device. Can you let me know whether there is more than one configured network in the system-connections directory? This may help: Network Setup on balenaOS 2.x - Balena Documentation
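To answer that on the device itself, a small helper like the one below could print, per profile, the keys that decide which connection NetworkManager brings up and in what order. It is a sketch: list_connection_profiles is a hypothetical name, and the default path /mnt/boot/system-connections is an assumption about where the profiles live (pass another directory if yours differ).

```shell
# Sketch: show the settings that decide which NetworkManager profile the
# device brings up, one block per profile file.
# Assumption: profiles live in /mnt/boot/system-connections;
# pass a different directory as $1 if needed.
list_connection_profiles() {
  dir=${1:-/mnt/boot/system-connections}
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # also skips the literal glob when the dir is empty
    echo "== $(basename "$f")"
    # Keys missing from a file fall back to NetworkManager's defaults.
    grep -E '^(id|type|ssid|autoconnect|autoconnect-priority|autoconnect-retries)=' "$f" \
      || echo "(none of these keys set)"
  done
}
```

Two profiles with the same ssid, or several with autoconnect enabled and equal priority, would be the first thing to look at.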
Hi @phil-d-wilson, thanks for the response. We do have several connections in our NetworkManager connections folder, each with autoconnect and a priority set, for when we move locations. We also have cellular enabled, with a toggle to switch it on and off, so we are actively managing that connection too.
We followed the network setup guide when implementing most of this. If this were the issue, is there something we could watch for? My only concern is that we may be switching networks, but again, the fact that the balena.io cloud reports the same local IP for the device makes me think it probably hasn’t switched local networks.
I’m curious what you think the problem could be and how we can identify it.
Hey, I’ve tried really hard to replicate this, but can’t. I even spun up another network here and configured my Fin with both, but it won’t switch to the second one unless the first is completely missing. It certainly won’t switch when I livepush anything.
I’m not sure what to suggest next. If you can provide me with some repro steps, where it happens every time, I can retry. I won’t be able to replicate the cellular connection, though.
This reminds me a little of an issue I had a long time ago with multiple network connections.
What happened there was that the device would get stuck trying to use an incorrectly configured network interface.
What seemed to work for me back then was to set autoconnect-retries to a small number, so it would try the other connections sooner.
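A rough sketch of that change, editing a NetworkManager keyfile directly. set_autoconnect_retries is a hypothetical helper; it assumes the profile is a plain keyfile with a [connection] section and that NetworkManager picks the change up after a reboot or reload. For a running system, nmcli connection modify <name> connection.autoconnect-retries 2 is the nmcli route, though I have not checked whether that persists across reboots on balenaOS.

```shell
# Sketch: set autoconnect-retries in the [connection] section of a
# NetworkManager keyfile, replacing an existing value or adding one.
# Assumptions: the file is a keyfile with a [connection] section (the key is
# not added if that section is missing), and NetworkManager re-reads it
# after a reboot or reload.
set_autoconnect_retries() {
  file=$1
  retries=$2
  # Expect a non-negative integer; per the NetworkManager settings docs,
  # 0 means retry forever, so a small positive value is what we want here.
  case $retries in
    '' | *[!0-9]*) echo "usage: set_autoconnect_retries FILE N" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  awk -v n="$retries" '
    /^\[/ {
      # Leaving [connection] without having seen the key: add it here.
      if (in_conn && !done) { print "autoconnect-retries=" n; done = 1 }
      in_conn = ($0 == "[connection]")
    }
    # Replace an existing value; drop any duplicates.
    in_conn && /^autoconnect-retries=/ {
      if (!done) { print "autoconnect-retries=" n; done = 1 }
      next
    }
    { print }
    # [connection] was the last section and had no key.
    END { if (in_conn && !done) print "autoconnect-retries=" n }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original file so its owner and permissions stay as they were.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage would look like set_autoconnect_retries /mnt/boot/system-connections/<profile> 2, repeated for each profile that should give up sooner.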
Something else that could be happening here is that your container exits with an error and/or causes reboots (making the device unreachable), resulting in the supervisor restoring the previous image (which then becomes reachable).
If you have persistent logging enabled, you should be able to see something about this in the Host OS journal.
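For example, something like the following could be run in the host OS shell to look for reboots and container restarts. It is a sketch under assumptions: persistent logging is enabled, the supervisor runs as the balena-supervisor systemd unit, and balena on the host accepts the usual ps --all; check_for_restarts is a hypothetical name.

```shell
# Sketch: look for reboots and container restarts around the time the
# device dropped off the local network. Run in the host OS shell.
# Assumptions: persistent logging is enabled (otherwise only the current
# boot's journal exists), the supervisor runs as the balena-supervisor
# unit, and `balena` is the host's container engine CLI.
check_for_restarts() {
  echo "== boots recorded in the journal"
  journalctl --list-boots --no-pager

  echo "== supervisor log from the previous boot"
  # Prints an error if no previous boot was recorded.
  journalctl --boot -1 --unit balena-supervisor --no-pager

  echo "== supervisor log from the current boot (last 200 lines)"
  journalctl --boot 0 --unit balena-supervisor --no-pager --lines 200

  echo "== containers, including exited ones"
  balena ps --all
}
```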
@phil-d-wilson thanks for looking into this. Is there a command I can use from the Balena host (which I can access through the Balena.io dashboard) to find out what the current connection is? The system is obviously connected, so if we can determine what it’s connected to, we can see whether it’s somehow using another connection profile or even cellular. Is there an easy command for that?
@TJvV thanks for both these insights. I’ll take a look at both of these potential issues.
Hi, I think you might be looking for the following commands.
- Show all current network interfaces:
- Show current routing table:
- Show NetworkManager connections:
- Show parameters for specific NetworkManager connection (use the “Name” from previous command):
nmcli connection show "Wired connection 1"
- Show NetworkManager devices:
- Show parameters for specific NetworkManager device (use “Device” from previous command):
nmcli device show eth0
- Show ModemManager cellular devices:
- Show parameters for specific cellular device (use index from previous command if any are found):
mmcli --modem 0
- Show logging from NetworkManager and ModemManager:
journalctl --unit NetworkManager --unit ModemManager --no-pager
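The checks above could be bundled into one function so that its output can be saved and compared between a moment when the device answers locally and one when it doesn't. This is a sketch: the listing commands used here (ip address, ip route, nmcli connection show --active, nmcli device status, mmcli --list-modems) are assumptions, and section and collect_network_diagnostics are hypothetical names.

```shell
# Sketch: bundle the network diagnostics into one dump that can be saved and
# compared. Intended for the balenaOS host OS shell; tools that are not
# installed are skipped rather than aborting the dump.

# Print a header, then run the command if it exists; otherwise note the skip.
section() {
  printf '\n===== %s =====\n' "$*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1
  else
    echo "(skipped: $1 not found)"
  fi
}

collect_network_diagnostics() {
  section ip address                      # interfaces and their addresses
  section ip route                        # which interface holds the default route
  section nmcli connection show --active  # profiles NetworkManager is using now
  section nmcli device status             # per-device state
  section mmcli --list-modems             # cellular modems seen by ModemManager
  section journalctl --unit NetworkManager --unit ModemManager \
    --no-pager --since "1 hour ago"       # recent NetworkManager/ModemManager log
}
```

After sourcing it, collect_network_diagnostics > /tmp/net-$(date +%s).txt keeps one timestamped snapshot per run.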
Hi @phil-d-wilson, I’ve found a consistent way to reproduce this issue, and I’m curious whether it’s specific to local mode. If I livepush an application with any number of services (e.g. 2 or 12) to a device and then leave the device on unattended for 24 hours, I get the same connection issue every time when I come back into work the next morning. In the morning I still have a connection on the Balena.io dashboard, and I can even restart the device from the dashboard, but I cannot ping the device locally.
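To pin down when the device drops off overnight, something like this could run on the development machine, logging only the moments when local reachability changes so the timestamps can be matched against the device's journal. Assumptions: watch_reachability is a hypothetical name, and ping -c 1 returns within a reasonable time on your platform (the default reply timeout differs between macOS and Linux).

```shell
# Sketch: log every change in whether a host answers ping, with a
# timestamp. Runs on the development machine, not on the device.
# Arguments: host, seconds between probes (default 60), number of probes
# (default 0 = run until interrupted).
watch_reachability() {
  host=$1
  interval=${2:-60}
  probes=${3:-0}
  last=""
  i=0
  while [ "$probes" -eq 0 ] || [ "$i" -lt "$probes" ]; do
    i=$((i + 1))
    # One echo request, relying on the platform's default reply timeout;
    # if ping hangs on your system, add its timeout flag here.
    if ping -c 1 "$host" >/dev/null 2>&1; then
      state=up
    else
      state=down
    fi
    # Only print transitions, not every probe.
    if [ "$state" != "$last" ]; then
      echo "$(date '+%Y-%m-%d %H:%M:%S') $host $state"
      last=$state
    fi
    # No sleep after the final probe.
    [ "$i" -eq "$probes" ] || sleep "$interval"
  done
}
```

For the overnight case that could be watch_reachability 192.168.0.16 60 | tee reachability.log, left running until the next morning.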
Just wanted to chip in and share my experience. I encountered an identical issue, but with my (Ubuntu) laptop. The laptop would become unreachable from the network, even though the network and the Internet were both reachable from the laptop itself. It turned out to be an issue with the router: I replaced the router (for an entirely different reason!) and the issue went away. I would suggest trying a different router to rule this out.
Thanks and regards,