Thank you very much for the journal logs, it’s been extremely useful (I’ve had a look on the device today, and of course all the logs have rotated, so this was vital for us!).
Having looked through, I’ve a couple of comments. The first is that you appear to be running a development device, is this intended? I ask, because dev devices can be SSH accessed without a password. We strongly recommend never running a device in a possibly hostile/non-development environment using a development image, as hostile actors could potentially take control over it. I’ve noticed several accesses, but they appear to be over our VPN which means from the WebTerminal or balena ssh
.
When the device disconnects:
Nov 24 07:36:52 5535031 avahi-daemon[721]: Withdrawing address record for 192.168.0.33 on enp0s20f7.
Nov 24 07:36:52 5535031 avahi-daemon[721]: Leaving mDNS multicast group on interface enp0s20f7.IPv4 with address 192.168.0.33.
Nov 24 07:36:52 5535031 avahi-daemon[721]: Interface enp0s20f7.IPv4 no longer relevant for mDNS.
Nov 24 07:36:52 5535031 NetworkManager[651]: <info> [1574581012.4573] manager: NetworkManager state is now CONNECTED_SITE
Nov 24 07:36:52 5535031 dbus[592]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Nov 24 07:36:52 5535031 systemd[1]: Starting Network Manager Script Dispatcher Service...
Nov 24 07:36:52 5535031 dbus[592]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Nov 24 07:36:52 5535031 systemd[1]: Started Network Manager Script Dispatcher Service.
Nov 24 07:36:52 5535031 nm-dispatcher[9001]: req:1 'connectivity-change': new request (0 scripts)
Nov 24 07:36:52 5535031 nm-dispatcher[9001]: req:1 'connectivity-change': completed: no scripts
The VPN can then no longer connect to the balena backend:
Nov 24 07:37:57 5535031 openvpn[20330]: Attempting to establish TCP connection with [AF_INET]18.232.192.190:443 [nonblock]
Nov 24 07:37:57 5535031 openvpn[20330]: TCP: connect to [AF_INET]18.232.192.190:443 failed: Network is unreachable
After several failures of this type, the VPN falls back to attempting to resolve the VPN hostname, in case the address has changed:
Nov 24 07:38:13 5535031 openvpn[20330]: RESOLVE: Cannot resolve host address: vpn.balena-cloud.com:443 (Name or service not known)
Nov 24 07:38:13 5535031 openvpn[20330]: RESOLVE: Cannot resolve host address: vpn.balena-cloud.com:443 (Name or service not known)
However, what is interesting is that the network interface is still up and active, because when the local engineer disconnects and reconnects the interface we see:
Nov 24 09:43:19 5535031 kernel: stmmaceth 0000:00:14.7 enp0s20f7: Link is Down
Nov 24 09:43:19 5535031 kernel[604]: stmmaceth 0000:00:14.7 enp0s20f7: Link is Down
Nov 24 09:43:21 5535031 balenad[685]: [376B blob data]
Nov 24 09:43:23 5535031 kernel: stmmaceth 0000:00:14.7 enp0s20f7: Link is Up - 100Mbps/Full - flow control rx/tx
It also appears that this didn’t work, and eventually they switched to the secondary port:
Nov 24 09:43:55 5535031 kernel: stmmaceth 0000:00:14.6 enp0s20f6: Link is Up - 100Mbps/Full - flow control rx/tx
Nov 24 09:43:55 5535031 kernel[604]: stmmaceth 0000:00:14.6 enp0s20f6: Link is Up - 100Mbps/Full - flow control rx/tx
Nov 24 09:43:55 5535031 NetworkManager[651]: <info> [1574588635.4931] device (enp0s20f6): carrier: link connected
Nov 24 09:43:55 5535031 NetworkManager[651]: <info> [1574588635.5213] device (enp0s20f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
...
Nov 24 09:43:56 5535031 avahi-daemon[721]: New relevant interface enp0s20f6.IPv6 for mDNS.
Nov 24 09:43:56 5535031 avahi-daemon[721]: Registering new address record for fe80::2676:4d69:4624:b6b3 on enp0s20f6.*.
Nov 24 09:43:56 5535031 NetworkManager[651]: <info> [1574588636.9501] dhcp4 (enp0s20f6): address 192.168.0.35
Nov 24 09:43:56 5535031 NetworkManager[651]: <info> [1574588636.9555] dhcp4 (enp0s20f6): plen 24 (255.255.255.0)
Nov 24 09:43:56 5535031 NetworkManager[651]: <info> [1574588636.9561] dhcp4 (enp0s20f6): gateway 192.168.0.1
Nov 24 09:43:56 5535031 NetworkManager[651]: <info> [1574588636.9570] dhcp4 (enp0s20f6): lease time 86400
Nov 24 09:43:56 5535031 NetworkManager[651]: <info> [1574588636.9633] dhcp4 (enp0s20f6): nameserver '192.168.0.1'
Nov 24 09:43:56 5535031 NetworkManager[651]: <info> [1574588636.9639] dhcp4 (enp0s20f6): nameserver '8.8.8.8'
Nov 24 09:43:56 5535031 NetworkManager[651]: <info> [1574588636.9645] dhcp4 (enp0s20f6): state changed unknown -> bound
Nov 24 09:43:57 5535031 dhclient[15507]: bound to 192.168.0.35 -- renewal in 35515 seconds.
...
Nov 24 09:43:58 5535031 NetworkManager[651]: <info> [1574588638.1190] manager: NetworkManager state is now CONNECTED_GLOBAL
What’s interesting is when the problem initially occurs, it looks like something similar is happening in that it’s attempting to renew a DHCP lease but doesn’t get the new address. It gets into CONNECTED_SITE
state, but never CONNECTED_GLOBAL
(which is required to ensure it can contact the Internet and hence our VPN backend).
What we think is happening here is that the local DHCP server gives out leases on a 24 or 48 hour basis, and then asks the device to renew the DHCP lease, but it either refuses to renegotiate or for some reason can’t do so with the server.