Supervisor failing when ipv6 address is unreachable

Hi,

We had a device that was stuck in VPN-only mode on the dashboard.
Running a health check showed that check_network failed with:
Some networking issues detected: test_ipv6_stack: Could not contact https://ipv6.google.com

I also noticed that pinging an IPv6 address from inside the host OS hung:

ping 2001:4860:4860::8888
PING 2001:4860:4860::8888 (2001:4860:4860::8888): 56 data bytes
^C
--- 2001:4860:4860::8888 ping statistics ---
8 packets transmitted, 0 packets received, 100% packet loss

But ipv4 was fine.

I also found in journalctl:

Feb 02 23:34:55 8dd870e f724159530bf[1334]: [info]    Healthcheck failure - At least ONE of the following conditions must be true:
Feb 02 23:34:55 8dd870e balena-supervisor[52464]: [info]    Healthcheck failure - At least ONE of the following conditions must be true:
Feb 02 23:34:55 8dd870e f724159530bf[1334]: [info]              - No connectivityCheckEnabled   ? false
Feb 02 23:34:55 8dd870e f724159530bf[1334]: [info]            - device state is disconnected  ? false
Feb 02 23:34:55 8dd870e f724159530bf[1334]: [info]            - stateReportErrors less then 3 ? false
Feb 02 23:34:55 8dd870e f724159530bf[1334]: [error]   Healthcheck failed
Feb 02 23:34:55 8dd870e balena-supervisor[52464]: [info]              - No connectivityCheckEnabled   ? false
Feb 02 23:34:55 8dd870e balena-supervisor[52464]: [info]            - device state is disconnected  ? false
Feb 02 23:34:55 8dd870e balena-supervisor[52464]: [info]            - stateReportErrors less then 3 ? false
Feb 02 23:34:55 8dd870e balena-supervisor[52464]: [error]   Healthcheck failed
Feb 02 23:34:55 8dd870e balena-supervisor[52464]: [api]     GET /v1/healthy 500 - 22.440 ms
Feb 02 23:34:55 8dd870e f724159530bf[1334]: [api]     GET /v1/healthy 500 - 22.440 ms

Disabling IPv6 for the Ethernet connection and restarting both NetworkManager and the Supervisor resolved the issue; the device is no longer stuck in VPN-only mode.

It seems strange to me that this can cause the Supervisor to fail completely. I would have imagined that if IPv6 addresses could not be resolved, the Supervisor would fall back to IPv4.
Can someone explain to me why this would happen?
Currently running balenaOS 2.80.3+rev1
Supervisor 12.10.10

Thanks,
Sophia

Hi @sophiahaoui,

Apologies for the delay. I see you’re running Supervisor v12.10.10. v12.11.14 contains a fix for this issue. This GitHub issue contains a good explanation of why the Supervisor behaved as it did prior to 12.11.14: "Supervisor should implement 'happy eyeballs'" (issue #1787 in balena-os/balena-supervisor).
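To illustrate the idea behind "happy eyeballs" (RFC 8305), here is a minimal Python sketch, not the Supervisor's actual code. It interleaves IPv6 and IPv4 candidates and falls back to the next one when a connection attempt fails. The names `interleave_by_family`, `connect_with_fallback` and the injected `try_connect` callable are hypothetical, and the attempts run one after another, whereas the real algorithm staggers them concurrently.

```python
import socket
from typing import Callable, List, Tuple

Address = Tuple[int, str]  # (address family, IP address)


def interleave_by_family(addresses: List[Address]) -> List[Address]:
    """Alternate IPv6 and IPv4 candidates, starting with IPv6."""
    v6 = [a for a in addresses if a[0] == socket.AF_INET6]
    v4 = [a for a in addresses if a[0] == socket.AF_INET]
    ordered: List[Address] = []
    for i in range(max(len(v6), len(v4))):
        if i < len(v6):
            ordered.append(v6[i])
        if i < len(v4):
            ordered.append(v4[i])
    return ordered


def connect_with_fallback(
    addresses: List[Address],
    try_connect: Callable[[Address], None],
) -> Address:
    """Return the first candidate for which try_connect does not raise OSError.

    try_connect is a hypothetical callable supplied by the caller, for
    example a wrapper around socket.create_connection with a short timeout.
    """
    errors = []
    for addr in interleave_by_family(addresses):
        try:
            try_connect(addr)
            return addr
        except OSError as exc:
            # One family being unreachable must not fail the whole check.
            errors.append(exc)
    raise OSError(f"all {len(errors)} candidates failed")
```

With this ordering, a host whose IPv6 route is broken still connects through its IPv4 address instead of failing outright. In real Python code, `asyncio.open_connection` accepts a `happy_eyeballs_delay` argument that provides the concurrent behaviour.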

Please let us know if upgrading to 12.11.14 works for you.

Thanks,
Christina

Hi @cywang117 and @klutchell,

Unfortunately, I’m facing the same situation as well. The device is a Compulab IOT-gate-iMX8 with two LAN adapters, one Wi-Fi adapter and one mobile network adapter. I’m running balenaOS 2.98.33, and the error was the same with Supervisor versions 13.1.11 and 14.4.8.

The core issue is that I’m not able to push changes to the device locally:

Retrying "Supervisor API (GET http://192.168.0.130:48484/ping)" after 2.0s (1 of 5) due to: Error: connect ECONNREFUSED 192.168.0.130:48484

Looking at the lsof output, I can see that port 48484 is only listed as listening on IPv6:

root@c5d1c5a:~# lsof -iTCP -sTCP:LISTEN
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd    1   root   62u  IPv6  21939      0t0  TCP *:22222 (LISTEN)
dnsmasq 1398 nobody    5u  IPv4  23042      0t0  TCP c5d1c5a:domain (LISTEN)
dnsmasq 1398 nobody    7u  IPv4  23044      0t0  TCP c5d1c5a:domain (LISTEN)
balenad 1426   root   10u  IPv6  28113      0t0  TCP *:2375 (LISTEN)
node    1964   root   21u  IPv6  29709      0t0  TCP *:48484 (LISTEN)
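One caveat when reading this listing: on Linux, a socket bound to the IPv6 wildcard address usually also accepts IPv4 connections (as IPv4-mapped addresses) unless the IPV6_V6ONLY option is set, so "IPv6" in the TYPE column does not by itself prove that IPv4 clients are refused. The following Python sketch parses lsof output in the format shown above and flags such wildcard listeners. The names `Listener`, `parse_listeners` and `possibly_dual_stack` are hypothetical, and the parser assumes the ten-column layout of this particular listing.

```python
from typing import List, NamedTuple


class Listener(NamedTuple):
    command: str
    family: str   # "IPv4" or "IPv6", as printed in lsof's TYPE column
    address: str  # lsof's NAME column without the state, e.g. "*:48484"


def parse_listeners(lsof_output: str) -> List[Listener]:
    """Extract listening sockets from `lsof -iTCP -sTCP:LISTEN` output."""
    listeners = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Data rows have ten columns and end with "(LISTEN)";
        # the header row and blank lines are skipped.
        if len(fields) < 10 or fields[-1] != "(LISTEN)":
            continue
        listeners.append(Listener(fields[0], fields[4], fields[-2]))
    return listeners


def possibly_dual_stack(listeners: List[Listener]) -> List[Listener]:
    """Return IPv6 wildcard listeners, which may also serve IPv4 clients."""
    return [
        entry for entry in listeners
        if entry.family == "IPv6" and entry.address.startswith("*:")
    ]
```

Whether a given listener really is dual-stack depends on the socket options the process set, which lsof does not show here, so this only narrows down where to look.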

After disabling IPv6 for eth0, following this guide, the network settings look like this:

root@c5d1c5a:~# ifconfig -a
balena0   Link encap:Ethernet  HWaddr 02:42:BF:C3:DB:38  
          inet addr:10.114.101.1  Bcast:10.114.101.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

br-3fe834a4f816 Link encap:Ethernet  HWaddr 02:42:B3:B1:51:01  
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          NOARP  MTU:16  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:216 

eth0      Link encap:Ethernet  HWaddr 00:01:C0:31:DB:F2  
          inet addr:192.168.0.130  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:824069 errors:0 dropped:0 overruns:0 frame:0
          TX packets:164288 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:190009976 (181.2 MiB)  TX bytes:22892610 (21.8 MiB)

eth1      Link encap:Ethernet  HWaddr 00:01:C0:32:B2:2A  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:34968 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34968 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5884438 (5.6 MiB)  TX bytes:5884438 (5.6 MiB)

resin-dns Link encap:Ethernet  HWaddr 62:B5:0C:23:9C:46  
          inet addr:10.114.102.1  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

resin-vpn Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.241.52.37  P-t-P:52.4.252.97  Mask:255.255.255.255
          inet6 addr: fe80::d214:c8be:9b27:bd5c/64 Scope:Link
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:72 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:0 (0.0 B)  TX bytes:3456 (3.3 KiB)

supervisor0 Link encap:Ethernet  HWaddr 02:42:F4:0C:11:75  
          inet addr:10.114.104.1  Bcast:10.114.104.127  Mask:255.255.255.128
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr 3E:78:55:9B:21:B6  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wwan0     Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          POINTOPOINT NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

resin-vpn still seems to use IPv6, but this is not something I can change via nmcli. The balena dashboard shows only the local IP address 192.168.0.130, so this also looks fine to me.

The “OS variant” is development and I’m also able to connect to the device locally via balena ssh c5d1c5a.local.

iptables output looks like this:

root@c5d1c5a:~# iptables -L BALENA-FIREWALL -v
Chain BALENA-FIREWALL (1 references)
 pkts bytes target     prot opt in     out     source               destination         
 141K   51M ACCEPT     all  --  any    any     anywhere             anywhere             state RELATED,ESTABLISHED
14419  958K ACCEPT     all  --  any    any     anywhere             anywhere             ADDRTYPE match src-type LOCAL
    0     0 ACCEPT     tcp  --  resin-vpn any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  tun0   any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  docker0 any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  lo     any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  supervisor0 any     anywhere             anywhere             tcp dpt:48484
    4   256 REJECT     tcp  --  any    any     anywhere             anywhere             tcp dpt:48484 reject-with icmp-port-unreachable
    5   376 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:22222
    6   384 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:2375
 173K   55M ACCEPT     all  --  any    any     anywhere             anywhere             ADDRTYPE match dst-type MULTICAST
    0     0 ACCEPT     icmp --  any    any     anywhere             anywhere            
    0     0 ACCEPT     udp  --  balena0 any     anywhere             anywhere             udp dpt:domain
56682 4153K RETURN     all  --  any    any     anywhere             anywhere            
    0     0 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-port-unreachable
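To reason about which rule a connection to port 48484 would hit, here is a small Python sketch of first-match evaluation over the TCP port rules in the listing above. It is a simplification under stated assumptions: it models only the input interface and destination port, ignores the state and ADDRTYPE rules at the top of the chain, and assumes a new TCP connection arriving from another host on the LAN via eth0. The names `Rule`, `RULES` and `first_match` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    target: str
    in_iface: str  # "any" matches every input interface
    dport: int     # destination TCP port the rule matches


# Transcribed by hand from the "tcp dpt:" rules in the listing above,
# in the same order as they appear in the chain.
RULES = [
    Rule("ACCEPT", "resin-vpn", 48484),
    Rule("ACCEPT", "tun0", 48484),
    Rule("ACCEPT", "docker0", 48484),
    Rule("ACCEPT", "lo", 48484),
    Rule("ACCEPT", "supervisor0", 48484),
    Rule("REJECT", "any", 48484),
    Rule("ACCEPT", "any", 22222),
    Rule("ACCEPT", "any", 2375),
]


def first_match(rules: List[Rule], in_iface: str, dport: int) -> Optional[str]:
    """Return the target of the first rule matching interface and port."""
    for rule in rules:
        if rule.in_iface in ("any", in_iface) and rule.dport == dport:
            return rule.target
    return None  # no port rule matched; later rules in the chain decide


print(first_match(RULES, "eth0", 48484))  # prints "REJECT"
```

Under these assumptions, traffic to port 48484 arriving on eth0 falls through to the REJECT rule, which would be consistent with the ECONNREFUSED error and with the non-zero packet counter on that rule.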

Maybe it is related to this issue? After a second boot iptables output looks like this:

root@c5d1c5a:~# iptables -L BALENA-FIREWALL -v
Chain BALENA-FIREWALL (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    6  1239 ACCEPT     all  --  any    any     anywhere             anywhere             state RELATED,ESTABLISHED
    4   264 ACCEPT     all  --  any    any     anywhere             anywhere             ADDRTYPE match src-type LOCAL
    0     0 ACCEPT     tcp  --  resin-vpn any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  tun0   any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  docker0 any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  lo     any     anywhere             anywhere             tcp dpt:48484
    0     0 ACCEPT     tcp  --  supervisor0 any     anywhere             anywhere             tcp dpt:48484
    0     0 REJECT     tcp  --  any    any     anywhere             anywhere             tcp dpt:48484 reject-with icmp-port-unreachable
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:22222
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:2375
    0     0 ACCEPT     all  --  any    any     anywhere             anywhere             ADDRTYPE match dst-type MULTICAST
    0     0 ACCEPT     icmp --  any    any     anywhere             anywhere            
    0     0 ACCEPT     udp  --  balena0 any     anywhere             anywhere             udp dpt:domain
    0     0 RETURN     all  --  any    any     anywhere             anywhere            
    0     0 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-port-unreachable

It would be great to get local development mode up and running, since it would speed up the development process. Could you give me any advice, or suggest what I should debug further?

The situation was also reproducible with the same device after a fresh flash. Enabling and disabling “local mode”, with or without a reboot, did not change the situation either.

Thanks for any support.

(Edit: Added Supervisor and host OS versions)

Hi,

I know it’s been a while since I brought up this issue, but we have just hit the same problem on a device running Supervisor v12.11.38 (OS v2.95.8).

Is there a way to simply disable IPv6 at the fleet level?

Thanks,
Sophia