BalenaOS DNS failure

suporte · June 29, 2023, 12:59pm

Hi all,

Occasionally we experience DNS failures on some of our devices (Raspberry Pi 3 and Raspberry Pi Zero W).

I’ve read several topics on this forum, but although some were insightful, I couldn’t find one that solves this.

Neither the containers nor the host can ping domain names, but both can ping IP addresses.

As a result, the devices lose track of time, I guess because they cannot resolve the NTP server names.

NetworkManager and dnsmasq are both running on the host. I tried restarting them, but that didn’t solve it.

Rebooting the system solves it, but I’m looking for a smoother recovery and would rather reboot only as a last resort.

Any hints?

Thank you and best regards

alexgg · July 5, 2023, 10:36am

Hi, when the problem occurs, can you ping the default gateway and the DNS server by IP address?
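A minimal POSIX shell sketch of that check, meant to be pasted into (or sourced by) a host shell when the problem occurs. It assumes the upstream servers dnsmasq uses are listed in /etc/resolv.dnsmasq (the file is mentioned later in this thread, but the exact path is an assumption) and that ping accepts -c and -W, as both iputils and BusyBox ping do. The function names are hypothetical.

```shell
# Print the nameserver addresses listed in a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Print the gateway of every "default via <gw> ..." line of `ip route`
# output read from stdin (a multi-homed device can have several).
default_gateways() {
    awk '$1 == "default" && $2 == "via" { print $3 }'
}

# Ping each address once, waiting at most 2 seconds, and report the result.
check_reachability() {
    for addr in "$@"; do
        if ping -c 1 -W 2 "$addr" >/dev/null 2>&1; then
            echo "$addr reachable"
        else
            echo "$addr UNREACHABLE"
        fi
    done
}
```

For example: check_reachability $(ip route | default_gateways) $(list_nameservers /etc/resolv.dnsmasq). The command substitutions are left unquoted on purpose so that each address becomes a separate argument.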

suporte · July 5, 2023, 2:27pm

Hello @alexgg ,

Thanks for replying.
As soon as it happens again, I’ll run the test you suggested!

suporte · July 6, 2023, 2:56pm

Hi @alexgg

It’s happening again: I can ping IP addresses but not domain names.

[Screenshot 2023-07-06 at 16.03.56]

Somehow the 8.8.8.8 DNS server entry disappeared from resolv.dnsmasq:

[Screenshot 2023-07-06 at 16.18.05]

A BIT OF CONTEXT
The device is a Raspberry Pi Zero 2 W (64-bit) running balenaOS in development mode.

It is connected through an LTE (4G) modem and connects to the balena VPN, so through the dashboard I have access to both the host OS and a container.

It is also connected via Ethernet to a router, which I disconnected from the internet (gateway: 192.168.2.10).

Despite being connected to the internet through the 4G modem, it seems to route DNS through the router, which has no internet access.

I’m trying to wrap my head around this.

If I understood your suggestion correctly: yes, it can ping the gateway, 192.168.2.10.

Any ideas?

FINAL EDIT:
As soon as I unplugged the Ethernet cable from the router (which is not connected to the internet), everything worked again…

[Screenshot 2023-07-06 at 16.44.18]

My guess is that it somehow prefers the DNS configuration from the Ethernet connection over the connection that actually has internet access.

PS: when I plug the Ethernet cable back in, it once again fails to resolve DNS.

I’d appreciate any help understanding this behavior and finding a way to avoid it.

Thank you very much
If needed, I can provide more details about the fleet and the device.

Best regards

TJvV · July 7, 2023, 6:31am

Hi,

Could it be that this is simply an issue of metrics?
In general, Ethernet devices are assigned a better (lower) routing metric than wireless or GSM ones.
You can check this with ip route.
You can try forcing that traffic onto your 4G connection by adding a route to its connection settings (the route specification is quoted because it contains a space):
nmcli connection modify <name> ipv4.routes "8.8.8.8/32 <metric>"

Regarding your DNS entries being replaced altogether, you can try enabling ipv4.ignore-auto-dns in your Ethernet connection settings; that should prevent DHCP from adding new DNS servers.
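A small sketch of applying that setting to an existing profile. nmcli connection modify only changes the stored profile, so the sketch reactivates the connection with nmcli connection up to apply it, which briefly interrupts that link. The function name is hypothetical; the profile name eth0 is the one used later in this thread.

```shell
# Make a NetworkManager connection profile ignore DNS servers supplied by
# DHCP, reactivate it so the change takes effect, and print the stored value.
ignore_dhcp_dns() {
    conn="$1"
    nmcli connection modify "$conn" ipv4.ignore-auto-dns yes &&
        nmcli connection up "$conn" &&
        nmcli -g ipv4.ignore-auto-dns connection show "$conn"  # expected: yes
}
```

Usage: ignore_dhcp_dns eth0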

suporte · July 7, 2023, 3:24pm

Hi @TJvV !

Thank you for the quick reply.
I also thought about that possibility, but we had already defined the route metrics as follows:

Ethernet: 6
nmcli connection show eth0

Cellular (4G): 3
nmcli connection show cellular

To summarize:

With the Ethernet cable plugged in (connected to a router without internet access)

nmcli: the cellular connection is prioritized due to its route metric

ping domain - NOK, ping IP - OK

ip route

nslookup

[Screenshot 2023-07-07 at 16.08.50]

Without the Ethernet cable

nmcli

ping domain - OK, ping IP - OK

ip route

nslookup

IN SHORT

Although the route metric favors the cellular connection over the Ethernet connection, whenever I plug in the Ethernet cable the device seems to resolve DNS “through it”.

Thank you and best regards

TJvV · July 11, 2023, 6:11am

Hi,

The configuration does sound sane.

Can you show your /etc/resolv.conf in both scenarios?
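My guess is that your DHCP server adds a DNS server in the 192.168.2.9/24 subnet. Routes are matched by longest prefix before the metric is considered, so traffic to that server takes the eth0 /24 route (metric 6); the cellular connection only has a /30 subnet, which does not cover it.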
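One way to test this guess on the device is to ask the kernel which interface it would use for each upstream DNS server with ip route get, which performs the same route lookup as real traffic. A minimal sketch, assuming the upstream servers are listed in /etc/resolv.dnsmasq (the path is an assumption) and using hypothetical function names:

```shell
# Print the outgoing interface (the word after "dev") from `ip route get`
# output read from stdin, e.g. "8.8.8.8 via 192.168.2.10 dev eth0 src ...".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# For every nameserver in a resolv.conf-style file, show which interface
# the kernel would use to reach it.
nameserver_routes() {
    awk '$1 == "nameserver" { print $2 }' "$1" | while read -r ns; do
        printf '%s -> %s\n' "$ns" "$(ip route get "$ns" | route_dev)"
    done
}
```

Running nameserver_routes /etc/resolv.dnsmasq with the cable plugged in would show whether 192.168.2.10 (or any other DHCP-provided server) is reached via eth0 rather than the cellular interface.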

Again, can you also try it with ipv4.ignore-auto-dns enabled on your ethernet connection?
That should prevent DHCP from adding new servers.

suporte · July 12, 2023, 7:10pm

Hi @TJvV

Without ethernet cable

With ethernet cable plugged

After adding:

nmcli connection modify eth0 ipv4.ignore-auto-dns yes

The behavior stops and, as you expected, the resolv files remain unchanged.

Now we have to consider whether this configuration suits our purposes overall.
Thanks a lot for the suggestion, and best regards

TJvV · July 14, 2023, 6:51am

Hi,

Glad to hear it’s working.

If you don’t want to discard the DHCP-provided DNS servers altogether, there’s also an ipv4.dns-priority setting that might help fix the order of your nameservers.
Try setting it on both connections, giving the cellular connection the lower value (lower values mean higher priority). Note that 0 selects the default, which is 50 for VPN connections and 100 for all others.
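A hedged sketch of that suggestion, using the profile names cellular and eth0 from this thread. The values 10 and 200 are arbitrary examples, chosen only so that the cellular value is lower. According to the NetworkManager documentation, a negative value goes further and excludes DNS servers from connections with a higher value. Reactivating the cellular connection briefly drops the modem link, and with it the balena VPN session. The function name is hypothetical.

```shell
# Servers from the profile with the lower ipv4.dns-priority are preferred.
# The priority values below are arbitrary examples.
# Profile changes apply on the next activation, so both are reactivated.
prefer_dns_from() {
    preferred="$1"
    fallback="$2"
    nmcli connection modify "$preferred" ipv4.dns-priority 10 &&
        nmcli connection modify "$fallback" ipv4.dns-priority 200 &&
        nmcli connection up "$preferred" &&
        nmcli connection up "$fallback"
}
```

Usage: prefer_dns_from cellular eth0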

Based on your routing table, it should then try 172.30.8.5 and 172.30.8.6 via cdc-wdm0 first, and only fall back to 192.168.2.10 via eth0.
