We occasionally experience a DNS failure on some of our devices (Raspberry Pi 3 and Raspberry Pi Zero W).
I’ve read some topics on this forum but, although sometimes insightful, couldn’t find one that helps with this.
Both the containers and the host can ping IP addresses but cannot ping domain names.
This causes the devices to lose track of time, I guess because they cannot resolve the NTP server names.
NetworkManager and Dnsmasq are both running on the host, and I tried restarting them, but that didn’t solve it.
Rebooting the system fixes it, but I’m looking for a smoother recovery, with a reboot only as a last resort.
Thank you and best regards
Hi, when the problem occurs can you ping the default gateway and DNS server by IP address?
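A minimal sketch of that check, assuming a POSIX shell with `awk`, `ping` and iproute2 available on the host. The helper names `list_nameservers`, `default_gateways` and `check_reachable` are hypothetical, not balenaOS tools:

```shell
#!/bin/sh
# Hypothetical helper: print the nameserver addresses listed in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: print the gateway of every default route (needs iproute2).
default_gateways() {
    ip route show default |
        awk '{ for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1) }'
}

# Ping each address once with a 3 second timeout and report the result.
check_reachable() {
    for addr in "$@"; do
        if ping -c 1 -W 3 "$addr" >/dev/null 2>&1; then
            echo "$addr reachable"
        else
            echo "$addr NOT reachable"
        fi
    done
}

# Usage on the device, while the problem is occurring:
#   check_reachable $(default_gateways) $(list_nameservers)
```

If the gateway answers but a listed nameserver does not, the problem is more likely in which DNS servers are configured than in basic connectivity.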
Hello @alexgg ,
Thanks for replying.
As soon as I catch a similar situation, I’ll make the test you’re suggesting!
I’m now facing a similar situation: I can ping IP addresses but not domains.
It somehow lost the reference to the 126.96.36.199 DNS server in resolv.dnsmasq
A BIT OF CONTEXT
This is a Raspberry Pi Zero 2 W (64-bit) device
running balenaOS in development mode
It is connected via an LTE 4G modem and it connects to the balena VPN: I have access, via the dashboard, to both the HostOS and a container
It is also connected via ethernet to a router which I disconnected from the internet (gateway: 192.168.2.10)
Despite being connected to the internet via the 4G modem, it seems to be routing DNS through the router (which is disconnected from the internet)
I’m trying to wrap my head around this.
If I understood your suggestion correctly, it does ping the gateway: 192.168.2.10
Any ideas?
As soon as I unplugged the ethernet cable from the router (which is not connected to the internet), all was fine again…
My guess is that it somehow gives priority to the DNS servers from the ethernet connection, regardless of which connection is actually active.
PS: after plugging the ethernet cable in again, it goes back to being unable to resolve DNS.
I’d appreciate any help understanding this behavior and finding a way to avoid it.
Thank you very much
If need be I can provide you with more details about the fleet and device
Could it be that this is simply an issue of metrics?
In general, ethernet devices get assigned a better (lower) routing metric than wireless or GSM.
You can check this with
You can try to force the route to use your 4G by adding a rule to your connection settings.
nmcli connection modify <name> ipv4.routes 188.8.131.52/32 <metric>
Regarding your DNS entries getting replaced altogether, you can try adding ipv4.ignore-auto-dns to your ethernet connection settings; that should prevent DHCP from adding new DNS servers.
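One way to compare the metrics of the default routes is sketched below, assuming iproute2, `awk` and `sort` are available on the host. The helper name `default_route_metrics` is hypothetical; it only reformats `ip route` output and changes nothing on the device:

```shell
#!/bin/sh
# Hypothetical helper: read `ip route` output on stdin and print
# "<metric> <device>" for each default route, lowest (preferred) metric first.
default_route_metrics() {
    awk '$1 == "default" {
        dev = "?"; metric = 0          # a route without a metric field has metric 0
        for (i = 2; i < NF; i++) {
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        print metric, dev
    }' | sort -n
}

# Usage on the device:
#   ip route show | default_route_metrics
# The metric configured in NetworkManager can be read with, for example:
#   nmcli -g ipv4.route-metric connection show <name>
```

The interface on the first output line is the one the kernel prefers for traffic that matches only the default route.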
Hi @TJvV !
Thank you for the quick reply.
I also thought about that possibility, but we had already defined the route metrics as follows:
nmcli connection show eth0
Cellular (4G): 3
nmcli connection show cellular
To be more concise:
With the ethernet cable plugged in (linked to a router without internet)
nmcli: cellular connection is prioritized due to the route metric
ping domain - NOK, ping IP - OK
Without the ethernet cable
ping domain - OK, ping IP - OK
Despite the route metric favoring the cellular connection over the ethernet connection, it seems that whenever I plug in the ethernet cable, the device automatically tries to resolve DNS “through it”
Thank you and best regards
The configuration does sound sane.
Can you show your /etc/resolv.conf in both scenarios?
My guess is that your DHCP adds a server in the 192.168.2.9/24 subnet, which would get picked up with metric 6 as the cellular only has a
Again, can you also try it with ipv4.ignore-auto-dns enabled on your ethernet connection?
That should prevent DHCP from adding new servers.
Without ethernet cable
With ethernet cable plugged
nmcli connection modify eth0 ipv4.ignore-auto-dns yes
The behavior stops and, as you suggested, the resolv files remain unchanged.
Now we have to ponder whether this is an overall good configuration for our purposes.
Thanks a lot for the suggestion, and best regards
Glad to hear it’s working.
If you don’t want to discard the DHCP DNS altogether, it seems there’s also a setting, ipv4.dns-priority, that might help in fixing the order of your nameservers.
Maybe try setting that on both connections, with the cellular having a lower value (lower values take precedence); note that the default values are 50 for VPN and 100 for others, and 0 selects the default.
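A sketch of what that could look like, assuming the connection names `cellular` and `eth0` used earlier in this thread. The values 10 and 20 are arbitrary illustrations and have not been tested on this setup:

```shell
# Assumed connection names: "cellular" and "eth0" (as in the nmcli commands above).
# Illustrative values only; a lower number means a higher DNS priority.
nmcli connection modify cellular ipv4.dns-priority 10
nmcli connection modify eth0 ipv4.dns-priority 20

# Re-activate both connections so the new priorities are applied.
nmcli connection up cellular
nmcli connection up eth0
```

If ipv4.ignore-auto-dns was enabled on the ethernet connection earlier, it would need to be set back to `no` for the DHCP-provided servers to show up again.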
Based on your routing table, it should then first try the cdc-wdm0, only going to