Supervisor fails to resolve DNS on v4, v5 in offline/air-gapped setup using open-balena

We deploy open-balena in an air-gapped network where the router resolves all the required balena domains (e.g. api.aivero.lan) and advertises that DNS server via DHCP.

We configure RaspberryPi3 images of balenaOS v2.80.3 with balena os configure, and these devices connect nicely.

We also tried adding a dnsServers: "null" entry to config.json to disable the automatic injection of 8.8.8.8 into the list of DNS servers. In some cases, having 8.8.8.8 in the list caused timeouts while waiting for a response from that server, which is unreachable from our air-gapped network.
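For anyone wanting to reproduce the config.json change, here is a minimal sketch of what we mean. The helper name disable_default_dns and the sample apiEndpoint value are only illustrative assumptions; the one thing taken from our setup is the dnsServers key set to the string "null" (a string, not JSON null).

```python
import json


def disable_default_dns(config: dict) -> dict:
    """Return a copy of a config.json dict with dnsServers set to the string "null".

    Hypothetical helper for illustration; it does not touch any file on disk.
    """
    updated = dict(config)
    # The value is the literal string "null", not the JSON null value.
    updated["dnsServers"] = "null"
    return updated


# Illustrative input only; a real config.json contains many more keys.
sample = {"apiEndpoint": "https://api.aivero.lan"}
print(json.dumps(disable_default_dns(sample), indent=2))
```

The resulting JSON would then be written back as config.json on the boot partition before first boot.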

These older images lack the fixed/updated HQ camera sensor-mode 5, so we need a newer version.

However, the newer versions v5.0.8 and v2.115.18+rev2 do not connect to open-balena. The supervisor fails with getaddrinfo EAI_AGAIN api.aivero.lan:

EDIT: The latest balenaOS version for RaspberryPi3 that has the HQ camera fix AND connects correctly is v2.94.4.
For the RaspberryPi4 we are using v2.88.4+rev0, which has the HQ fix AND connects correctly.

root@9dc1123:~# balena ps
CONTAINER ID   IMAGE                                                            COMMAND                  CREATED          STATUS                             PORTS     NAMES
c699ff174f56   registry2.balena-cloud.com/v2/c5636e5430e2762232e60e19e79c773f   "/usr/src/app/entry.…"   49 seconds ago   Up 41 seconds (health: starting)             balena_supervisor
root@9dc1123:~# balena logs c699ff174f56 -f
INFO: Found device /dev/mmcblk0p1 on current boot device mmcblk0, using as mount for '(resin|balena)-boot'.
INFO: Found device /dev/mmcblk0p5 on current boot device mmcblk0, using as mount for '(resin|balena)-state'.
INFO: Found device /dev/mmcblk0p6 on current boot device mmcblk0, using as mount for '(resin|balena)-data'.
find: /mnt/root/tmp/balena-supervisor/services: No such file or directory
[info]    Supervisor v15.0.4 starting up...
[info]    Setting host to discoverable
[debug]   Starting systemd unit: avahi-daemon.service
[debug]   Starting systemd unit: avahi-daemon.socket
[debug]   Starting logging infrastructure
[info]    Starting firewall
[warn]    Invalid firewall mode: . Reverting to state: off
[info]    Applying firewall mode: off
[success] Firewall mode applied
[debug]   Starting api binder
[debug]   Performing database cleanup for container log timestamps
[info]    Previous engine snapshot was not stored. Skipping cleanup.
[debug]   Handling of local mode switch is completed
(node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
[info]    API Binder bound to: https://api.aivero.lan/v6/
[event]   Event: Supervisor start {}
[info]    Starting API server
[info]    Supervisor API successfully started on port 48484
[debug]   Ensuring device is provisioned
[debug]   Connectivity check enabled: true
[debug]   Starting periodic check for IP addresses
[event]   Event: Device bootstrap {}
[info]    Waiting for connectivity...
[info]    VPN connection is not active.
[info]    New device detected. Provisioning...
[success] Initialised splash image backend
[info]    Reporting initial state, supervisor version and API info
[info]    Attempting to load any preloaded applications
[error]   LogBackend: unexpected error: Error: getaddrinfo EAI_AGAIN api.aivero.lan
[error]         at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:119:26)
[event]   Event: Device bootstrap failed, retrying {"delay":30000,"error":{"cause":{},"isOperational":true,"errno":-3001,"code":"EAI_AGAIN","syscall":"getaddrinfo","hostname":"api.aivero.lan"}}
^C
root@9dc1123:~# ^C
root@9dc1123:~# ping api.aivero.lan
PING api.aivero.lan (192.168.88.243): 56 data bytes
64 bytes from 192.168.88.243: seq=0 ttl=64 time=1.528 ms
64 bytes from 192.168.88.243: seq=1 ttl=64 time=1.777 ms
^C

In the hostOS we can nslookup, ping or curl api.aivero.lan just fine.

Inside the supervisor container, nslookup resolves it to the correct IP but reports it as a non-authoritative answer.
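Since nslookup and the supervisor's getaddrinfo call can take different resolution paths, a small getaddrinfo-based check may be a closer reproduction of the failing call. This is only a sketch, assuming Python is available in some container on the device (the supervisor itself is Node.js); check_resolution is a hypothetical helper, and port 443 is assumed because the API endpoint is https.

```python
import socket


def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname via getaddrinfo and classify any failure.

    The resolver parameter exists only so the lookup can be stubbed out.
    """
    try:
        infos = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # EAI_AGAIN means a temporary failure in name resolution,
        # the same code the supervisor reports in the log above.
        return {
            "ok": False,
            "temporary": exc.errno == socket.EAI_AGAIN,
            "error": str(exc),
        }
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved IP address.
    addresses = sorted({info[4][0] for info in infos})
    return {"ok": True, "addresses": addresses}


# Example call on the device:
# print(check_resolution("api.aivero.lan"))
```

If this returns ok inside a container while the supervisor still logs EAI_AGAIN, that would point at the supervisor container's own resolver configuration rather than at the router's DNS.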


@acostach any insights here? Thank you :slight_smile:

We have a temporary workaround:

The latest balenaOS version for RaspberryPi3 that has the HQ camera fix AND connects correctly is v2.94.4.

For the RaspberryPi4, v2.88.4+rev0 has the HQ fix AND connects correctly.


The question is how we can get the new 5.x.x versions fixed so that RPI3 and RPI4 connect correctly.

Any news on a potential real fix or any guidance on how to debug this further?