Greetings,
I have an issue with my openBalena setup.
I wish to deploy openBalena for a few Raspberry Pi 4 devices on my local network. I have followed the setup guide on the openBalena documentation page and have successfully set up the following:
- openBalena server on a local device (an Intel NUC used as a development server)
- balena-cli on my local machine with working certificates (balena login, balena scan, balena deploy, etc. all work)
- A local DNS server running via dnsmasq on another local NUC, with all the necessary addresses pointing to my openbalena server (api, registry, vpn, s3, tunnel)
- Ran balena os configure to configure a newly downloaded balenaOS image for the Raspberry Pi 4s in the fleet
- Configured static IPs for all of the devices, including the balenaOS devices, via the resin-ethernet file in system-connections (it includes the dns field with my DNS server’s address)
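For reference, a minimal sketch of how the DNS records listed above could be checked from any machine or container that has Python available. The helper name `check_hosts` and the domain `mydomain` are placeholders, not part of openBalena; running the same check on the host and inside a container would show whether both see the same resolver.

```python
import socket

# Subdomains listed in the post; the domain passed in is a placeholder.
SUBDOMAINS = ["api", "registry", "vpn", "s3", "tunnel"]


def check_hosts(domain, resolve=socket.getaddrinfo):
    """Return {hostname: sorted list of IPs, or an error string}.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    results = {}
    for sub in SUBDOMAINS:
        host = f"{sub}.{domain}"
        try:
            infos = resolve(host, 443)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] holds the resolved IP address.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[host] = f"FAILED: {exc}"
    return results
```

Calling `check_hosts("mydomain")` returns one entry per subdomain, so a name that resolves in one place but shows `FAILED` in another points at the resolver rather than at the records themselves.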
The working segments of the setup are as follows:
- The devices (when flashed with the preconfigured image) boot normally and connect to the network as specified in the network file.
- The balena scan command shows the devices’ info (I have them running in development mode) and the balenaOS version (2.88.5+rev1).
- I am able to SSH into the devices.
This is where the issues start:
The clock on the devices does not sync (I left a device running for a day and the clock never synced). I then successfully forced the time sync with /usr/sbin/chronyd.
After getting the correct clock, the device then manages to get the configuration for the fleet, pulls the docker images, and spins up the containers. Everything seems to be functioning normally docker-wise.
However, when I run “balena devices” on the server, both devices are reported as offline. I can even access their logs, but the healthcheck seems to be failing. After SSHing into the devices, I can confirm the following:
- The dnsmasq service still uses the Google DNS address (8.8.8.8) in addition to my local DNS IP. I have checked all of the configuration files used, and changed the only file that contains 8.8.8.8 as the nameserver (/run/dnsmasq.servers) to the IP of my DNS server. The file gets overwritten with the Google DNS address every time dnsmasq is restarted.
- The supervisor container periodically reports “Event: Device state report failure {"error":"getaddrinfo ENOTFOUND api.mydomain"}”. I have tried to run curl -k api.mydomain/ping from inside the balena_supervisor container, and the address resolution fails. However, if I run the same command on the device itself, the address is resolved and I get OK as the response.
- If I create /etc/docker/daemon.json and set its “dns” value to my DNS server’s address, the balena Docker engine fails to start.
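One thing worth ruling out for the daemon.json attempt is the shape of the file itself. In upstream Docker the "dns" key must be a list of strings, and a malformed file stops the daemon from starting. Below is a minimal sketch of such a check, assuming balenaEngine parses the file the same way as upstream Docker (not confirmed here); the function name `check_daemon_dns` and the IP address in the message are placeholders.

```python
import json


def check_daemon_dns(text):
    """Return a list of problems found in daemon.json-style text.

    An empty list means the "dns" entry has the shape upstream Docker expects.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    dns = config.get("dns")
    if dns is None:
        return ['no "dns" key present']
    if not isinstance(dns, list) or not all(isinstance(x, str) for x in dns):
        # 192.168.1.2 is a placeholder address for illustration only.
        return ['"dns" must be a list of strings, e.g. ["192.168.1.2"]']
    return []
```

Feeding it the file contents, for example `check_daemon_dns(open("/etc/docker/daemon.json").read())`, gives an empty list when the entry is well-formed. If the file passes and the engine still fails to start, the cause is more likely elsewhere, for instance an option that is also set through the engine's startup flags.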
I suspect that DNS resolution inside the balena/Docker engine is the main culprit, and that it prevents the devices from being shown as online on the openBalena server side.
Any and all insight on the topic is most welcome, as I have been trying to debug the situation for days, to no avail. Thank you!