I'm seeing strange behavior when trying to update or install my fleet on Raspberry Pi 4 rev 1.5 devices using balenaCloud.
For example, I have:
One device where balenaOS 2.95.8 was installed more than 2 weeks ago.
One device with balenaOS 2.95.8 installed today.
One device with balenaOS 2.98.33 installed today.
The first device installed my newest fleet version without any problem, but the last 2 have been downloading Docker images for 3 hours and sometimes fail with strange errors like
Failed to download image 'registry2.balena-cloud.com/v2/4386bd3d2c58b4b54155b5f67e9b61ec@sha256:91c88e3f5551f57675e750e9eb49f36911a6011eeebd2e23bb65629529367ff9'
due to 'connect ECONNREFUSED /var/run/balena-engine.sock
Or
Failed to download image 'registry2.balena-cloud.com/v2/7055a43d63d6a8018d640ee4ea51cd37@sha256:4e2ed557ebed9d7d4fb0999efa108002c1766130d452423ea227cc21d1ee4750'
due to '(HTTP code 500) server error - Get "https://registry2.balena-cloud.com/v2/": net/http: TLS handshake timeout
I’ve also tested with the Development and Production editions of balenaOS; the behavior is the same, even on the latest balenaOS version (2.107.1+rev1 at the time of writing).
To complete the test, I tried another internet connection, without success.
I’d be happy to give you more information about this issue.
Hi there, thanks for reaching out. Could you share the engine logs from before you get the connect ECONNREFUSED /var/run/balena-engine.sock?
You can use the time when the error happens in the logs to isolate the results, for instance
journalctl -a --until 'YYYY-MM-DD hh:mm:ss' --no-pager
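To limit the output to a window around the failure and to the engine and supervisor units, something like the following could be used. This is a minimal sketch: the helper name logs_between is hypothetical, the unit names balena.service and balena-supervisor.service are assumed, and the timestamps in the usage comment are placeholders.

```shell
# Hypothetical helper: show engine and supervisor logs inside a time window.
# Assumes the units are named balena.service and balena-supervisor.service.
logs_between() {
  # $1 = window start, $2 = window end, both as 'YYYY-MM-DD hh:mm:ss'
  journalctl -a --no-pager \
    -u balena.service -u balena-supervisor.service \
    --since "$1" --until "$2"
}

# Usage (placeholder timestamps, replace with the time of the error):
#   logs_between 'YYYY-MM-DD hh:mm:ss' 'YYYY-MM-DD hh:mm:ss'
```

Restricting the query with --since as well as --until keeps the output short enough to paste into a reply.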
Could you also share the results of what happens when downloading one of our public images manually? Do you still get TLS handshake timeouts?
# Pull a supervisor image
balena pull registry2.balena-cloud.com/v2/0ff5f211eb0787cde6700b220dbabe88
Could you also share some more details about your connection setup? TLS handshake timeouts usually happen due to connectivity issues. Have you noticed other issues with connectivity?
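One way to see whether the TLS handshake itself is slow is to time it with curl. This is only a sketch: the helper name tls_timing and the 30-second default timeout are assumptions, and it presumes curl is available in the shell you are using. An unauthenticated request to the registry will probably be answered with an HTTP error status, but any HTTP status at all means the handshake completed.

```shell
# Hypothetical helper: time the TCP connect and TLS handshake to a URL.
# $1 = URL, $2 = optional overall timeout in seconds (default 30, an assumption).
tls_timing() {
  curl -sS -o /dev/null --max-time "${2:-30}" \
    -w 'connect=%{time_connect}s tls=%{time_appconnect}s http=%{http_code}\n' \
    "$1"
}

# Usage:
#   tls_timing 'https://registry2.balena-cloud.com/v2/'
```

Here time_connect is the time until the TCP connection was established and time_appconnect is the time until the TLS handshake finished, so a large gap between the two points at the handshake rather than at basic reachability.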
I’ve tried downloading the image but it seems that the Engine is not running.
root@c2f9720:~# balena pull registry2.balena-cloud.com/v2/0ff5f211eb0787cde6700b220dbabe88
Using default tag: latest
Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?
To clarify, not only is the image download really slow, but accessing the devices in the balenaCloud dashboard is a hassle too. It takes maybe 10 to 15 minutes for new devices to show up on my dashboard.
For my connection setup, I use a USB-to-Ethernet adapter for internet access; the internal Ethernet port is used for local network communication with other devices.
I do have system-connections files in the /boot directory of the SD Card which are the following:
After some investigation, I found that the balena services are in bad shape on the devices I'm having trouble with.
Here is the systemctl output on a faulty device.
root@c2f9720:~# systemctl | grep balena
sys-devices-virtual-net-balena0.device loaded active plugged /sys/devices/virtual/net/balena0
sys-subsystem-net-devices-balena0.device loaded active plugged /sys/subsystem/net/devices/balena0
etc-balena\x2dsupervisor.mount loaded active mounted /etc/balena-supervisor
usr-share-ca\x2dcertificates-balena.mount loaded active mounted /usr/share/ca-certificates/balena
balena-hostname-conf.path loaded active waiting balena-hostname path watch
balena-net-config-conf.path loaded active waiting balena-net-config path watch
balena-ntp-config-conf.path loaded active waiting balena-ntp-config path watch
balena-supervisor-conf.path loaded active waiting balena-supervisor path watch
extract-balena-ca-conf.path loaded active waiting extract-balena-ca path watch
balena-device-uuid.service loaded active exited Balena device UUID
balena-hostname.service loaded active exited Balena Hostname Configuration
balena-info@tty1.service loaded active exited Balena info on tty1
balena-net-config.service loaded active exited Resin network configure service
balena-ntp-config.service loaded active exited Resin NTP server configure service
balena-persistent-logs.service loaded active exited Balena persistent logs
balena-proxy-config.service loaded active exited Resin proxy configuration service
● balena-supervisor-conf.service loaded failed failed balena-supervisor.json watcher service
balena-supervisor.service loaded inactive dead start Balena supervisor
balena.service loaded activating start start Balena Application Container Engine
bind-etc-balena-supervisor.service loaded active exited Bind mount for /etc/balena-supervisor
bind-usr-share-ca-certificates-balena.service loaded active exited Bind mount for /usr/share/ca-certificates/balena
system-balena\x2dinfo.slice loaded active active Slice /system/balena-info
balena-engine.socket loaded active listening Docker Socket for the API
balena-host.socket loaded active listening Docker Socket for the API
update-balena-supervisor.timer loaded active waiting Balena supervisor updater timer
And here it is on a working device:
root@c74f18f:~# systemctl | grep balena
sys-devices-virtual-net-balena0.device loaded active plugged /sys/devices/virtual/net/balena0
sys-subsystem-net-devices-balena0.device loaded active plugged /sys/subsystem/net/devices/balena0
etc-balena\x2dsupervisor.mount loaded active mounted /etc/balena-supervisor
usr-share-ca\x2dcertificates-balena.mount loaded active mounted /usr/share/ca-certificates/balena
balena-hostname-conf.path loaded active waiting balena-hostname path watch
balena-net-config-conf.path loaded active waiting balena-net-config path watch
balena-ntp-config-conf.path loaded active waiting balena-ntp-config path watch
balena-supervisor-conf.path loaded active waiting balena-supervisor path watch
extract-balena-ca-conf.path loaded active waiting extract-balena-ca path watch
balena-device-uuid.service loaded active exited Balena device UUID
balena-hostname.service loaded active exited Balena Hostname Configuration
balena-info@tty1.service loaded active exited Balena info on tty1
balena-net-config.service loaded active exited Resin network configure service
balena-ntp-config.service loaded active exited Resin NTP server configure service
balena-persistent-logs.service loaded active exited Balena persistent logs
balena-proxy-config.service loaded active exited Resin proxy configuration service
balena-supervisor.service loaded active running Balena supervisor
balena.service loaded active running Balena Application Container Engine
bind-etc-balena-supervisor.service loaded active exited Bind mount for /etc/balena-supervisor
bind-usr-share-ca-certificates-balena.service loaded active exited Bind mount for /usr/share/ca-certificates/balena
system-balena\x2dinfo.slice loaded active active Slice /system/balena-info
balena-engine.socket loaded active running Docker Socket for the API
balena-host.socket loaded active listening Docker Socket for the API
update-balena-supervisor.timer loaded active waiting Balena supervisor updater timer
On the faulty one, balena.service, which I assume is balenaEngine, is still in the activating state instead of running, and balena-supervisor.service is dead.
Maybe it gives you some hints about the situation.
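The difference between the two listings can also be pulled out mechanically by keeping only the units whose ACTIVE column is not "active". A minimal sketch, assuming the column layout shown above (unit, load, active, sub, description); the helper name not_active_units is hypothetical:

```shell
# Hypothetical helper: read `systemctl | grep balena` output on stdin and
# print unit, ACTIVE and SUB for every unit whose ACTIVE column is not "active".
not_active_units() {
  awk 'NF >= 4 {
    i = 1
    if ($1 !~ /\./) i = 2   # skip a leading status marker before the unit name
    if ($(i + 2) != "active") print $i, $(i + 2), $(i + 3)
  }'
}

# Usage:
#   systemctl | grep balena | not_active_units
```

On the faulty listing above this leaves balena-supervisor-conf.service, balena-supervisor.service and balena.service; on the working listing it prints nothing.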
Hello, are you still experiencing this issue? If so, can you please run balena ps in a shell on the device? If balenaEngine is not running, the command should fail. You can also check the status by running systemctl status balena. The next question is which Supervisor version your device is running. If it is running a Supervisor version < 14.0.0, then the following may be a solution for the trouble with the engine:
Run journalctl -xe to see if the socket is already in use. If it is, the output will look something like this:
root@40b5a3e:~# journalctl -xe
Feb 07 16:52:08 40b5a3e systemd[1]: balena-supervisor.service: Failed with result 'exit-code'.
Feb 07 16:52:08 40b5a3e systemd[1]: Failed to start Balena supervisor.
Feb 07 16:52:18 40b5a3e systemd[1]: balena-engine.socket: Failed to create listening socket (/run/balena-engine.sock): Address already in use
Feb 07 16:52:18 40b5a3e systemd[1]: balena-engine.socket: Failed to listen on sockets: Address already in use
Feb 07 16:52:18 40b5a3e systemd[1]: balena-engine.socket: Failed with result 'resources'.
Feb 07 16:52:18 40b5a3e systemd[1]: Failed to listen on Docker Socket for the API.
Feb 07 16:52:18 40b5a3e systemd[1]: Dependency failed for Balena Application Container Engine.
Feb 07 16:52:18 40b5a3e systemd[1]: balena.service: Job balena.service/start failed with result 'dependency'.
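Since journalctl -xe can produce a lot of output, the relevant line can be searched for directly. A small sketch, with a hypothetical helper name (socket_in_use), that reads journal text on stdin and reports whether the balena-engine.socket failure shown above is present:

```shell
# Hypothetical helper: read journal text on stdin and report whether the
# balena-engine.socket "Address already in use" failure appears in it.
socket_in_use() {
  if grep -q 'balena-engine\.socket: .*Address already in use'; then
    echo "socket conflict found"
  else
    echo "no socket conflict in this log"
  fi
}

# Usage (current boot only):
#   journalctl -b --no-pager | socket_in_use
```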
If the socket is in use, check if it is a valid socket and not a directory via ls -la /run | grep 'balena-engine'. The output may look as follows:
root@40b5a3e:~# ls -la /run | grep 'balena-engine'
drwx------ 7 root root 160 Feb 7 18:52 balena-engine
-rw-r--r-- 1 root root 7 Feb 7 18:52 balena-engine.pid
drwxr-xr-x 2 root root 40 Jan 28 20:21 balena-engine.sock
srw-rw---- 1 root balena-engine 0 Jan 13 18:07 balena-host.sock
lrwxrwxrwx 1 root root 22 Jan 13 18:07 balena.pid -> /run/balena-engine.pid
srw-rw---- 1 root balena-engine 0 Feb 7 18:52 balena.sock
lrwxrwxrwx 1 root root 22 Jan 13 18:07 docker.pid -> /run/balena-engine.pid
Judging by the above output, you can see that balena-engine.sock is a directory. This is the source of the bug, since the OS cannot recover from this. To fix it so the socket can be recreated correctly, you might have to restart the engine with systemctl restart balena.
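The same check can be made without reading the ls output by hand, using the shell's file tests. A minimal sketch with a hypothetical helper name (path_kind); the path in the usage comment is the one from the listing above:

```shell
# Hypothetical helper: report what kind of file sits at the given path.
path_kind() {
  if [ -S "$1" ]; then
    echo socket        # a real UNIX socket, the expected case
  elif [ -d "$1" ]; then
    echo directory     # the broken case described above
  elif [ -e "$1" ]; then
    echo other
  else
    echo missing
  fi
}

# Usage:
#   path_kind /run/balena-engine.sock
```

If this prints directory before the restart and socket afterwards, the restart had the intended effect.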
Hope that helps. To avoid future instances of this issue, we advise that you upgrade to Supervisor v14.0.0+. Please let us know the results of any of the above points ^ (assuming the info matches your case)