Device type: Raspberry Pi (v1 / Zero / Zero W)
OS version: balenaOS 2.46.1+rev1
Supervisor version: 10.6.27
Hi, we have had several instances where the device stops responding. It still shows up as online, but we cannot connect to it, and it seems that at least one of the containers is no longer functioning.
After having the customer power cycle the device, we look at the previous boot's journal via journalctl -b -1 and can see:
Apr 02 16:54:55 3bff1dd balenad: Failed to start containerd: timeout waiting for containerd to start
Apr 02 16:54:56 3bff1dd resin-supervisor: Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?
Apr 02 16:54:56 3bff1dd systemd: balena.service: Main process exited, code=exited, status=1/FAILURE
Apr 02 16:54:56 3bff1dd systemd: balena.service: Failed with result 'exit-code'.
Apr 02 16:54:57 3bff1dd wpa_supplicant: wlan0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
Apr 02 16:54:56 3bff1dd systemd: Failed to start Balena Application Container Engine.
Apr 02 16:54:58 3bff1dd systemd: resin-supervisor.service: Control process exited, code=exited, status=3/NOTIMPLEMENTED
Apr 02 16:54:58 3bff1dd systemd: resin-supervisor.service: Failed with result 'exit-code'.
Apr 02 16:54:58 3bff1dd systemd: Failed to start Balena supervisor.
Apr 02 16:54:59 3bff1dd resin-supervisor: activating
This repeats. Asking customers to restart the device is often not viable, so we are looking for ideas on how to debug this and make our containers more stable in production.
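In case it helps with reproducing this, here is a rough sketch of how the previous boot's journal could be narrowed down to the engine and supervisor failures quoted above. The function name engine_failures is hypothetical, and the patterns are only an assumption based on the messages visible in this excerpt, so they may miss other relevant lines.

```shell
# Hypothetical helper (not part of balenaOS): reads journal text on stdin and
# keeps only lines logged by balenad, resin-supervisor, or systemd that contain
# one of the failure phrases seen in the excerpt above.
engine_failures() {
    grep -E '(balenad|resin-supervisor|systemd)(\[[0-9]+\])?: .*(Failed|Cannot connect|timeout)'
}

# Assumed usage on the device's host OS, against the previous boot:
#   journalctl -b -1 --no-pager | engine_failures
```

Counting the matches per boot (for example by piping the result to wc -l) might give a quick sense of how long the engine was stuck in the restart loop before the power cycle.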