Balena engine start failure

0xff · April 3, 2020, 10:19pm

Device type: Raspberry Pi (v1 / Zero / Zero W)
OS version: balenaOS 2.46.1+rev1
Supervisor version: 10.6.27

Hi, we have had several instances where the device stops responding. It still shows up as online, but we cannot connect to it, and at least one of the containers appears to have stopped working.

After having the customer power-cycle the device, we look at the previous boot's journal with journalctl -b -1 and see:

Apr 02 16:54:55 3bff1dd balenad[1390]: Failed to start containerd: timeout waiting for containerd to start
Apr 02 16:54:56 3bff1dd resin-supervisor[1379]: Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?
Apr 02 16:54:56 3bff1dd systemd[1]: balena.service: Main process exited, code=exited, status=1/FAILURE
Apr 02 16:54:56 3bff1dd systemd[1]: balena.service: Failed with result 'exit-code'.
Apr 02 16:54:57 3bff1dd wpa_supplicant[876]: wlan0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
Apr 02 16:54:56 3bff1dd systemd[1]: Failed to start Balena Application Container Engine.
Apr 02 16:54:58 3bff1dd systemd[1]: resin-supervisor.service: Control process exited, code=exited, status=3/NOTIMPLEMENTED
Apr 02 16:54:58 3bff1dd systemd[1]: resin-supervisor.service: Failed with result 'exit-code'.
Apr 02 16:54:58 3bff1dd systemd[1]: Failed to start Balena supervisor.
Apr 02 16:54:59 3bff1dd resin-supervisor[1875]: activating
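Because the failure is sporadic, it can help to scan saved journals for this same sequence instead of reading them by hand. The following is a minimal Python sketch, assuming the journal was exported in journalctl's default short format (as in the excerpt above). The function name `summarize_failures` and the marker list are illustrative choices taken from the messages in this log, not part of balenaOS.

```python
import re
from collections import Counter

# Matches journalctl short-format lines: "Mon DD HH:MM:SS host unit[pid]: message".
LINE_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) (\S+) ([\w.@-]+)(?:\[(\d+)\])?: (.*)$"
)

# Illustrative markers copied from the failing boot's journal.
FAILURE_MARKERS = (
    "Failed to start containerd",
    "Cannot connect to the balenaEngine daemon",
    "Failed to start Balena Application Container Engine",
    "Failed to start Balena supervisor",
)


def summarize_failures(lines):
    """Count each failure marker and record the timestamp of its first occurrence."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in journalctl short format
        timestamp, _host, _unit, _pid, message = match.groups()
        for marker in FAILURE_MARKERS:
            if marker in message:
                counts[marker] += 1
                first_seen.setdefault(marker, timestamp)
    return counts, first_seen
```

For example, after saving the previous boot's journal to a file (the name `previous-boot.log` is hypothetical), `with open("previous-boot.log") as fh: counts, first = summarize_failures(fh)` shows whether containerd timed out before the engine and supervisor failures began.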

This repeats. It's often not viable to ask the customer to restart the device, so we are looking for ideas on how to debug this and make our containers more stable in production.
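One way to notice the "online but unresponsive" state without waiting for a customer report is to probe the engine socket named in the log above. This is a minimal sketch, assuming the script has access to `/var/run/balena-engine.sock` and that balenaEngine (which is based on Moby) answers the Docker-compatible `GET /_ping` endpoint; treat both as assumptions to verify on your device. The function name `engine_responds` is hypothetical.

```python
import socket


def engine_responds(sock_path="/var/run/balena-engine.sock", timeout=2.0):
    """Return True if the engine socket answers GET /_ping with an HTTP 200 status.

    Assumes a Docker-compatible API on the socket; any connection error,
    timeout, or non-200 reply is treated as "not responding".
    """
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
            conn.settimeout(timeout)  # avoid hanging if the daemon is wedged
            conn.connect(sock_path)
            conn.sendall(b"GET /_ping HTTP/1.0\r\nHost: localhost\r\n\r\n")
            reply = conn.recv(64)
    except OSError:  # covers missing socket, refused connection, and timeouts
        return False
    # Status line looks like "HTTP/1.1 200 OK".
    return reply.startswith(b"HTTP/1.") and b" 200 " in reply[:16]
```

Logging the result periodically would at least timestamp when the engine stopped answering, which narrows down which part of the journal to look at.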

Thanks.

dtischler · April 3, 2020, 10:43pm

Hi there, sorry to hear about these troubles. The original Raspberry Pi 1 and the Pi Zero don't have a lot of compute power due to their older, single-core processor, so that may be the culprit here if you are running multiple containers and those containers run even moderate workloads.

With that said, could you share the output of ‘journalctl -u balena.service -t balenad’? That will give us a bit more visibility into what is occurring. Thanks!
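Since the issue is intermittent, scripting the requested command makes it easier to save the output right after a power cycle. This is a minimal sketch that only wraps the journalctl options already used in this thread (`-u`, `-t`, `-b`) plus the standard `--since` and `--no-pager` options; the helper names and the output path are hypothetical.

```python
import subprocess


def engine_log_command(boot_offset=None, since=None):
    """Build the journalctl invocation requested above, optionally limited to a boot or time."""
    cmd = ["journalctl", "-u", "balena.service", "-t", "balenad", "--no-pager"]
    if boot_offset is not None:
        cmd += ["-b", str(boot_offset)]  # e.g. -1 for the previous boot, as with journalctl -b -1
    if since is not None:
        cmd += ["--since", since]  # e.g. "2020-04-02 16:00"
    return cmd


def capture_engine_logs(path, boot_offset=None, since=None):
    """Run journalctl and write its stdout to `path`; returns journalctl's exit code.

    Raises FileNotFoundError if journalctl is not installed on the machine running this.
    """
    result = subprocess.run(
        engine_log_command(boot_offset, since),
        capture_output=True,
        text=True,
        check=False,
    )
    with open(path, "w") as fh:
        fh.write(result.stdout)
    return result.returncode
```

For example, `capture_engine_logs("/tmp/balenad-prev-boot.log", boot_offset=-1)` saves the previous boot's engine log before it can be rotated away.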

0xff · April 4, 2020, 3:14pm

Thanks for the response. It happens sporadically, and unfortunately we lost the logs for that instance, so I will capture the output next time it happens.

One question: when I run "top" to look at resource usage, I see that CPU usage rarely exceeds 40%, and memory usage is about 90% (i.e. 10% free). Does that indicate resource constraints? I have not seen out-of-memory errors in the logs. Also, is a single-service installation more efficient resource-wise than a multi-service one? If so, we can run some experiments to see whether this still happens with a single container.
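On the memory question: the "free" figure in top counts only completely unused RAM, while the kernel also uses idle memory for page cache that it can reclaim, so a low "free" value on its own does not necessarily mean memory pressure. The kernel's `MemAvailable` estimate in `/proc/meminfo` is usually a better indicator. This is a minimal sketch for comparing the two; the function names are hypothetical, and the fallback for kernels without `MemAvailable` is only a rough approximation.

```python
def memory_summary(meminfo_text):
    """Return free and available memory as percentages of MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are in kB (some counters are unitless)
    total = fields["MemTotal"]
    free = fields["MemFree"]
    # Prefer the kernel's own estimate; otherwise roughly approximate it.
    available = fields.get(
        "MemAvailable", free + fields.get("Buffers", 0) + fields.get("Cached", 0)
    )
    return {
        "free_pct": 100.0 * free / total,
        "available_pct": 100.0 * available / total,
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read the live values on a Linux host such as the device itself."""
    with open(path) as fh:
        return memory_summary(fh.read())
```

With made-up numbers where MemFree is 10% of MemTotal but MemAvailable is 40%, the device would look nearly full in top while still having a fair amount of reclaimable memory.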

Thanks

JSReds · April 6, 2020, 1:06pm

Hi there,
Yes, a multi-service application requires more resources than a single-service one, since each additional container runs its own processes concurrently with the others.
Given the low computing power of the device, I would suggest trying a single container where possible.
