Persistent "Failed to download image due to 'connect ECONNREFUSED /var/run/balena-engine.sock'" error

ajlennon · April 23, 2021, 10:40am

Thanks - but I tried that and it errors purging - see above

ajlennon · May 3, 2021, 6:08pm

Still getting this happening. No real idea why it happens or how to fix it. Tried deleting /mnt/data but to no avail

ECONNREFUSED /var/run/balena-engine.sock

ajlennon · May 3, 2021, 7:20pm

So I just downloaded a clean app image and wrote it to flash with Etcher. It is failing to download the app container with this message

ajlennon · May 3, 2021, 9:33pm

I’ve been looking at this and I think I have some ideas. I suspect one kind of failure may be OOM on the Pi0. Another kind seems to be watchdog related.

I’ve changed the service watchdog to 60 minutes instead of 6 minutes and that seems to fix it

It looks like the balena daemon is spawning the untar process which is quite slow on mmc for large images. I suspect that watchdog responses are being blocked during untar. Maybe this needs to be looked at? Perhaps putting the responses in a thread whilst the untar operates.
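To make the suggestion concrete, here is a minimal Python sketch of the general pattern: a background thread keeps answering a watchdog while a slow, blocking task runs. This is not how balena-engine is implemented. `run_with_heartbeat`, `send_heartbeat` and `fake_untar` are hypothetical names made up for illustration, and the sleep only stands in for a slow extraction.

```python
import threading
import time


def run_with_heartbeat(slow_task, send_heartbeat, interval_s):
    """Run slow_task() while a background thread calls send_heartbeat()
    every interval_s seconds. All names here are hypothetical."""
    stop = threading.Event()

    def beat():
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop sends one heartbeat per interval until asked to stop.
        while not stop.wait(interval_s):
            send_heartbeat()

    worker = threading.Thread(target=beat, daemon=True)
    worker.start()
    try:
        return slow_task()
    finally:
        # Always stop the heartbeat thread, even if slow_task raises.
        stop.set()
        worker.join()


if __name__ == "__main__":
    beats = []

    def fake_untar():
        # Stand-in for a slow, blocking layer extraction.
        time.sleep(0.5)
        return "done"

    result = run_with_heartbeat(
        fake_untar, lambda: beats.append(time.monotonic()), 0.1
    )
    print(result, "heartbeats sent:", len(beats))
```

The caveat with this pattern is that the heartbeat only proves the process is alive, not that the slow task is making progress, so a real watchdog response would probably need to check more than that.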

mpous · May 13, 2021, 8:49am

Thank you @ajlennon for your research! We are going to check internally to understand if this is a potential solution!

Let’s stay connected

ajlennon · May 13, 2021, 10:48am

Any time bug bounty beers cough cough

robertgzr · May 17, 2021, 9:58am

Hello, engine maintainer here.

We’ve been looking into ways to optimize this. I know this is an issue on lower spec devices, most notably the pi zero, where it can actually cause the situation you experienced here.
As you observed there’s some unfortunate behaviour where the engine’s healthcheck itself actually sets off the watchdog on an otherwise functioning device. My idea actually goes towards setting up a better healthcheck that is more delicate than our current one, which tries to run the hello-world container.

We ran some tests in the past which adjusted the priorities of the untar operations, which actually already run as a completely separate process. If I remember correctly that didn’t entirely solve the issue.

robertgzr · May 17, 2021, 10:01am

we have an issue for this if you are interested in following it: Healthdog checks time out during image pulls · Issue #196 · balena-os/balena-engine · GitHub

as I said I’m going to look into coming up with a better healthcheck, since it would most likely improve this situation as well as other issues we’ve had around that

robertgzr · May 17, 2021, 10:03am

could you share the OS version you are on? just for reference?

ajlennon · August 15, 2021, 11:54am

Sorry only just seen this !!! I was working on balenaOS 2.54.2+rev1

frederikheld · April 13, 2022, 10:00pm

@robertgzr Any progress on this? I have the issue with a RasPi Zero W and balenaOS 2.83.21+rev1, Supervisor 12.10.3

Someone mentioned that their /mnt/data partition was full. Would using a bigger SD card fix the issue? Or is this only related to the slow processor of the Pi Zero?

Are there any hacky fixes that work?
