We’ve had a fair number of issues lately with fleets getting stuck in ‘Build in progress’.
For instance, our pi3 and pi4 fleets for Anthias have been stuck in this state for some time now, preventing new builds from being pushed out (as the deploy job times out).
Here’s a snippet of the errors we’re getting in the CI/CD pipeline:
Warning: Failed to generate deltas due to an internal error; will be generated on-demand
[...]
[Info] Uploading images
[Success] Successfully uploaded images
[Info] Still Working...
[Info] Still Working...
[Info] Still Working...
[Info] Still Working...
[Info] Still Working...
[Info] Still Working...
[Info] Still Working...
[Info] Still Working...
[Info] Still Working...
Error: Upstream API server/DB error: ESOCKETTIMEDOUT
[Info] Built on arm02
Error: Not deploying release.
Error: Remote build failed
I’m getting desperate here. I’ve tried deploying manually with the command below, but I’m still hitting the same issue:
$ balena deploy screenly_ose/anthias-pi3 --nocache --pull --debug
[debug] new argv=[/home/user/tmp/balena-cli/balena,/snapshot/balena-cli/bin/balena,deploy,screenly_ose/anthias-pi3,--nocache,--pull] length=6
[debug] Deprecation check: 0.00944 days since last npm registry query for next major version release date.
[debug] Will not query the registry again until at least 7 days have passed.
[debug] Event tracking error: Timeout awaiting 'response' for 0ms
[Debug] Parsing input...
[Debug] Loading project...
[Debug] Resolving project...
[Debug] docker-compose.yml file found at "/home/user/code/screenly/Anthias/balena-deploy"
[Debug] Creating project...
[Info] Everything is up to date (use --build to force a rebuild)
[Info] Creating release...
[Debug] Tagging images...
[Debug] Authorizing push...
[Debug] Requesting access to previously pushed image repo (v2/fb71f9552da59f25bce04f3b26aeb6d8)
[Debug] Requesting access to previously pushed image repo (v2/d851218eac10063e9b2753d01f20f363)
[Debug] Requesting access to previously pushed image repo (v2/29c5ae4e4f516a76bc24265d89fe201b)
[Debug] Requesting access to previously pushed image repo (v2/c33d5c0474d5a81e2c73a23be9cf7186)
[Debug] Requesting access to previously pushed image repo (v2/816dba22d660097f477660a40e62793c)
[Debug] Requesting access to previously pushed image repo (v2/8af23d532bb44a181eb52332834bfd09)
[Debug] Requesting access to previously pushed image repo (v2/8ee514cf79f984b6a03b7e7ee9443649)
[Info] Pushing images to registry...
[Debug] Saving image registry2.balena-cloud.com/v2/06b63e81e92f17bfeb0a735c66c5ebe1
[Debug] Saving image registry2.balena-cloud.com/v2/fabdbd700759a26b6006025441a92066
[Debug] Saving image registry2.balena-cloud.com/v2/10bb10539edd83db90ee9feac3916742
[Debug] Saving image registry2.balena-cloud.com/v2/94d5d671503f18a7944239c5ba41c1d5
[Debug] Saving image registry2.balena-cloud.com/v2/1638bdd46a638ea9e7effbde72f07605
[Debug] Saving image registry2.balena-cloud.com/v2/ac386219bb88e3377221650af54db559
[Debug] Saving image registry2.balena-cloud.com/v2/578bbbf613ceb4c8004ab96b46aa4d0c
[Debug] Untagging images...
[Info] Saving release...
[Error] Deploy failed
ESOCKETTIMEDOUT: ESOCKETTIMEDOUT
Error: ESOCKETTIMEDOUT
at ClientRequest.<anonymous> (/snapshot/balena-cli/node_modules/request/request.js:816:19)
at Object.onceWrapper (events.js:519:28)
at ClientRequest.emit (events.js:400:28)
at ClientRequest.emit (domain.js:475:12)
at TLSSocket.emitRequestTimeout (_http_client.js:790:9)
at Object.onceWrapper (events.js:519:28)
at TLSSocket.emit (events.js:412:35)
at TLSSocket.emit (domain.js:475:12)
at TLSSocket.Socket._onTimeout (net.js:495:8)
at listOnTimeout (internal/timers.js:557:17)
at processTimers (internal/timers.js:500:7)
For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting
I’m pretty certain that this is a server-side Balena issue.
In short, the problem was that Balena’s worker got stuck somehow: it ended up in a ‘Failed’ state while the fleet kept showing ‘Build in Progress’ (see earlier screenshots).
The workaround was, instead of using balena deploy [...], to use balena push [fleet] --draft and then promote the release in the Balena UI.
This “unstuck” Balena and allowed us to publish subsequent releases.
We’ll look into automating this in our build flow and using it as our deploy method instead; a rough sketch of the flow is below.
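For reference, here’s roughly what that flow looks like from the command line (the fleet slug is just our example, and balena release finalize is, as far as I can tell, the CLI equivalent of promoting the draft in the UI, assuming your CLI version ships that subcommand):
# Build remotely and create a *draft* release instead of deploying directly
$ balena push screenly_ose/anthias-pi3 --draft
# List the fleet’s releases to find the commit hash of the new draft
$ balena releases screenly_ose/anthias-pi3
# Promote (finalize) the draft so devices start tracking it,
# or do the same from the releases page in the Balena dashboard
$ balena release finalize <commit-or-release-id>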
@mpous, we’ve experienced a similar issue again. I tried the solution provided by @vpetersson, but it no longer works. (I tried it on my local machine.)
Even on my local machine, running balena push gets stuck at “Still Working…”.
The last thing I’d try (I’m not 100% sure it will work) is deleting the releases stuck in the “Running” state.
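For anyone wondering, this is roughly what I have in mind; I haven’t verified that the API actually allows deleting a release in that state, so treat the DELETE call below as an untested assumption (the fleet slug and IDs are placeholders):
# Spot the releases stuck in the ‘Running’ state
$ balena releases screenly_ose/anthias-pi3
# Untested assumption: remove a stuck release through the balena API,
# using an API token from the dashboard preferences
$ curl -X DELETE "https://api.balena-cloud.com/v6/release(<release-id>)" \
    -H "Authorization: Bearer <api-token>"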
Looks like I’m not the only one with an issue like this. I’m encountering a similar problem and reported it in the CLI repo a while ago, but unfortunately haven’t gotten a reply in the past three weeks.
I also suffer from timeouts when trying to build via balena push, and they leave builds stuck in the ‘Running’ state in the web interface. This has been a consistent issue for months and hasn’t gotten better, so I figured I’d throw my hat in the ring and add an extra data point here.
It’s quite frustrating when you want to push out a release and end up delayed by a day or more just because Balena’s build system is broken again.
@nicomiguelino could you please let me know what you are trying to deploy? I would like to try to reproduce it!
@byteminer thanks for reporting on the CLI repo! The balena team is exploring this and we will keep you posted once we know more. That said, could you please share more details about what you are trying to deploy?
I’ll try to create a reproducible example that doesn’t require an NDA to publish, and will post here once I have something. I strongly suspect this is an issue with large multi-container builds, as I’m building for Jetson, which necessarily pulls in all of the Nvidia drivers for every container that uses the GPU. This quickly leads to multi-container setups that reach 40+ GB in size.
My theory at the moment is that the Balena builders time out when trying to pull the cache images, which then propagates to the CLI and surfaces as an uninformative timeout error.
I have since found that I can increase the chance of a build going through by reducing the number of containers (even if the containers I remove are very small), so maybe it has something to do with the container count as well.
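In case it helps anyone else reproduce or work around this, here’s what I’m experimenting with; the fleet slug is a placeholder, and I’m assuming a reasonably recent CLI that supports balena build --fleet and push --nocache:
# Push without reusing cached layers, to test whether the cache pull
# is what’s timing out on the builders
$ balena push my-org/my-jetson-fleet --draft --nocache
# Build locally and check how large each service image ends up,
# to get a feel for how much data the builders and registry have to move
$ balena build --fleet my-org/my-jetson-fleet
$ docker images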