Balena push command consistently timing out or failing

Hi,

I am trying to build new releases and deploy them to my devices, but all of my attempts to run balena push have been failing. I run the balena push command in a CI process that has a time limit of 2 hours. It has regularly been either hitting that time limit or failing with an error. When it hits the time limit, balena push often still shows 0% progress after 2 hours. Previously this happened on a semi-regular basis (roughly once every couple of days). Now it has been happening consistently for ~3 days.

I am pushing 9 service images. Two are large (~4.5 GB and ~5.5 GB). Our build process builds Docker images, pushes them to AWS ECR, then creates a Balena release docker-compose file that references the images directly in ECR. So the Balena build servers should only need to pull these images from ECR and create the release in Balena.
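For illustration, a minimal sketch of the "generate a compose file that points at ECR" step described above. It only renders text; it does not build, push, or authenticate. The account ID, repository names, tag, the `ecr_image_uri`/`render_compose` helpers, and the compose `version: '2.1'` header are all hypothetical placeholders, not the poster's actual setup. The URI shape `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` is the standard ECR image URI format, and `ap-southeast-2` is the AWS Sydney region.

```python
# Hypothetical sketch: render a docker-compose file whose services reference
# images already pushed to AWS ECR, so the remote builder only has to pull them.
# Account ID, region default, repository names and tag are placeholders.

ECR_ACCOUNT_ID = "123456789012"  # placeholder AWS account ID
ECR_REGION = "ap-southeast-2"    # AWS Sydney region


def ecr_image_uri(repository: str, tag: str,
                  account_id: str = ECR_ACCOUNT_ID,
                  region: str = ECR_REGION) -> str:
    """Return an ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def render_compose(services: dict[str, str], tag: str) -> str:
    """Render a minimal compose file with one `image:` entry per service.

    `services` maps compose service name -> ECR repository name.
    The version header is an assumption; adjust to what your fleet uses.
    """
    lines = ["version: '2.1'", "services:"]
    for name, repository in services.items():
        lines.append(f"  {name}:")
        lines.append(f"    image: {ecr_image_uri(repository, tag)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical mapping; the real setup has 9 services.
    example_services = {"app": "my-app", "ml-model": "my-ml-model"}
    print(render_compose(example_services, tag="v1.2.3"))
```

Because every service uses `image:` rather than `build:`, the remote builder has nothing to compile, so any slowness would come from pulling the referenced images (two of which are ~4.5 GB and ~5.5 GB).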

I am running my build process in Australia (AWS Sydney region), and we use AWS ECR in that region. So I suspected that having the Balena build servers (based in us-east-1, I think?) pull images from the Sydney region might be slow and causing the issue. I have recently tried ECR replication to us-east-1, with the Balena build agents pulling from ECR there, but that doesn't seem to have helped.
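A minimal sketch of how the compose file's image references could be pointed at the replicated registry instead of the Sydney one. It assumes same-account ECR replication, where the replicated image keeps the same account ID, repository name and tag and only the region in the registry hostname changes. The `to_replica_region` helper and the regex are hypothetical, written for illustration only.

```python
import re

# Matches the registry host of an ECR image URI:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/
_ECR_HOST = re.compile(r"^(\d{12})\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/")


def to_replica_region(image_uri: str, target_region: str) -> str:
    """Rewrite the region in an ECR image URI, keeping account, repo and tag.

    Assumes same-account replication (repository name preserved in the
    destination region). Raises ValueError for non-ECR URIs.
    """
    match = _ECR_HOST.match(image_uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {image_uri}")
    account_id = match.group(1)
    repo_and_ref = image_uri[match.end():]  # e.g. "my-app:v1" or "my-app@sha256:..."
    return f"{account_id}.dkr.ecr.{target_region}.amazonaws.com/{repo_and_ref}"


if __name__ == "__main__":
    sydney = "123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/my-app:v1"
    print(to_replica_region(sydney, "us-east-1"))
```

Rewriting only the hostname keeps tags and digests intact, so the release still references the exact same image content, just served from a different region.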

Some examples of the errors I’ve received:

Example 1:

Remote builder responded with HTTP error:
502 Bad Gateway
🚨 Error: The command exited with status 1
user command error: exit status 1

Example 2:

[] [=========================================================] 100%

[Info] Starting build for xxx, user xxx

[Info] Dashboard link: xxx

[Info] Building on e8ba311

[Info] Pulling previous images for caching purposes…

[Success] Successfully pulled cache images

ECONNRESET: aborted [==========================================> ] 72%

Example of a timeout:

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[Info] Still Working…

[service-name-placeholder] [> ] 0%

[service-name-placeholder] [> ] 0%

[service-name-placeholder] [> ] 0%

[service-name-placeholder] [> ] 0%

[service-name-placeholder] [> ] 0%

[service-name-placeholder] [> ] 0%

[service-name-placeholder] [> ] 0%

[service-name-placeholder] [> ] 0%

# Received cancellation signal, interrupting ] 0%

Any help is greatly appreciated.

Thanks

We’ve made some back end adjustments recently that may address this issue. Are you still experiencing these errors?

Hi, thanks for your reply.

The problem is not solved. I've tried rebuilding twice over the last 2 days, and both times progress reached only about 20% after 4 hours before the CI job timed out. I am still not able to publish any releases.
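As a rough sense of scale, and assuming progress were roughly linear in time (which pull/upload progress often is not), 20% after 4 hours extrapolates to a full build time of about 20 hours, far beyond the CI job's time limit:

```latex
t_{100\%} \approx 4\,\text{h} \times \frac{100\%}{20\%} = 20\,\text{h}
```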

Thanks for your help, Seb.