From what I could gather, this error might be related to the specific image being pushed. If you are still experiencing this issue and retrying the push doesn’t solve it, it may be worth making a small change to the image and pushing that instead. Sorry for the tentative advice; we will keep looking into this until we find a better solution. In the meantime, let me know if this workaround is enough to unblock you.
Yeah, retrying works eventually, so it’s not a complete block, but with build times of anywhere from 10 to 30 minutes depending on what has changed, that’s a fair amount of time to lose for no reason each time…
We are experiencing this issue again at the moment. Here are some logs from a recent build that failed when building on arm01. Let me know if you would like any more information.
[camera] Step 1/27 : FROM balenalib/raspberrypi3-python:3.7-stretch-build
[camera] Cannot overwrite digest sha256:7b1984c9147243654e4a53e76b0ff55f0513000881c3d27db14286b7d9c537a7
...
[Success] Successfully uploaded images
[Error] Some services failed to build:
[Error] Service: camera
[Error] Error: Cannot overwrite digest sha256:7b1984c9147243654e4a53e76b0ff55f0513000881c3d27db14286b7d9c537a7
[Info] Built on arm01
[Error] Not deploying release. Remote build failed
@nazrhom is there a status page where we can be kept updated on current builder issues? The balena status page currently suggests that all systems are operational.
Hey, after some research on this, it seems to be a race condition in the Docker runtime. I have an idea of how we could work around it until it’s fixed upstream (https://github.com/docker/for-linux/issues/727), but I wanted to gather some information first.
Do you have multiple services in your docker-compose file? Do at least two of these services use the same base image?
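One way to answer that question for a given app is a small script that lists which base images are shared between services. This is a minimal Python sketch, not balena tooling: it assumes each service's Dockerfile has already been read into a dict keyed by service name (the helper names `base_images` and `shared_base_images`, the service names, and every image except the `camera` one from the logs are hypothetical), and it only understands simple `FROM` lines.

```python
from __future__ import annotations

from collections import defaultdict


def base_images(dockerfile_text: str) -> list[str]:
    """Return the registry images named in FROM lines, in order of appearance.

    Skips `scratch` and references to earlier build stages (`FROM x AS name`),
    since neither is pulled. Line continuations and ARG substitution are not
    handled.
    """
    images: list[str] = []
    stages: set[str] = set()
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        # Ignore flags such as --platform=linux/arm/v7.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        if not args:
            continue
        image = args[0]
        if image.lower() != "scratch" and image.lower() not in stages:
            images.append(image)
        # Remember stage aliases so later `FROM <alias>` lines are skipped.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
    return images


def shared_base_images(dockerfiles: dict[str, str]) -> dict[str, list[str]]:
    """Map each base image to the services using it, keeping only shared images."""
    users: dict[str, list[str]] = defaultdict(list)
    for service, text in dockerfiles.items():
        for image in set(base_images(text)):  # count each service once per image
            users[image].append(service)
    return {image: sorted(svcs) for image, svcs in users.items() if len(svcs) > 1}


# Hypothetical services; only the camera image comes from the logs above.
services = {
    "camera": "FROM balenalib/raspberrypi3-python:3.7-stretch-build\nRUN pip install picamera\n",
    "web": "FROM balenalib/raspberrypi3-python:3.7-stretch-build\n",
    "proxy": "FROM balenalib/raspberrypi3-debian:stretch\n",
}
print(shared_base_images(services))
# {'balenalib/raspberrypi3-python:3.7-stretch-build': ['camera', 'web']}
```

Any image that shows up in the result is pulled by more than one service's build, which is the situation being asked about here.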
We have 8 services, and a number of them use the same base image. Would you suggest that we pin the base images to an exact build and have the services use images that are ever so slightly different?
So it appears that the problem is caused by Docker pulling the same image more than once at the same time. For now, I’d suggest using slightly different base images for your services if possible; please let us know whether that works. I have an idea for working around this until it’s fixed upstream (pre-fetching all base images before the builds start, so that we never issue concurrent pulls for the same image), but first I’d like to confirm that avoiding shared base images solves your issue.
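The pre-fetch idea can be sketched roughly as follows: collect the distinct base images across all services and pull each one exactly once, one after another, before the parallel builds begin. This is a hypothetical illustration, not balena’s builder implementation. It assumes a `docker` CLI on `PATH` and that each service’s base images are already known (for instance, from its `FROM` lines). `prefetch_base_images` and `docker_pull` are illustrative names, and the `pull` parameter exists only so the example can run without Docker.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def docker_pull(image: str) -> None:
    # `docker pull` exits non-zero on failure; check=True raises CalledProcessError.
    subprocess.run(["docker", "pull", image], check=True)


def prefetch_base_images(
    service_images: Dict[str, Iterable[str]],
    pull: Callable[[str], None] = docker_pull,
) -> List[str]:
    """Pull each distinct image exactly once, sequentially.

    Returns the images in the order they were pulled. Since no two pulls of
    the same image ever overlap here, the builds that run afterwards can use
    the local copies, provided they don't force a fresh pull themselves.
    """
    seen = set()
    order: List[str] = []
    for images in service_images.values():
        for image in images:
            if image not in seen:  # deduplicate across services
                seen.add(image)
                order.append(image)
    for image in order:
        pull(image)  # one pull at a time, never concurrent
    return order


if __name__ == "__main__":
    pulled: List[str] = []
    # pulled.append stands in for docker_pull so this runs without Docker.
    prefetch_base_images(
        {
            "camera": ["balenalib/raspberrypi3-python:3.7-stretch-build"],
            "web": ["balenalib/raspberrypi3-python:3.7-stretch-build"],
            "proxy": ["balenalib/raspberrypi3-debian:stretch"],
        },
        pull=pulled.append,
    )
    print(pulled)
```

The pulls are deliberately serial: the point is to avoid two concurrent pulls of one image, so parallelism is only reintroduced at the build stage. Whether the builds then reuse the pulled images depends on them not requesting a fresh pull (for example via `docker build --pull`).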
As a little bit of context, we’ve been investigating how moving to a more elastic build architecture would affect build times, since we would then always have to pull base images, whereas in the past we’ve cached them. To investigate this, we’ve enabled a mode on our builders that always removes base images after a build, so that we can measure how long builds take with and without the cache. I think this is why you’re now seeing this problem. If changing the base images does fix things for you, we’ll revert that change for the time being and implement the fix I described above before enabling the mode again.
Let me know if you have any further questions, and I look forward to hearing if your problems are solved with the base image change!
Just as an aside, I’m about to leave for the day, but I’d like to keep continuity on this ticket, so I’ll be taking it over again on Monday. If you have any questions in the meantime, feel free to add them to this thread and one of my colleagues will answer.
Hey there, we’re running into the same issue on our app with 10-12 services. Has there been any progress on this, @CameronDiver?
I’d like to make sure your issue has the same root cause as the one the other users above are facing. Could you please tell us more about your fleet, the devices you are using, and their balenaOS and supervisor versions?
It would also help if you could share the full builder logs here. You can run the following command
balena push yourApp > logs.txt
to save the output locally, then attach the resulting file here.
Ideally, please create a new forum thread and include all of this information there, so that we can pick up your case and work on it closely.
@erwan I’ve been monitoring the issue to see how widespread it is, but so far no changes have been made. Would it be possible for you to try what I suggested above, i.e. slightly changing the base images? I understand it could be a pain given the number of services you have; I would just like to avoid adding a patch for this without first knowing whether it would help.
I have also been experiencing this on and off for months now, and as @krenom mentioned, retrying takes up a significant amount of time.
This is a multicontainer application, and I have granted support access for a week if you would like to access the device.
Release hash: 2da32b42a512d10eb4bbd1093439d82d
Build Log carl-test_release-log_build overview-2da32b4.log (1.9 KB)
CLI Log tasklog_11.log (758.9 KB)
Hi there, our engineers are aware of this issue and are working on this.
A fix for this problem was introduced in this PR https://github.com/moby/moby/pull/37781 and has been available upstream since v19.03.0. We are currently running v18.9.7 on our ARM builders, so this should be fixed as soon as we move to the new architecture, because we’ll be updating the engine version as part of that move. I’m afraid I cannot share an ETA for the resolution, but we will notify you as soon as the builders are updated.
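To check whether a given engine already includes the upstream fix, its version can be compared against 19.03.0. This is a minimal sketch that assumes the engine’s version numbers track upstream Docker releases and compare as dotted integers (so `18.9.7` and `18.09.7` are the same release); suffixes such as `-ce` are ignored, which is a simplification. `FIX_VERSION`, `parse_version`, and `has_fix` are illustrative names. On a machine with Docker, the server version can be read with `docker version --format '{{.Server.Version}}'`.

```python
import re
from typing import Tuple

FIX_VERSION = "19.03.0"  # first upstream release containing moby PR #37781


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'v18.9.7' or '19.03.0-ce' into an integer tuple like (18, 9, 7)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = [int(p) for p in match.group(1).split(".")]
    parts += [0] * (3 - len(parts))  # pad '19.03' to (19, 3, 0)
    return tuple(parts)


def has_fix(engine_version: str, fixed_in: str = FIX_VERSION) -> bool:
    # Tuples compare element by element, so (18, 9, 7) < (19, 3, 0).
    return parse_version(engine_version) >= parse_version(fixed_in)


print(has_fix("18.9.7"))   # False: the ARM builder version mentioned above
print(has_fix("19.03.0"))  # True
```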
Thanks for bearing with us,
Fixed two years ago?
While you “cannot share an ETA”, can you provide some kind of range that’s not an order of magnitude away from the possible answer, given that this has apparently been sitting around for that long?
Hey there. This is a third-party repository, and the fix in this PR is something we will need to incorporate into our codebase and ARM builders. I’m afraid I don’t have an estimate of when we will move to the new architecture that will allow us to fix this issue, but it is definitely a priority for us.
Huh, ok, fair enough.