New release fails with 'no such image'

Good day,

I have this recurring issue. I do a $ balena push APP, which takes a few minutes to build.

The build ends with this warning:

[inference-coral] Successfully built a627c885eda7
[Info] Generating image deltas from release 192982b732ac2cd053c512c9194b866c (id: 1367906)
[Warning] Failed to generate deltas due to an internal error; will be generated on-demand
[Info] Uploading images
[Success] Successfully uploaded images

Once the build is done I expect the devices in the application to start updating, but they don’t. They fail with:

06.05.20 16:30:25 (+0800) Failed to download image 'registry2.balena-cloud.com/v2/a1e8cc02971a675a450669300125aa05@sha256:e4a036ac0a6c1735da5445fc1f54a12f69c251ab8f0718d749c9fcc142a3e948' due to '(HTTP code 404) no such image - no such image: registry2.balena-cloud.com/v2/a1e8cc02971a675a450669300125aa05:delta-fcb446f649842125: No such image: registry2.balena-cloud.com/v2/a1e8cc02971a675a450669300125aa05:delta-fcb446f649842125

This now happens on every push. I have to manually flick the ‘disable delta updates’ switch on each device to force an update. This is obviously not ideal.

❯ balena --version
11.32.14

TYPE
Generic x86_64 (NEW)

HOST OS VERSION
balenaOS 2.48.0+rev5
production

I suspect the 404 error you’re seeing is related to an interaction between specific kernel versions and the overlayfs storage driver in balenaEngine. Disabling deltas seems to be the only way to work around it currently. The warning during the build is innocuous and can be ignored, but I’m still curious why it consistently happens for you, so if you could paste the application ID here I may be able to look into it.

Ta, the ID is 1647261 (PAT_QA), and I granted 1 week of support access. I hope that is the correct ID; I lifted it from the URL.

That’s correct, thanks.

Hey there @sthysel,
My colleague has looked into this and it seems there might be a more general problem with deltas on the builder. We are investigating this further.

thanks for the feedback.
Cheers!

G’day,

Is there any movement on this issue ?

Thanks

Hi,

We don’t have any updates on this yet, so as my colleague mentioned, disabling deltas seems to be the only way to work around it currently.
We will let you know when we have any updates on this.

A quick update @sthysel – I’ve briefly investigated the builder issue and couldn’t resolve it but I’ll revisit soon. Regarding the issue with actually applying the delta on the device, turns out it’s indeed the issue I suspected that is at fault here and we’ll be disabling delta updates for this device type automatically until the next OS version. The changes on the backend have been tested and are waiting to be deployed. The status for the fix for balenaOS can be tracked here: https://github.com/balena-os/balena-intel/issues/300

Thanks for the feedback mate.

Can you help me understand the bandwidth implications of this? I assume the Docker layering mechanism is not affected, so if all my services are based on the same base image I still get the benefit of that layering. Is it just deltas to existing layers that are affected? Or are we talking about full downloads of every service at each service update?

Thanks

Hi,

Yes, you will still get the benefit of the Docker layering mechanism. You might want to pin your base image to a specific version instead of a moving tag like latest, so that the builder does not build on top of a new base image and you maximize reuse of the image layers already on your devices.
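
As a rough illustration of the pinning advice, here is a minimal Python sketch that flags base images with no fixed tag. The helper name and the image names in the usage example are hypothetical and not part of any balena tooling. It assumes plain Dockerfile syntax and does not expand build arguments or template variables.

```python
import re

# Hypothetical helper: matches "FROM [--platform=...] image [AS stage]".
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE,
)

def unpinned_base_images(dockerfile_text):
    """Return base image references that have no fixed tag or digest."""
    unpinned = []
    stages = set()  # names of earlier build stages ("FROM ... AS name")
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, stage = match.group(1), match.group(2)
        is_stage_ref = image.lower() in stages
        if stage:
            stages.add(stage.lower())
        if is_stage_ref or image.lower() == "scratch" or "@sha256:" in image:
            continue  # earlier stage, empty base, or pinned by digest
        # Look for a tag only after the last "/", so a registry port
        # such as "host:5000/app" is not mistaken for a tag.
        name = image.rsplit("/", 1)[-1]
        tag = name.split(":", 1)[1] if ":" in name else ""
        if tag in ("", "latest"):
            unpinned.append(image)
    return unpinned

# Illustrative image names only.
sample = """\
FROM example/base:latest
FROM example/python:3.8 AS build
FROM build
"""
print(unpinned_base_images(sample))  # ['example/base:latest']
```

Running a check like this before a push is one way to catch a moving tag early. It is only a convenience, and the builder behaves the same with or without it.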

@sthysel without deltas, typical Docker pull semantics apply – that is, the device will download every layer from the modified Dockerfile step onwards. You’ll have to be somewhat mindful of the changes you introduce if bandwidth is a concern and your images are large. This will hopefully be a brief annoyance, until we introduce the fix in the next OS version.
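
To make the "modified step onwards" behaviour concrete, here is a small Python sketch. It is a simplified model, not how balenaEngine or Docker compute layer digests: each step is chain-hashed with its parent to mimic how a rebuild from a changed step produces new layers for every later step. The function names and Dockerfile steps are hypothetical.

```python
import hashlib

def layer_ids(steps):
    """Chain-hash each step with its parent (a simplification of rebuilds)."""
    ids, parent = [], ""
    for step in steps:
        parent = hashlib.sha256((parent + "\n" + step).encode()).hexdigest()[:12]
        ids.append(parent)
    return ids

def steps_to_download(old_steps, new_steps):
    """Return the steps of the new image whose layers are not on the device."""
    have = set(layer_ids(old_steps))
    return [
        step
        for step, layer in zip(new_steps, layer_ids(new_steps))
        if layer not in have
    ]

# Illustrative steps only: the second step changes, the third does not.
old = ["FROM example/base:1.0", "RUN install-deps", "COPY app /app"]
new = ["FROM example/base:1.0", "RUN install-deps --extra", "COPY app /app"]
print(steps_to_download(old, new))
# ['RUN install-deps --extra', 'COPY app /app']
```

In this model the unchanged base layer is reused, while the changed step and everything after it is fetched again. That is why steps that change often are usually best placed near the end of the Dockerfile.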

BTW, we disabled deltas for this device type earlier today, let me know if you hit further issues getting the devices updated.

Thanks Team, that confirms the expected behaviour. Yes, inefficient updates that certainly work are better than fast updates that ‘perhaps’ work. Thanks for the patch. Do you have a comfortable estimate of when the updated base OS fix may be available?

Hey, just heard back from our OS team. Implementing and testing the fix should take a few more days, and then it will take a few more days for that to reach a public balenaOS release. So I’d say somewhere between one and two weeks is a safe bet, provided there are no hiccups.

Hi there, I’d like to let you know that we’ve included this fix in the balenaOS release and it should work for any version >2.48. The latest NUC release you can get is 2.50.1+rev1.
Please let us know how it works for you after the update.

Georgia