Jetson Nano emmc fails to update with a small delta

sgserg · December 19, 2022, 9:49am

I’m facing a problematic situation which happened in two steps:

The image build didn’t reuse the previous build results, although the Dockerfile didn’t change.

Could it be that the “balenalib/jetson-nano-ubuntu:bionic” image was updated recently?

This on its own did not create a problem except some wasted time.

The image Delta Size was reported to be ~86MB:

[Info] Release: 41da95ed404c4842b04a69660c336d77 (id: 162498)
[Info] ┌─────────┬────────────┬────────────┬────────────────────────┐
[Info] │ Service │ Image Size │ Delta Size │ Build Time             │
[Info] ├─────────┼────────────┼────────────┼────────────────────────┤
[Info] │ main    │ 6.08 GB    │ 86.38 MB   │ 12 minutes, 59 seconds │
[Info] └─────────┴────────────┴────────────┴────────────────────────┘
[Info] Build finished in 24 minutes, 21 seconds

But then a device in the affected fleet failed to update despite having almost 3GB of free disk space.

What could cause such a problem?

What can we do to avoid it in the future?

balenaOS 2.107.5
SUPERVISOR VERSION 14.4.2
OS Variant: Development

mpous · December 19, 2022, 10:33am

Hello @sgserg welcome to the balena community!

Could you please check the Logs for the error message from the supervisor?

Sometimes, if the source image is not available (e.g. the image of the device, as you anticipated), the supervisor will pull the full release.

sgserg · December 19, 2022, 11:07am

Thank you @mpous glad to be here!

Here is what I see in the supervisor Logs section:

Failed to download image ‘registry2.balena-staging.com/v2/6d0456a13111854f952954e0e6c7d6e5@sha256:718948de5d0358e5ad4ac7e90c0256ce787ed5946b09e7303c135ba2e8ff0019’ due to ‘failed to register layer: Error processing tar file(exit status 1): write /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcusolver.so.10.3.0.300: no space left on device’
Downloading delta for image ‘registry2.balena-staging.com/v2/6d0456a13111854f952954e0e6c7d6e5@sha256:718948de5d0358e5ad4ac7e90c0256ce787ed5946b09e7303c135ba2e8ff0019’

I also monitored the disk usage during the update and confirmed that the disk filled up somewhere around 50% of the download. After that, the download restarts.
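For anyone who wants to repeat this kind of disk monitoring, here is a minimal sketch using Python's standard `shutil.disk_usage`. The path, the polling interval, and the function names `free_gib` and `watch` are illustrative assumptions, not anything used in this thread; point `path` at whichever filesystem holds the images on your device.

```python
import shutil
import time
from typing import List


def free_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def watch(path: str = "/", interval_s: float = 5.0, samples: int = 3) -> List[float]:
    """Sample free space `samples` times, `interval_s` seconds apart.

    `path`, `interval_s` and `samples` are hypothetical defaults; adjust
    them to the partition and update duration you want to observe.
    """
    readings = []
    for i in range(samples):
        value = free_gib(path)
        readings.append(value)
        print(f"sample {i + 1}: {value:.2f} GiB free on {path}")
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Calling `watch("/", interval_s=10, samples=60)` while an update is downloading would log roughly ten minutes of readings, which should be enough to see whether free space drops to zero before the download restarts.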

mpous · December 21, 2022, 11:26am

Hello @sgserg, after speaking with my colleagues, we have a hypothesis about what is happening.

If the base image has changed, the delta will need enough disk space to store all the layers of the base image that have been updated. If this is the case, try to allow delete-then-download update (at the expense of downtime and bandwidth).

You can read more about the balena update strategies here: Fleet update strategy - Balena Documentation

Let us know if that solves the problem!
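To make the hypothesis above concrete, here is a rough sketch of the arithmetic: sum the sizes of the layers that changed and compare the total with the free space on the device. This is a simplification under stated assumptions, not how the supervisor or balena-engine actually accounts for space, and the layer names and sizes in the example are hypothetical; only the "almost 3GB free" figure comes from this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str          # hypothetical label, for readability only
    size_bytes: int    # unpacked size of the layer on disk
    changed: bool      # True if the layer differs from the one on the device


def required_headroom(layers: List[Layer]) -> int:
    """Bytes needed to store every changed layer alongside the old image."""
    return sum(layer.size_bytes for layer in layers if layer.changed)


def update_fits(layers: List[Layer], free_bytes: int) -> bool:
    """True if the changed layers fit in the available free space."""
    return required_headroom(layers) <= free_bytes


GB = 10**9
# Hypothetical layout: a large base layer that was rebuilt upstream,
# plus a small application layer.
example_layers = [
    Layer("base-image", 4 * GB, changed=True),
    Layer("application", 86 * 10**6, changed=True),
]
print(update_fits(example_layers, free_bytes=3 * GB))  # prints False
```

Under these assumed numbers the update does not fit in 3 GB of free space even though the application change itself is small, which matches the symptom described earlier in the thread.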

sgserg · December 21, 2022, 5:26pm

@mpous by base image, do you mean the image we use in the first line of the Dockerfile?

FROM balenalib/jetson-nano-ubuntu:bionic

sgserg · December 23, 2022, 11:45am

Yes, freezing the base image to an earlier date and dropping the “failed” release appears to have helped with using the previously built image.

But for some reason, after the build the device has been stuck in “Delta still processing remotely. Will retry…” for more than 24 hours already.

[main]     Successfully built 3b539becd932
[Info]     Uploading images
[Success]  Successfully uploaded images
[Info]     Built on arm02
[Success]  Release successfully created!
[Info]     Release: 6cc679d7d05662afb06fc764e10eb12a (id: 162670)
[Info]     ┌─────────┬────────────┬────────────┐
[Info]     │ Service │ Image Size │ Build Time │
[Info]     ├─────────┼────────────┼────────────┤
[Info]     │ main    │ 6.09 GB    │ 53 seconds │
[Info]     └─────────┴────────────┴────────────┘
[Info]     Build finished in 4 minutes, 2 seconds

alanb128 · December 26, 2022, 7:44pm

@sgserg did the delta finally complete? If not, have you tried the “Delete then download” strategy suggested earlier?

sgserg · December 27, 2022, 7:37pm

@alanb128 It didn’t. We had to turn it off after a couple of days.
We have another device in a similar state, if you’d like to take a look.

alanb128 · December 28, 2022, 1:35am

Sure, glad to take a look. Are you seeing the same error messages such as “Delta still processing remotely. Will retry…” ? If you can attach or send us the device diagnostics that may help.

sgserg · December 28, 2022, 8:06am

Yes, the same messages.
system.log (6.8 MB)
Please find attached output of journalctl --system

drcnyc · March 19, 2024, 3:56am

@mpous @alanb128 Sorry to bump this after over a year but we just ran into this same issue, and I was wondering if there is another workaround other than the “delete then download” method which would result in downtime.

We recently pushed an update to our fleet which inadvertently included an updated balenalib/raspberrypi4-64-debian:bookworm image, and now we are having devices run out of storage as they try to apply the delta updates. I’m assuming it is related to this comment above:

If the base image has changed, the delta will need enough disk space to store all the layers of the base image that have been updated. If this is the case, try to allow delete-then-download update (at the expense of downtime and bandwidth).

If we kill and delete the old containers and let them redownload from scratch, the devices download the images fine - which I guess is in essence the same as applying a “delete then download” strategy… but I’m hoping there might be a better way?

And separately, mostly out of curiosity, but also because I’d like to see if I can help - what is the reason that balena-engine is not able to handle deltas appropriately when base images change? It seems to work great otherwise…

mpous · March 20, 2024, 10:48am

@drcnyc could you please confirm if you are running this device on openBalena or balenaCloud?

drcnyc · March 20, 2024, 11:28am

@mpous we are running openBalena with a delta server that uses balena-engine to create the deltas. While I know that potentially introduces a number of other variables, I believe this is the issue we are seeing.

mpous · March 20, 2024, 12:05pm

@drcnyc could you please confirm which delta server you are using? Did you implement one yourselves?

If this is not working, I think the device needs to have enough storage to apply the image… I’m not sure how we can help you here.

drcnyc · March 20, 2024, 12:47pm

@mpous it is our own open source delta server, which is based on ‘balena-engine’ as detailed here and here.

Could I ask it a different way: is the issue noted above still present in balenaCloud (i.e. if base images change, does that require devices to download new base images plus changed layers)? And if so, is there some fundamental reason why this can’t be handled differently, and can I help resolve it?

It feels like a fairly significant issue, because people who use delta updates rely on them being small, and I suspect most won’t appreciate that you need to leave 2x your fleet size in storage headroom on your device just in case base images change, otherwise the update will get stuck. This matters especially because the base images do seem to change from time to time: the Debian one I noted above was changed just two weeks ago but retained the same tag, so even if you were “pinned” to that image, it would have changed and would necessitate the storage headroom.

sgserg · March 24, 2024, 9:05am

We ended up referencing the image by its creation date:

FROM balenalib/jetson-nano-ubuntu:bionic-20221109

drcnyc · March 24, 2024, 1:43pm

@sgserg thank you for sharing that - we were planning to reference the hash but this is a better approach.
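As a small guard against a base image slipping back to a moving tag, here is a sketch of a check that lists `FROM` lines which are neither dated nor referenced by digest. The function name `unpinned_from_lines` and the rule that a "dated" tag ends in `-YYYYMMDD` are assumptions made for illustration, modelled on the `bionic-20221109` tag above; it is not a balena tool.

```python
import re
from typing import List

# Assumption: a tag counts as "dated" if it ends in -YYYYMMDD,
# as in balenalib/jetson-nano-ubuntu:bionic-20221109.
DATED_TAG = re.compile(r":[\w.\-]*-\d{8}$")


def unpinned_from_lines(dockerfile_text: str) -> List[str]:
    """Return image references in FROM lines that use a moving tag."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip options such as --platform=... that may precede the image.
        image = next((p for p in parts[1:] if not p.startswith("--")), None)
        if image is None:
            continue
        if "@sha256:" in image or DATED_TAG.search(image):
            continue  # pinned by digest or by dated tag
        unpinned.append(image)
    return unpinned


print(unpinned_from_lines("FROM balenalib/jetson-nano-ubuntu:bionic\n"))
print(unpinned_from_lines("FROM balenalib/jetson-nano-ubuntu:bionic-20221109\n"))
```

The first call reports the moving `bionic` tag and the second returns an empty list. Note that this simple check would also flag `FROM scratch` and references to earlier build stages, so treat its output as a prompt to look, not as a verdict.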
