sgserg
December 19, 2022, 9:49am
1
I’m facing a problematic situation which happened in two steps:
Image build didn’t use previous build results although Dockerfile didn’t change.
Could it be that the “balenalib/jetson-nano-ubuntu:bionic” image was updated recently?
This on its own did not create a problem except some wasted time.
The image Delta Size was reported to be ~86MB:
[Info] Release: 41da95ed404c4842b04a69660c336d77 (id: 162498)
[Info] ┌─────────┬────────────┬────────────┬────────────────────────┐
[Info] │ Service │ Image Size │ Delta Size │ Build Time             │
[Info] ├─────────┼────────────┼────────────┼────────────────────────┤
[Info] │ main    │ 6.08 GB    │ 86.38 MB   │ 12 minutes, 59 seconds │
[Info] └─────────┴────────────┴────────────┴────────────────────────┘
[Info] Build finished in 24 minutes, 21 seconds
But then a device in the affected fleet failed to update despite having almost 3GB of free disk space.
What could cause such a problem?
What can we do to avoid it in the future?
balenaOS 2.107.5
SUPERVISOR VERSION 14.4.2
OS Variant: Development
mpous
December 19, 2022, 10:33am
3
Hello @sgserg welcome to the balena community!
Could you please check the Logs and tell us what error message the supervisor reports?
Sometimes, if the source image is not available (e.g. the image currently on the device, as you anticipated), the supervisor will pull the full release.
sgserg
December 19, 2022, 11:07am
5
Thank you @mpous glad to be here!
Here is what I see in the supervisor Logs section:
Failed to download image ‘registry2.balena-staging.com/v2/6d0456a13111854f952954e0e6c7d6e5@sha256:718948de5d0358e5ad4ac7e90c0256ce787ed5946b09e7303c135ba2e8ff0019’ due to ‘failed to register layer: Error processing tar file(exit status 1): write /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcusolver.so.10.3.0.300: no space left on device’
Downloading delta for image ‘registry2.balena-staging.com/v2/6d0456a13111854f952954e0e6c7d6e5@sha256:718948de5d0358e5ad4ac7e90c0256ce787ed5946b09e7303c135ba2e8ff0019’
I also monitored the disk usage during the update and confirmed that the disk filled up somewhere around 50% of the download. After that, the download restarts.
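For anyone who wants to reproduce this kind of observation, here is a minimal sketch of sampling free disk space at intervals while an update runs. It is not the method used in this thread. The function name `sample_free_bytes` and the default path `/` are assumptions for illustration; the path that reflects the device's data partition depends on where the script runs.

```python
import shutil
import time


def sample_free_bytes(path="/", samples=5, interval_s=1.0):
    """Return a list of (elapsed_seconds, free_bytes) readings for `path`.

    `path` is an assumption: pass whichever mount point holds the
    container images on your device.
    """
    readings = []
    start = time.monotonic()
    for i in range(samples):
        free = shutil.disk_usage(path).free
        readings.append((time.monotonic() - start, free))
        # Sleep between samples, but not after the last one.
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Printing each reading next to the supervisor's download percentage would show roughly where in the download the free space reaches zero.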
mpous
December 21, 2022, 11:26am
6
Hello @sgserg, after speaking with my colleagues, we have a hypothesis about what is happening.
If the base image has changed, the delta update needs enough free disk space to store every layer of the base image that was updated. If that is the case, try switching to the delete-then-download update strategy (at the expense of downtime and bandwidth).
You can read more about the balena update strategies in the “Fleet update strategy” page of the balena documentation.
Let us know if that solves the problem!
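To make the hypothesis concrete, here is a rough sketch of the arithmetic, assuming the device must hold every new or changed layer alongside the old image while the update is applied. The layer names and sizes are hypothetical and only illustrate the idea; they are not taken from the actual release.

```python
def bytes_to_store(old_layers, new_layers):
    """Sum the sizes of layers in the new image that the old image lacks.

    Each argument maps a layer digest to its unpacked size in bytes.
    """
    return sum(size for digest, size in new_layers.items()
               if digest not in old_layers)


# Hypothetical layer digests and sizes, for illustration only.
GB = 10**9
old_image = {"base-v1": 3 * GB, "cuda": 2 * GB, "app": 1 * GB}
# Assumption: a changed base image means the layers built on top of it
# are rebuilt as well, so none of the old layers can be reused as-is.
new_image = {"base-v2": 3 * GB, "cuda-rebuilt": 2 * GB, "app-rebuilt": 1 * GB}

needed = bytes_to_store(old_image, new_image)
free = 3 * GB  # roughly the free space reported in this thread
print(f"needed: {needed / GB:.1f} GB, free: {free / GB:.1f} GB, fits: {needed <= free}")
```

Under these assumptions, a small transfer size (such as the 86 MB delta reported above) says little about how much space the unpacked layers need on disk, which would be consistent with the “no space left on device” error.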
sgserg
December 21, 2022, 5:26pm
8
@mpous by base image, do you mean the image we use in the first line of the Dockerfile?
FROM balenalib/jetson-nano-ubuntu:bionic
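A tag such as bionic can be repointed by the publisher to a newer build, so a Dockerfile that has not changed may still pull a different base image. As a minimal sketch (the helper name `unpinned_base_images` is hypothetical, and this is not a balena tool), the following lists FROM references that are not pinned by digest:

```python
import re

# Matches "FROM [--flag=value ...] image [AS name]" and captures the image reference.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)


def unpinned_base_images(dockerfile_text):
    """Return the FROM image references that are not pinned by digest.

    Limitation: references to earlier build stages (e.g. "FROM builder")
    and "FROM scratch" are also reported, since they carry no digest.
    """
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match and "@sha256:" not in match.group(1):
            unpinned.append(match.group(1))
    return unpinned
```

Running it on the Dockerfile line above would report balenalib/jetson-nano-ubuntu:bionic as unpinned. Whether to pin by digest or by a dated tag is a judgment call; this check only looks for a digest.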
sgserg
December 23, 2022, 11:45am
9
Yes, freezing the base image to an earlier date and dropping the “failed” release appears to have helped with using the previously built image.
But for some reason, after the build the device has been stuck in “Delta still processing remotely. Will retry…” for more than 24 hours already.
[main] Successfully built 3b539becd932
[Info] Uploading images
[Success] Successfully uploaded images
[Info] Built on arm02
[Success] Release successfully created!
[Info] Release: 6cc679d7d05662afb06fc764e10eb12a (id: 162670)
[Info] ┌─────────┬────────────┬────────────┐
[Info] │ Service │ Image Size │ Build Time │
[Info] ├─────────┼────────────┼────────────┤
[Info] │ main    │ 6.09 GB    │ 53 seconds │
[Info] └─────────┴────────────┴────────────┘
[Info] Build finished in 4 minutes, 2 seconds
@sgserg did the delta finally complete? If not, have you tried the “Delete then download” strategy suggested earlier?
sgserg
December 27, 2022, 7:37pm
11
@alanb128 It didn’t. We had to turn it off after a couple of days.
We have another device in a similar state, if you’d like to take a look.
Sure, glad to take a look. Are you seeing the same error messages, such as “Delta still processing remotely. Will retry…”? If you can attach or send us the device diagnostics, that may help.
sgserg
December 28, 2022, 8:06am
13
Yes, the same messages.
system.log (6.8 MB)
Please find attached the output of journalctl --system.