I have images being provisioned to Raspberry Pi Zero W devices. They are based on a custom base image, which is fairly large at 3GB.
Admittedly, I may be building too much into my custom base image; checking that is on my TODO list.
However, when I provision a new Raspberry Pi Zero W, the download takes ages (I think because the WiFi throughput on the Zero W is limited, but I’m not sure).
It seems to fail a lot, and when it fails it starts downloading again from scratch.
(I’ve tried this in several different locations to rule out a dodgy internet connection at one site.)
Can you confirm that layers are downloaded from scratch each time a download fails? Is there anything I can do to make the download resume from where it failed, to improve performance?
Hi Alex,
Thanks for reaching out to us.
After talking to my colleagues I can tell you the following:
If a layer download fails in balena, that layer is downloaded again from scratch.
It might be worth enabling delta updates: we think they use a different download protocol, which could give a better outcome, and they will likely help with later updates too.
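To illustrate why restarting a layer from scratch hurts so much on a slow, flaky link, here is a small Python sketch comparing from-scratch retries with resumed ones. It is only a model: the function name `attempts_needed` and all sizes are hypothetical, not balena behaviour or measurements, and each attempt is simply assumed to transfer some amount before the connection drops.

```python
# Hypothetical illustration: compare restarting a layer download from scratch
# with resuming it. All numbers are made up; they are not measurements from this thread.

def attempts_needed(layer_size, transferred_per_attempt, resume):
    """Return the 1-based attempt on which the layer completes, or None if it never does.

    transferred_per_attempt[i] is how much attempt i transfers before the
    connection drops (same unit as layer_size). With resume=False every attempt
    starts again at zero, mirroring the from-scratch behaviour described above.
    """
    done = 0
    for attempt, transferred in enumerate(transferred_per_attempt, start=1):
        if not resume:
            done = 0  # a failed attempt throws away the partial layer
        done += transferred
        if done >= layer_size:
            return attempt
    return None


if __name__ == "__main__":
    layer_size = 1000  # hypothetical layer size in MB
    drops = [400, 350, 300, 450, 500]  # hypothetical MB transferred before each drop
    print("from scratch:", attempts_needed(layer_size, drops, resume=False))
    print("resumed:", attempts_needed(layer_size, drops, resume=True))
```

With these made-up numbers no from-scratch attempt ever reaches 1000 MB, while resuming finishes on the third attempt (400 + 350 + 300 MB).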
Regards
Thomas
Hi Alex,
After some more consultation, we think there might be an issue with watchdog timers that causes the download to fail. Could you send us the OS version you are using and the output of journalctl on the host OS covering a relevant period?
Alternatively, you can grant us support access to the device. I can send you a PM that you can reply to with the device dashboard URL.
Regards
Thomas
Hi Alex,
It looks like this is happening due to a known bug in the version of balenaOS you are using. The size of your download triggers this bug, which restarts the Supervisor and kills the download in progress.
Your best option is to update the device to a more recent version of balenaOS (e.g. 2.29.2+rev2), where this bug has been fixed.
Regards
Thomas
You should be fine with the current latest Pi/Zero release (v2.29.2+rev1). It has the same Supervisor version as +rev2. I think Thomas suggested +rev2 because that is the latest release for the Pi 3.
Please let us know if you still see the issue on v2.29.2+rev1.
It seems the device is still offline, so we cannot investigate any further. Did you try updating to a newer OS version, and did that resolve the issue? Thanks!
Hi Alex,
I just looked at your device.
The update progress shows 43%, and df shows:
/dev/mmcblk0p6 6.5G 4.1G 2.0G 67% /mnt/data
I am wondering whether you might be running out of disk space somewhere while downloading or installing images or creating containers. I will discuss this with my colleagues and get back to you when I know more.
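For reference, here is a minimal Python sketch that reads a `df -h` line like the one above and works out the free space. It assumes the standard `df -h` column order and powers-of-1024 suffixes; `parse_df_line`, `fits` and the 2.5 GiB figure are hypothetical and only for illustration. One detail it shows: GNU df calculates Use% as Used / (Used + Avail), so 4.1G / 6.1G gives about 67%, and the roughly 0.4G counted in Size but in neither Used nor Avail is most likely space reserved by the filesystem (for example ext4 reserved blocks), which an ordinary download cannot use.

```python
# Parse one line of `df -h` output and work out how much room is left.
# Assumes the default columns: Filesystem, Size, Used, Avail, Use%, Mounted on,
# and suffixes K/M/G/T meaning powers of 1024. df -h rounds, so results are approximate.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a df -h size such as '6.5G' to an approximate byte count."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    return float(size)


def parse_df_line(line):
    fs, size, used, avail, pct, mount = line.split()
    size_b, used_b, avail_b = (to_bytes(s) for s in (size, used, avail))
    return {
        "filesystem": fs,
        "mount": mount,
        "size": size_b,
        "used": used_b,
        "avail": avail_b,
        "reported_pct": int(pct.rstrip("%")),
        # GNU df computes Use% against used + avail, not against Size
        "pct_of_usable": 100 * used_b / (used_b + avail_b),
        # space in Size but in neither Used nor Avail (e.g. reserved blocks)
        "unaccounted": size_b - used_b - avail_b,
    }


def fits(avail_bytes, needed_bytes):
    """Hypothetical check: would needed_bytes more data fit in the free space?"""
    return needed_bytes <= avail_bytes


if __name__ == "__main__":
    info = parse_df_line("/dev/mmcblk0p6 6.5G 4.1G 2.0G 67% /mnt/data")
    gib = UNITS["G"]
    print(f"{info['avail'] / gib:.1f} GiB free on {info['mount']}")
    print(f"Use% against used+avail: {info['pct_of_usable']:.1f}%")
    print(f"Not in Used or Avail: {info['unaccounted'] / gib:.1f} GiB")
    # hypothetical: 2.5 GiB of image data still to be written
    print("fits:", fits(info["avail"], 2.5 * gib))
```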
Regards
Thomas
Could be. I think I’ve seen a few different failure modes: one is running out of space, one is the download simply being cut off, and one is errors downloading “missing” files, for example:
04.03.19 20:16:08 (+0000) Failed to download image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a' due to '(HTTP code 404) no such image - no such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a: No such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a '
04.03.19 20:17:32 (+0000) Downloading image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a'
04.03.19 21:56:22 (+0000) Failed to download image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a' due to '(HTTP code 404) no such image - no such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a: No such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a '
04.03.19 21:57:35 (+0000) Downloading image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a'
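To put a number on how long one of these attempts lasted, here is a small Python sketch that pairs each “Downloading image” line with the next “Failed to download image” line. The timestamps are copied from the log above; reading them as day.month.year is an assumption, but it does not change the durations. `attempt_durations` is a hypothetical helper, not part of any balena tool.

```python
from datetime import datetime

# Timestamps copied from the Supervisor log lines above (+0000).
# Assumption: the date part is day.month.year.
EVENTS = [
    ("04.03.19 20:16:08", "failed"),
    ("04.03.19 20:17:32", "downloading"),
    ("04.03.19 21:56:22", "failed"),
    ("04.03.19 21:57:35", "downloading"),
]

FMT = "%d.%m.%y %H:%M:%S"


def attempt_durations(events):
    """Pair each 'downloading' entry with the next 'failed' entry and return the gaps."""
    durations = []
    start = None
    for stamp, kind in events:
        t = datetime.strptime(stamp, FMT)
        if kind == "downloading":
            start = t
        elif kind == "failed" and start is not None:
            durations.append(t - start)
            start = None  # the next attempt needs its own 'downloading' entry
    return durations


if __name__ == "__main__":
    for d in attempt_durations(EVENTS):
        print("attempt ran for", d)
```

For the log above it finds a single complete attempt of 1:38:50, i.e. the download ran for roughly an hour and forty minutes before failing and starting over.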
Hi Alex,
It looks like you are seeing a combination of two errors, both of which are falsely reported as 404.
Often your download fails with “connection reset by peer”, which is an ordinary network error.
In one case, it looks like you instead ran out of disk space while the image was being unpacked from the tar archive. So I suspect your 3GB image (is that the compressed size?) is too big for the 4GB data partition of balenaOS.
I wonder if you could get around this problem by pre-provisioning your image. It would still have to fit on the data partition (unpacked), but it might not have to be unpacked on the device.
If that sounds like an option for you, take a look at https://www.balena.io/blog/advanced-device-provisioning-workflow-for-large-fleets-preloading-and-pre-provisioning/
In any case, you would need delta updates enabled to be able to update that image.
Regards
Thomas