Issues downloading images to boards

Hi all,

I have images being provisioned to Raspberry Pi 0w devices. These are based on a custom base image which is pretty big at 3GB.

Now it’s arguable I’m building too much into my custom base image, which is something I have on my TODO list to check into.

However what I see happening is that when I provision a new Raspberry Pi 0w the download takes an age (I think because the WiFi throughput on the 0w is limited, but not sure).

It seems to fail a lot and when it fails it starts downloading again from scratch.

(I’ve tried this in various different locations to check it’s not just a dodgy internet connection in one location)

19.02.19 10:15:21 (+0000) Logs disconnected, reconnecting...
19.02.19 10:15:22 (+0000) Reconnected.
19.02.19 11:31:21 (+0000) Downloading image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a'
19.02.19 13:19:27 (+0000) Downloading image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a'
19.02.19 15:00:12 (+0000) Downloading image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a'

Can you confirm that layers are downloaded from scratch each time a download fails? Is there anything I can do to make the download resume from where it failed, to improve performance?

Thanks!

Alex
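
For anyone following along who wants to measure the restart cycle, here is a minimal Python sketch that computes the gap between successive "Downloading image" lines. It assumes the dashboard timestamps are in DD.MM.YY HH:MM:SS form (an assumption based on the dates in this thread), and the image reference is abbreviated to `<image>` in the sample. The function names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the log above; the image reference is abbreviated.
SAMPLE_LOG = """\
19.02.19 10:15:21 (+0000) Logs disconnected, reconnecting...
19.02.19 10:15:22 (+0000) Reconnected.
19.02.19 11:31:21 (+0000) Downloading image '<image>'
19.02.19 13:19:27 (+0000) Downloading image '<image>'
19.02.19 15:00:12 (+0000) Downloading image '<image>'
"""


def download_start_times(log_text):
    """Return the timestamp of every 'Downloading image' line."""
    starts = []
    for line in log_text.splitlines():
        if "Downloading image" not in line:
            continue
        # The first two whitespace-separated fields are the date and the time.
        # Assumption: the date is DD.MM.YY.
        date_part, time_part = line.split()[:2]
        starts.append(
            datetime.strptime(f"{date_part} {time_part}", "%d.%m.%y %H:%M:%S")
        )
    return starts


def restart_gaps(starts):
    """Return the time elapsed between each pair of consecutive starts."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


if __name__ == "__main__":
    for gap in restart_gaps(download_start_times(SAMPLE_LOG)):
        print(gap)
```

On the three lines above, each attempt runs for well over an hour and a half before the next one starts.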

Hi Alex,
thanks for reaching out to us.
After talking to my colleagues I can tell you the following:
If the download of a layer fails in balena, that layer is downloaded again from scratch.
It might be worth enabling delta updates: we think they use a different download protocol, which may give a better outcome, and they will likely help with later updates.
Regards
Thomas

Hi Alex,
after some more consultation we think there might be an issue with watchdog timers that causes the download to fail. Can you tell us which OS version you are using and send us the output of journalctl on the host OS over a relevant period?
Otherwise you can grant us support access to the device. I can send you a PM that you can reply to with the device dashboard URL.
Regards
Thomas

Thanks for coming back to me on this Thomas.

It’s still stuck in the update cycle. I’ve not made the delta change given the update message.

I’ve just granted support access. If you need UIDs just drop me a PM letting me know what’s needed.

Cheers!

Alex

Hi Alex,
It looks like this is happening due to a known bug in the version of balenaOS you are using. The size of your download triggers this bug, which restarts the supervisor and kills your download with it.
Your best chance to escape this situation is to update the device to a more recent version of balenaOS (e.g. 2.29.2+rev2) where this bug has been fixed.
Regards
Thomas

Ah great news - I was pretty sure I downloaded the latest and greatest image today though.

Checking the application now, the recommended version for the Pi0w is v2.29.2+rev1?

Thanks,

Alex

Hi there - when do you think rev2 might be available for the Pi0w?

Thanks!

Alex

Hi @ajlennon

You should be fine with the current latest Pi/Zero release (v2.29.2+rev1). This has the same release version of the Supervisor as +rev2. I think Thomas suggested +rev2 as that’s the latest release for the Pi3.

Please let us know if you still see the issue on v2.29.2+rev1.

Best regards, Heds

Please let us know if you still see the issue on v2.29.2+rev1.

Yes I see the issue with rev 1

Thanks for letting us know. Is this the same device that was showing the issue previously?

If not, could you please send me a PM with the new device URL of the device running 2.29.2+rev1 that is failing, and I’ll investigate for you.

Thanks!

Done!

Hello @ajlennon
It seems that your device is offline. Could you bring it back online?
Thanks!

Hi @ajlennon,

It seems the device is still offline, so we cannot investigate any further. Did you try updating to a newer OS version to see if that resolves the issue? Thanks!

Hi there - I’ll get it back online asap - thanks

Hi guys,

Apologies for the delay. It’s online and updating now as e37e1022ca68ae18091f5b166cf2a38c

Thanks!

Alex

Hi Alex,
just looked at your device.
Update progress shows 43% and df shows:

/dev/mmcblk0p6                     6.5G  4.1G  2.0G  67% /mnt/data

I am wondering if you might be running out of disk space somewhere in the process of downloading / installing images / creating containers. I will have to talk to my colleagues about that and will come back to you when I know more.
Regards
Thomas
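
To put rough numbers on the disk space question, here is a small Python sketch that parses the df line above and estimates the headroom left once the rest of the image arrives. It rests on two loud assumptions that the thread does not confirm: that df reports sizes in powers of 1024, and that the 43% progress figure maps linearly onto bytes of a 3GB image written to disk. It also ignores any temporary space needed while layers are unpacked, so treat the result as an optimistic upper bound. The function names are hypothetical.

```python
# Line copied from the df output above.
DF_LINE = "/dev/mmcblk0p6                     6.5G  4.1G  2.0G  67% /mnt/data"

# Assumption: human-readable df sizes use powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a human-readable size such as '2.0G' to bytes."""
    number, unit = text[:-1], text[-1].upper()
    return int(float(number) * UNITS[unit])


def parse_df_line(line):
    """Split one df output line into its fields, with sizes in bytes."""
    device, size, used, avail, _percent, mount = line.split()
    return {
        "device": device,
        "size": parse_size(size),
        "used": parse_size(used),
        "avail": parse_size(avail),
        "mount": mount,
    }


def headroom_bytes(avail, image_bytes, fraction_done):
    """Free space left after the remaining part of the image is written.

    Assumption: progress maps linearly to bytes on disk, and no extra
    scratch space is needed for unpacking.
    """
    remaining = image_bytes * (1.0 - fraction_done)
    return avail - remaining


if __name__ == "__main__":
    info = parse_df_line(DF_LINE)
    spare = headroom_bytes(info["avail"], 3 * 1024**3, 0.43)
    print(f"{info['mount']}: about {spare / 1024**2:.0f} MiB to spare")
```

Even under these optimistic assumptions the margin is only a few hundred MiB, so running out of space during unpacking looks plausible.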

Could be. I think I’ve seen a few different failure modes. One is space, one is it just cutting the download, and one is errors downloading “missing” files:

04.03.19 20:16:08 (+0000) Failed to download image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a' due to '(HTTP code 404) no such image - no such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a: No such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a '
04.03.19 20:17:32 (+0000) Downloading image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a'
04.03.19 21:56:22 (+0000) Failed to download image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a' due to '(HTTP code 404) no such image - no such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a: No such image: registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a '
04.03.19 21:57:35 (+0000) Downloading image 'registry2.balena-cloud.com/v2/34a4ce673f23e8c76f3660651b66a4f0@sha256:32de0df56fe85e80677d74ece8150de189715dcfb13226968008016952cc0d7a'

Hi Alex,
it looks like you are seeing a combination of two errors, both of which are misreported as 404.
Much of the time your download fails due to “connection reset by peer”, which is an ordinary network error.
In one case it looks like you instead ran out of disk space while the image was being unpacked from tar. So I guess your 3GB image (is that the compressed size?) is too big for the 4GB data partition of balenaOS.
I wonder if you could get around this problem by pre-provisioning your image. It would still have to fit on the data partition (unpacked) but it might not have to be unpacked on the device.
Take a look at https://www.balena.io/blog/advanced-device-provisioning-workflow-for-large-fleets-preloading-and-pre-provisioning/ if that looks like an option for you.
In any case you would need delta updates enabled to be able to update that image.
Regards
Thomas
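
Since the dashboard reports both failures as 404, the host OS journal is where they can be told apart. Below is a minimal Python sketch that tallies the two causes from saved journal text, for example captured with `journalctl --no-pager > journal.txt` on the host OS. The "connection reset by peer" string is quoted from the post above; "no space left on device" is the standard Linux message for a full filesystem and is only an assumption about what the journal contains. The sample lines are hypothetical, not real output from this device.

```python
from collections import Counter

# Substrings to search for, matched case-insensitively.
# "disk_full" is an assumed message, not one quoted in the thread.
PATTERNS = {
    "network": "connection reset by peer",
    "disk_full": "no space left on device",
}

# Hypothetical journal lines, for illustration only.
SAMPLE_JOURNAL = """\
pull failed: read tcp: connection reset by peer
pull failed: untar layer: no space left on device
supervisor starting
pull failed: read tcp: Connection reset by peer
"""


def classify(line):
    """Return the label of the first pattern found in the line, else 'other'."""
    lowered = line.lower()
    for label, needle in PATTERNS.items():
        if needle in lowered:
            return label
    return "other"


def summarize(journal_text):
    """Count how many non-empty lines fall into each category."""
    return Counter(
        classify(line) for line in journal_text.splitlines() if line.strip()
    )


if __name__ == "__main__":
    for label, count in summarize(SAMPLE_JOURNAL).items():
        print(label, count)
```

A tally dominated by the network label would point at the WiFi link, while any disk_full hits would support moving to a larger card or a smaller image.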

Thanks - I see I’ve got an 8GB card in there.

I’m sure I’ve used bigger uSD cards and had the same problem but what I’ll do here is to move over to a 16GB/32GB card and retest.

Thanks!