Balena Fin can't download container images

Some of our Balena Fin-derived devices really struggle to download container images. They repeatedly try to download the image and fail. This behavior continues after a reboot, even though internet connectivity is never lost. The error message is below:

07.07.21 07:38:42 (-0400) Downloading image ‘registry2.balena-cloud.com/v2/98161292c06fe8bb6c1741b1496c3423@sha256:7e71540f71615073df4fad4147de38831f8de52650dda6110b6b6aa69179f208
07.07.21 07:39:15 (-0400) Failed to download image ‘registry2.balena-cloud.com/v2/98161292c06fe8bb6c1741b1496c3423@sha256:7e71540f71615073df4fad4147de38831f8de52650dda6110b6b6aa69179f208’ due to ‘error pulling image configuration: Get https://registry-data.balena-cloud.com/prod/docker/registry/v2/blobs/sha256/14/14e16e71c7d7d8585b364026e382a13e8576c97e7e0118247233e3330a0e9506/data?Expires=1625659125&Signature=lxd226m9JKqp7PVY6hXADpM2Elc5uOSYLV4SK8LU7Pt5Y7Hh5MJCYp4lVB5vML8OnXMNhpfk8jtpRz9TF9lp62h4BQPYpMZxMKAdg0x-aMhp5Xqc61gkz-uqiZonMNWt~zD-p~J21IhR9pCy71BdvBsLBFxY3xss2CgZsee7mg2xODh1eM50B7nKwrsIVYU0sTFr2QdIfSqt3OOAWNqyeFWrNpTM2NPj9~t4TjNG9Sq2tQN6CTbl9xquFIyof6mjbOk3l1lt5DBTEqGhu0RCXyJqhrnicaAS2~9LWIdZKaB9lrvufvu89X36p4OGK1207gZWOBVh74968EDbBHEraQ__&Key-Pair-Id=APKAJRCZR26VRIDKA6WQ: dial tcp: i/o timeout’
07.07.21 07:39:40 (-0400) Downloading image ‘registry2.balena-cloud.com/v2/98161292c06fe8bb6c1741b1496c3423@sha256:7e71540f71615073df4fad4147de38831f8de52650dda6110b6b6aa69179f208

Hey there, can you provide the version of balenaOS you are running on the affected devices? Do the images eventually download, or do they fail forever and get stuck? How large are the images we are talking about?

You might also want to take a look at our network requirements, just in case something is being blocked or limited: Network Setup on balenaOS 2.x - Balena Documentation

Sure, the devices are running balenaOS 2.72.0+rev1. The images get about 12 or 13% downloaded, then crap out. Even after multiple reboots they never recover (although the host OS still has network connectivity). The image size is 1.66 GB. It seems unlikely to me that any ports are being blocked, as there are 20 other devices on the same router that are not struggling in the same way. The health check on this device shows no problems. Is there a way to manually pull the image, and/or to increase any timeouts on the connection? Thanks,

Hi,

Currently the timeout duration for image pulls is statically set in the supervisor source (images.ts at commit d30116217ae5e28e00e1d88233cfd742ff6db346 in balena-os/balena-supervisor on GitHub) to (2 ^ numImageFailures * 500) ms, so it’s not currently configurable. Based on the error message, it looks like the engine may be having trouble communicating with the balena registry. Could you paste the output of curl -v https://registry2.balena-cloud.com so we can see whether there are networking issues accessing balena’s registry URL? As for your question about a manual pull, I’ll ask internally and get back to you.

Thanks,
Christina
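
To get a feel for how the expression quoted above grows, here is a minimal Python sketch. It assumes the expression means 2 to the power of numImageFailures, multiplied by 500 ms. The function and parameter names are hypothetical and not taken from the supervisor source, and any upper cap the real code may apply is not modelled.

```python
def retry_delay_ms(num_image_failures: int, increment_ms: int = 500) -> int:
    """Return the delay in ms for a given failure count: (2 ** n) * increment.

    Hypothetical helper; it only mirrors the expression quoted in the thread.
    """
    if num_image_failures < 0:
        raise ValueError("num_image_failures must be non-negative")
    # In Python, ** binds tighter than *, so this is (2 ** n) * increment_ms.
    return 2 ** num_image_failures * increment_ms


# Print the delay for the first few consecutive failures.
for failures in range(6):
    print(failures, retry_delay_ms(failures))
```

Under that reading the value doubles with every consecutive failure, starting at 500 ms.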

This is the output of the curl command you suggested -
root@penny-57:~# curl -v https://registry2.balena-cloud.com
*   Trying 54.165.213.5:443...
* Connected to registry2.balena-cloud.com (54.165.213.5) port 443 (#0)
* found 128 certificates in /etc/ssl/certs/ca-certificates.crt
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*   server certificate verification OK
*   server certificate status verification SKIPPED
*   common name: balena.io (matched)
*   server certificate expiration date OK
*   server certificate activation date OK
*   certificate public key: RSA
*   certificate version: #3
*   subject: CN=balena.io
*   start date: Thu, 03 Sep 2020 00:00:00 GMT
*   expire date: Mon, 04 Oct 2021 12:00:00 GMT
*   issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
* ALPN, server did not agree to a protocol
> GET / HTTP/1.1
> Host: registry2.balena-cloud.com
> User-Agent: curl/7.69.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Date: Thu, 08 Jul 2021 20:55:50 GMT
< Content-Length: 0
< Connection: keep-alive
<
* Connection #0 to host registry2.balena-cloud.com left intact

Hi again, thank you for sharing those details. That tells us this is not a certificate issue. Could you try pulling this image (it is a public supervisor image) manually to see if you experience the same issue?

balena pull registry2.balena-cloud.com/v2/3a9066ce744bf2ed13c472c8e827c924

The image is smaller, but if we are able to replicate the failure it might tell us more about the problem.

OK I tried this on a device that was struggling to update. This is the result - it failed in the same way.

Welcome to balenaOS

=============================================================
root@penny-79:~# balena pull registry2.balena-cloud.com/v2/3a9066ce744bf2ed13c472c8e827c924
Using default tag: latest
latest: Pulling from v2/3a9066ce744bf2ed13c472c8e827c924
8a0637ca1ac9: Pulling fs layer
eeee4030c7c2: Pull complete
5504062dcb56: Pull complete
aa96f25d39b4: Pull complete
61e3bf303ba5: Pull complete
4db4ee3949f9: Pull complete
0296a110d85a: Extracting [==================================================>] 126B/126B
8a0637ca1ac9: Extracting 236.5kB/2.723MB
eeee4030c7c2: Extracting 210.1kB/2.043MB
a670c889ec2c: Ready to download
89770e0356d3: Ready to download
c7d3ac213203: Ready to download
8af59eb0067f: Ready to download
685afd232bdc: Ready to download
b31cad3feecf: Ready to download
Total: [==================> ] 9.474MB/25.93MB
could not get decompression stream: Get https://registry2.balena-cloud.com/v2/v2/3a9066ce744bf2ed13c472c8e827c924/blobs/sha256:0296a110d85a89a172471b9771ffd2d84e5d9090973f6092c89690daf38df671: dial tcp: i/o timeout

Hi again @ko7eraven,

Thanks for performing that test. It seems like something is terminating the connection, although it is hard to tell a priori whether this is a network issue or something else is interfering with the download.

  • What can you tell us about the network these devices are on? Are they on Wi-Fi or a mobile network?
  • Do you have the same issue trying to download from other registries? Could you try balena pull ubuntu:latest?
  • Could you check the device health checks by going to the device page on the dashboard and clicking on the left side menu Diagnostics and then Run checks?
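
Since both failures in this thread end in "dial tcp: i/o timeout" partway through a pull, a repeated connection probe may help show whether connections fail intermittently. Below is a minimal Python sketch of such a probe. It makes several assumptions: that Python is available where it runs (for example in a container on the device, or on another machine on the same network), that port 443 is the relevant port because the URLs are https, and that the two hostnames copied from the log lines above are the ones worth testing. The helper names probe and probe_many are hypothetical. Note that the earlier curl check covered only registry2.balena-cloud.com, while the first error names registry-data.balena-cloud.com.

```python
import socket
import time

# Hostnames copied from the error messages earlier in this thread.
HOSTS = ["registry2.balena-cloud.com", "registry-data.balena-cloud.com"]


def probe(host, port=443, timeout=10.0):
    """Try one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers DNS failures, refusals and timeouts
        return False, time.monotonic() - start, str(exc)


def probe_many(hosts, attempts=20, pause=1.0, **kwargs):
    """Probe each host repeatedly and return the failure count per host."""
    failures = {host: 0 for host in hosts}
    for _ in range(attempts):
        for host in hosts:
            ok, elapsed, err = probe(host, **kwargs)
            if not ok:
                failures[host] += 1
            print(f"{host}: {'ok' if ok else 'FAIL'} in {elapsed:.2f}s {err}")
        time.sleep(pause)
    return failures


# Example (performs real network connections):
# probe_many(HOSTS)
```

If one host fails noticeably more often than the other, or failures appear only after a number of successful attempts, that would be worth including alongside the other results.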

Let us know how those tests work for you. The next step, if you are willing, would be to enable support access and let us take a look. Thank you.