Image randomly fails to download

With openBalena, the device is sometimes unable to download the image, and the logs show the following error:

Failed to download image '...' due to '(HTTP code 404) no such image - no such image: ...: No such image: ... '

Sometimes the download recovers on its own and suddenly works; other times the device has to be rebooted before the download starts.

Maybe it has something to do with the registry?

Hi,

That is strange. A bad configuration is possible, but then I’d be surprised to see downloads sometimes working and sometimes not. Flakiness usually points to something else, in many cases flaky Ethernet/Wi-Fi.

  • Where have you deployed openBalena?
  • What device are you trying, and with what balenaOS version?
  • Does this happen with different devices/OS versions, or just one?
  • We probably need more logging. Is this log from the device side? What does the openBalena side log look like when it works/doesn’t work?

Regards
ZubairLK

I gathered more information:

  • We have openBalena deployed on an EC2 instance
  • We are trying to update an Orange Pi Zero
    Supervisor: 9.14.6 OS: balenaOS 2.33.0
  • This happens on multiple versions
  • The log is from the client side, obtained by running: balena logs DEVICEUUID --tail

The openBalena backend API does not seem to throw any errors.
These are the versions we are currently using:

export OPENBALENA_VPN_VERSION_TAG=v8.12.4
export OPENBALENA_API_VERSION_TAG=v0.23.0
export OPENBALENA_REGISTRY_VERSION_TAG=v2.8.0
export OPENBALENA_DB_VERSION_TAG=v3.0.0
export OPENBALENA_S3_VERSION_TAG=v2.7.0

Hey,

So it sounds like the backend is instructing the Supervisor to change releases to use your new image, but either:

  • the image hasn’t been pushed yet, so it’s missing.
  • or the auth token the device is using isn’t valid for that resource, so it appears missing.

Given that the auth tokens wouldn’t change at all, I think it has to be the first scenario: the instruction to change images is happening faster than the registry can complete the upload. This is a very unusual scenario though, and not something I have personally come across during my testing with openBalena.
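One way to tell the two scenarios apart might be to ask the registry directly whether the manifest exists. The sketch below is only an illustration, assuming the openBalena registry exposes the standard Docker Registry HTTP API V2 manifest endpoint; the host, repository, reference, and token values are hypothetical placeholders, and the helper names are made up for this example.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status from the manifest endpoint to a rough diagnosis."""
    if status == 200:
        return "present"
    if status in (401, 403):
        return "auth problem"
    if status == 404:
        return "missing"
    return "unknown"


def manifest_url(registry_host, repository, reference):
    """Build the Registry V2 manifest URL (reference is a tag or digest)."""
    return f"https://{registry_host}/v2/{repository}/manifests/{reference}"


def check_manifest(registry_host, repository, reference, token=None):
    """Send a HEAD request for the manifest and classify the response."""
    request = urllib.request.Request(
        manifest_url(registry_host, repository, reference), method="HEAD"
    )
    request.add_header(
        "Accept", "application/vnd.docker.distribution.manifest.v2+json"
    )
    if token:
        # Assumption: the registry accepts a bearer token for this request.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as error:
        return classify_status(error.code)


if __name__ == "__main__":
    # Hypothetical values; replace with your own registry host and image.
    print(manifest_url("registry.example.com", "v2/abc123", "latest"))
```

Running `check_manifest(...)` right after a push and again a few seconds later would show whether a "missing" result turns into "present", which is what the timing theory predicts.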

I will try to find some time to replicate (I have a Pi3B though, not an OrangePi) and confirm the theory, but thanks for pointing out this potentially annoying behaviour :+1:

Hi,

We recently switched to a different update strategy. Before: download then kill. Now: kill then download. The new strategy seems to work much more smoothly for some reason, and we have not seen the 404 error since. Not sure why. As for your theories, I am not sure why the download would then work when you simply unplug and replug the system.

@torben, Rich has not had a chance to investigate further but is still going to try to take a look at the problems you encountered. Meanwhile, it sounds like you found a solution that works for you?

Before: download then kill. Now: kill then download. The new strategy seems to work much more smoothly for some reason, and we have not seen the 404 error since. Not sure why.

Well, this sounds potentially compatible with @richbayliss’ theory that “the instruction to change images is happening faster than the registry can complete the upload.” If it is a timing issue, some sort of race condition where a few seconds make a difference, then I imagine that the “kill then download” strategy may afford the device the few extra seconds it needs before attempting the download (it takes a bit of time to kill the containers before the download is attempted). (I am just reasoning without having actually tested or measured anything…)
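To make the timing argument concrete, here is a minimal simulation of it. This is not how the Supervisor actually behaves; it is a sketch assuming the image becomes available a short while after the first request, with a made-up fake registry and hypothetical function names. It shows how a short delay or retry can turn an initial "no such image" into a successful pull.

```python
import time


class ImageNotFound(Exception):
    """Stands in for the HTTP 404 'no such image' response."""


def make_fake_registry(ready_after_calls):
    """Return a hypothetical pull function that fails until the upload 'finishes'."""
    state = {"calls": 0}

    def pull(image):
        state["calls"] += 1
        if state["calls"] <= ready_after_calls:
            raise ImageNotFound(f"no such image: {image}")
        return f"pulled {image}"

    return pull


def pull_with_retry(pull, image, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the pull with exponential backoff while the image is reported missing."""
    for attempt in range(attempts):
        try:
            return pull(image)
        except ImageNotFound:
            if attempt == attempts - 1:
                raise  # Give up after the last attempt.
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    # The fake registry rejects the first two requests, then succeeds.
    pull = make_fake_registry(ready_after_calls=2)
    print(pull_with_retry(pull, "example/app", base_delay=0.1))
```

In this toy model, the time spent killing containers under “kill then download” plays the same role as the backoff delay: it pushes the first pull attempt past the point where the upload has completed.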

While we investigate, let us know if this issue is blocking your app development, or if you have additional findings or thoughts that might help resolve it. Thanks for reporting it!