Devices unable to update: HTTP 500 net/http: TLS handshake timeout

Hi,
I’ve already raised several tickets in the dashboard, but so far no solution has come up.
Many of our devices are failing to update, and we keep getting errors like this:

Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/manifests/sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281: Get https://api.balena-cloud.com/auth/v1/token?account=d_XXXXXHIDEFORFORUMSf52bc808f327ea4c08d&scope=repository%3Av2%2Faf8c17fa46bfd1bc4a72e05715f28cc3%3Apull&service=registry2.balena-cloud.com: net/http: TLS handshake timeout '
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to 'could not get decompression stream: Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/blobs/sha256:16dc06fc37e62b8bb995edc59cf4d9d55cb4bd9a4f5682035ae0bf338ebd4f86: Get https://api.balena-cloud.com/auth/v1/token?account=d_XXXXXHIDEFORFORUMSf52bc808f327ea4c08d&scope=repository%3Av2%2Faf8c17fa46bfd1bc4a72e05715f28cc3%3Apull&service=registry2.balena-cloud.com: net/http: request canceled (Client.Timeout exceeded while awaiting headers)'
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/manifests/sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281: Get https://api.balena-cloud.com/auth/v1/token?account=d_XXXXXHIDEFORFORUMSf52bc808f327ea4c08d&scope=repository%3Av2%2Faf8c17fa46bfd1bc4a72e05715f28cc3%3Apull&service=registry2.balena-cloud.com: net/http: TLS handshake timeout '
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/manifests/sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281: Get https://api.balena-cloud.com/auth/v1/token?account=d_XXXXXHIDEFORFORUMSf52bc808f327ea4c08d&scope=repository%3Av2%2Faf8c17fa46bfd1bc4a72e05715f28cc3%3Apull&service=registry2.balena-cloud.com: net/http: TLS handshake timeout '
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to 'could not get decompression stream: Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/blobs/sha256:a5800a273f46bce7c8cc4654ca0c62d25479f47eac71115091643692d5164239: net/http: TLS handshake timeout'
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/manifests/sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281: net/http: TLS handshake timeout '
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) '
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to 'error pulling image configuration: Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/blobs/sha256:804a8930c6cd594b2cf8aa6ce946b3570003ad6d409ce9d8bffc0ea54105b79d: net/http: TLS handshake timeout'
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/: net/http: TLS handshake timeout '
Downloading image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281'
Failed to download image 'registry2.balena-cloud.com/v2/af8c17fa46bfd1bc4a72e05715f28cc3@sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/v2/af8c17fa46bfd1bc4a72e05715f28cc3/manifests/sha256:4dc03d950b951042e017a0afd1852c5ff339f7e911677a71da2c734daf8a0281: net/http: TLS handshake timeout '
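Looking closely at the log, the same kind of timeout hits every stage of the pull: the registry ping (`/v2/`), the auth token request to `api.balena-cloud.com`, the manifest, and the blobs. The `(HTTP code 500) server error` prefix is most likely the local container engine reporting that its own outgoing request failed, rather than the registry itself returning a 500. Below is a minimal Python sketch that tallies failures by stage and cause. The function names and stage labels are hypothetical, and the parsing assumes exactly the message format shown in this post.

```python
import re
from collections import Counter

# Matches each "Get <url>: " hop in a nested Go net/http error message.
# The lazy match stops at the first colon followed by whitespace, which ends the URL.
GET_URL = re.compile(r"Get (https://\S+?):\s")


def pull_stage(url):
    """Map the innermost failing URL to the stage of the image pull it belongs to."""
    if "/auth/v1/token" in url:
        return "auth token"
    if "/manifests/" in url:
        return "manifest"
    if "/blobs/" in url:
        return "blob"
    if url.endswith("/v2/"):
        return "registry ping"
    return "other"


def failure_cause(line):
    """Reduce the trailing net/http error text to a short label."""
    if "TLS handshake timeout" in line:
        return "TLS handshake timeout"
    if "Client.Timeout exceeded" in line:
        return "client timeout"
    return "other"


def classify(line):
    """Return (stage, cause) for a 'Failed to download image' line, else None."""
    if not line.startswith("Failed to download image"):
        return None
    urls = GET_URL.findall(line)
    # The last "Get" in the chain is the request that actually timed out.
    stage = pull_stage(urls[-1]) if urls else "other"
    return stage, failure_cause(line)


def summarize(lines):
    """Count failures per (stage, cause) pair across a batch of log lines."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result] += 1
    return counts
```

Running `summarize()` over a longer excerpt of a device's logs would show whether the timeouts cluster on one host or stage, or are spread across all of them, as they appear to be here.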

Devices start downloading the image, stop at a certain percentage, sit idle for a while, and then start over. This consumes hundreds of MB of data, yet the devices stay stuck. Sometimes they recover after a day or so, but this behaviour is unacceptable when we ship devices to our customers and they “don’t work on arrival”.

Is there anyone else having these issues? Any solution?
This is costing our company a lot of money. Field technicians install the devices on-site, and they can’t finish the installation procedure because the devices won’t update, so we have to send them out again, which is expensive!

Thanks
Fritz

Hi Fritz, I’m going to give the community an opportunity to answer too. If more people are seeing this issue regularly, that might give us new information; so far, all we can tell is that this looks like a transient network issue.

Given what you’ve told us about the cost of recalling and resending devices, it might be worth considering image preloading and pinning devices to a release. That could help with this particular issue, since the first pull is usually the largest: it happens without deltas.

I know this is not a solution for all cases, but it might help you with that immediate problem.

Please let us know if this helps you.

Hi @pipex
Thanks! Yes, we are already preloading the devices properly. But by the time they ship, a new version may already be out, and then the devices fail to update.
Hopefully someone else has had similar experiences and can share the details.
What we have gathered in the tickets so far is that this might be related to IPv6: IPv6 is not properly configured across ISPs and routers, which is why the requests time out.
Disabling IPv6 helped most of the time, but it had to be done manually on each device, directly on the supervisor.
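One way to test the IPv6 hypothesis on an affected network, before disabling anything, is to attempt the same TLS handshake separately over IPv4 and over IPv6 and compare the results. Below is a hypothetical diagnostic, not part of balenaOS or the supervisor: the function name `tls_handshake_ok` and the 5-second default timeout are assumptions. It uses only Python's standard `socket` and `ssl` modules and could be run from a machine on the same network as the devices.

```python
import socket
import ssl


def tls_handshake_ok(host, family, port=443, timeout=5.0):
    """Try TCP connect + TLS handshake to `host` over a single address family.

    `family` is socket.AF_INET (IPv4 only) or socket.AF_INET6 (IPv6 only).
    Returns (ok, detail): detail names the address used, or the last error seen.
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # No A / AAAA record for this family (or DNS itself failed).
        return False, f"no address: {exc}"

    ctx = ssl.create_default_context()
    last_error = "no addresses returned"
    for fam, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(fam, socktype, proto) as raw:
                # The same per-operation timeout applies to the TCP connect
                # and to the TLS handshake (the TLS socket inherits it).
                raw.settimeout(timeout)
                raw.connect(addr)
                with ctx.wrap_socket(raw, server_hostname=host) as tls:
                    return True, f"{addr[0]} ({tls.version()})"
        except OSError as exc:
            # Socket timeouts and ssl.SSLError are both OSError subclasses.
            last_error = f"{addr[0]}: {exc}"
    return False, last_error
```

For example, compare `tls_handshake_ok("registry2.balena-cloud.com", socket.AF_INET6)` with the same call using `socket.AF_INET`, and repeat for `api.balena-cloud.com`. If the IPv6 attempts time out while IPv4 succeeds, that would be consistent with the IPv6 explanation; if both fail, the cause is more likely elsewhere on the network.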

Thanks