I am trying to deploy the same code (identical git hash) to two Applications. The first one worked: it is a multi-container deployment and everything is fine. The second one doesn’t. Some environment variables differ between the two applications, but the only other significant difference I can see is that the connected devices are on different wifi networks. Of the two devices in the failing application, one is on a rather poor VDSL connection and the other has a reasonably good VDSL connection.
The device with the poor connection is failing with a lot of logs like this:
10.11.18 10:57:28 (+1100) Downloading image 'registry2.balena-cloud.com/v2/aa5a2db06d3e56b4cf0da4cc64a7090f@sha256:98ee3a2a1bba61e2978e75c9d77a628152b6e88cb8d7dde89df7eebb6bdfb839'
10.11.18 10:57:55 (+1100) Failed to download image 'registry2.balena-cloud.com/v2/aa5a2db06d3e56b4cf0da4cc64a7090f@sha256:98ee3a2a1bba61e2978e75c9d77a628152b6e88cb8d7dde89df7eebb6bdfb839' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/v2/aa5a2db06d3e56b4cf0da4cc64a7090f/manifests/sha256:98ee3a2a1bba61e2978e75c9d77a628152b6e88cb8d7dde89df7eebb6bdfb839: net/http: TLS handshake timeout '
And, predictably, the image downloads keep restarting. Sometimes the download restarts when the dashboard reaches 100% (same error), sometimes it’s part-way through download when it fails. Usually the image downloads seem to fail independently, but occasionally all 7 images fail in synchrony.
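Since the error is a TLS handshake timeout, one rough check is to time a few handshakes against the registry host from a machine on the same wifi as the failing device. The following is only a sketch, assuming Python 3 is available there. The function names (`tls_handshake_seconds`, `probe`, `summarize`) are made up for illustration and are not part of any balena tooling, and the 30-second timeout is an arbitrary choice.

```python
import socket
import ssl
import time


def tls_handshake_seconds(host, port=443, timeout=30.0):
    """Open one TCP connection and return the seconds spent in the TLS handshake."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        start = time.monotonic()
        # wrap_socket completes the handshake before it returns
        with context.wrap_socket(raw, server_hostname=host):
            return time.monotonic() - start


def probe(host, attempts=5, measure=tls_handshake_seconds):
    """Run several handshakes; record each duration, or None when one fails."""
    results = []
    for _ in range(attempts):
        try:
            results.append(measure(host))
        except OSError:  # covers socket timeouts, DNS errors and ssl.SSLError
            results.append(None)
    return results


def summarize(results):
    """Count failures and report the slowest successful handshake."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "slowest_seconds": max(ok) if ok else None,
    }
```

Running `summarize(probe("registry2.balena-cloud.com"))` on each of the two networks would show whether handshakes are failing or unusually slow on the poor connection. It measures only the handshake from that machine, so it cannot rule out problems on the device itself.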
The failing device with the faster internet connection downloads all 7 images, but only one container starts successfully, and two (postgres and redis) keep restarting with complaints about missing files. The other 4 containers are not even trying to start (status “Downloaded”), but that makes sense given the dependency tree between containers.
I’ve tried waiting patiently, rebooting, restarting, and poking buttons randomly. I’m not sure what else to try… can anyone give me some hints?