I am trying to deploy the same code (identical git hash) to two Applications. The first one worked: multi-container deployment, everything is fine. The second one didn’t. Some environment variables differ between the two Applications, but the only other significant difference I can see is that the connected devices are on different wifi networks. In the failing Application, one device is on a rather poor VDSL connection and the other has a reasonably good VDSL connection.
The device with the poor connection is failing with a lot of logs like this:
10.11.18 10:57:28 (+1100) Downloading image 'registry2.balena-cloud.com/v2/aa5a2db06d3e56b4cf0da4cc64a7090f@sha256:98ee3a2a1bba61e2978e75c9d77a628152b6e88cb8d7dde89df7eebb6bdfb839'
10.11.18 10:57:55 (+1100) Failed to download image 'registry2.balena-cloud.com/v2/aa5a2db06d3e56b4cf0da4cc64a7090f@sha256:98ee3a2a1bba61e2978e75c9d77a628152b6e88cb8d7dde89df7eebb6bdfb839' due to '(HTTP code 500) server error - Get https://registry2.balena-cloud.com/v2/v2/aa5a2db06d3e56b4cf0da4cc64a7090f/manifests/sha256:98ee3a2a1bba61e2978e75c9d77a628152b6e88cb8d7dde89df7eebb6bdfb839: net/http: TLS handshake timeout '
And, predictably, the image downloads keep restarting. Sometimes the download restarts when the dashboard reaches 100% (same error), sometimes it’s part-way through download when it fails. Usually the image downloads seem to fail independently, but occasionally all 7 images fail in synchrony.
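To check whether the failures really are independent per image and whether they all share the same cause, the pasted dashboard logs can be tallied. This is a minimal sketch, assuming the log lines keep the exact "Failed to download image '...' due to '...'" shape shown above; the function name `summarize_failures` and the 12-character digest prefix are my own choices for illustration, not part of any balena tooling.

```python
import re
from collections import Counter

# Matches "Failed to download image '<image>' due to '<reason>'" lines
# in the format of the dashboard logs pasted above (an assumption).
FAILED = re.compile(
    r"Failed to download image '(?P<image>[^']+)' due to '(?P<reason>.*)'\s*$"
)


def summarize_failures(log_text):
    """Count failed downloads per (digest prefix, underlying error)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match is None:
            continue  # not a failure line (e.g. "Downloading image ...")
        # Shorten the sha256 digest so the summary stays readable.
        digest = match.group("image").rsplit("sha256:", 1)[-1][:12]
        # Keep only the text after the last ": ", i.e. the innermost error.
        reason = match.group("reason").strip().rsplit(": ", 1)[-1]
        counts[(digest, reason)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < device-logs.txt
    for (digest, reason), n in summarize_failures(sys.stdin.read()).most_common():
        print(f"{n:4d}  {digest}  {reason}")
```

If every row shows the same reason (here, a TLS handshake timeout), that points at the network path to the registry rather than at any particular image.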
The failing device with a faster internet connection has all 7 images download, but it only starts one successfully and two keep restarting (postgres and redis) with complaints about missing files. The other 4 containers are not even trying to restart (status “Downloaded”), but that makes sense given the dependency tree between containers.
I’ve tried waiting patiently, rebooting, restarting, and poking buttons randomly. I’m not sure what else to try… can anyone give me some hints?
Well, the Application with only a good connection is still OK. In the other Application, both devices (one on a bad connection, one on a mediocre one, on totally different networks in different parts of town) have now downloaded all container images (the bad one took another 9 hours). However, they are failing to start their containers: postgres and redis are stuck in restart loops, and the containers that depend on them are not starting at all.
Postgres is looping like this:
10.11.18 20:17:03 (+1100) postgres LOG: skipping missing configuration file "/var/lib/postgresql/data/postgresql.auto.conf"
10.11.18 20:17:03 (+1100) postgres postgres: could not find the database system
10.11.18 20:17:03 (+1100) postgres Expected to find it in the directory "/var/lib/postgresql/data",
10.11.18 20:17:03 (+1100) postgres but could not open file "/var/lib/postgresql/data/global/pg_control": No such file or directory
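The error names one specific file, global/pg_control, so it may help to check what the data directory actually contains. The following is a minimal sketch, assuming it is run somewhere the directory is visible (for example inside the postgres container); the function name `describe_pgdata` and its four labels are hypothetical, chosen only for this illustration.

```python
from pathlib import Path


def describe_pgdata(path):
    """Classify a directory by whether the file postgres complained about exists."""
    data = Path(path)
    if not data.is_dir():
        return "missing"      # the directory itself does not exist
    if (data / "global" / "pg_control").is_file():
        return "initialized"  # the file named in the error is present
    if any(data.iterdir()):
        return "partial"      # some files exist, but not pg_control
    return "empty"            # nothing has been written here yet


if __name__ == "__main__":
    # Path taken from the log lines above.
    print(describe_pgdata("/var/lib/postgresql/data"))
```

A "partial" result would suggest the directory has been populated with something other than a complete database cluster, which is a different problem from a volume that is simply empty.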
And redis starts, then exits, like this:
10.11.18 20:19:05 (+1100) redis 1:C 10 Nov 09:19:05.026 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
10.11.18 20:19:05 (+1100) redis 1:M 10 Nov 09:19:05.032 # Warning: 32 bit instance detected but no memory limit set. Setting 3 GB maxmemory limit with 'noeviction' policy now.
10.11.18 20:19:05 (+1100) redis _._
10.11.18 20:19:05 (+1100) redis _.-``__ ''-._
10.11.18 20:19:05 (+1100) redis _.-`` `. `_. ''-._ Redis 3.2.12 (00000000/0) 32 bit
10.11.18 20:19:05 (+1100) redis .-`` .-```. ```\/ _.,_ ''-._
10.11.18 20:19:05 (+1100) redis ( ' , .-` | `, ) Running in standalone mode
10.11.18 20:19:05 (+1100) redis |`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
10.11.18 20:19:05 (+1100) redis | `-._ `._ / _.-' | PID: 1
10.11.18 20:19:05 (+1100) redis `-._ `-._ `-./ _.-' _.-'
10.11.18 20:19:05 (+1100) redis |`-._`-._ `-.__.-' _.-'_.-'|
10.11.18 20:19:05 (+1100) redis | `-._`-._ _.-'_.-' | http://redis.io
10.11.18 20:19:05 (+1100) redis `-._ `-._`-.__.-'_.-' _.-'
10.11.18 20:19:05 (+1100) redis |`-._`-._ `-.__.-' _.-'_.-'|
10.11.18 20:19:05 (+1100) redis | `-._`-._ _.-'_.-' |
10.11.18 20:19:05 (+1100) redis `-._ `-._`-.__.-'_.-' _.-'
10.11.18 20:19:05 (+1100) redis `-._ `-.__.-' _.-'
10.11.18 20:19:05 (+1100) redis `-._ _.-'
10.11.18 20:19:05 (+1100) redis `-.__.-'
10.11.18 20:19:05 (+1100) redis
10.11.18 20:19:05 (+1100) redis 1:M 10 Nov 09:19:05.034 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
10.11.18 20:19:05 (+1100) redis 1:M 10 Nov 09:19:05.035 # Server started, Redis version 3.2.12
10.11.18 20:19:05 (+1100) redis 1:M 10 Nov 09:19:05.039 * DB loaded from disk: 0.004 seconds
10.11.18 20:19:05 (+1100) redis 1:M 10 Nov 09:19:05.039 * The server is now ready to accept connections on port 6379
10.11.18 20:19:06 (+1100) redis 1:signal-handler (1541841546) Received SIGTERM scheduling shutdown...
10.11.18 20:19:06 (+1100) redis 1:M 10 Nov 09:19:06.244 # User requested shutdown...
10.11.18 20:19:06 (+1100) redis 1:M 10 Nov 09:19:06.244 * Saving the final RDB snapshot before exiting.
I don’t understand these loops. The docker-compose file for the containers is simple: