Hi,
I recently updated the OS of one of our field devices to the latest OS version 2.48, and then updated our application. It seemed that everything went fine, but then I ran the diagnostics and got two errors:
Container engine balena is up, but has 1 unclean restarts and may be crashlooping (most recent start time: Tue 2020-05-19 18:35:27 UTC)
Saving images timed out, check data incomplete: 0ccc272d385f cc42f1e9307e
I believe the second always happens when the engine has issues.
Looking into the error logs
root@ea6ccbb:~# journalctl --unit balena.service --no-pager | grep error
May 19 15:11:45 ea6ccbb balenad[882]: time="2020-05-19T15:11:45.241374328Z" level=warning msg="failed to load plugin io.containerd.internal.v1.opt" error="mkdir /opt: read-only file system"
May 19 18:33:13 ea6ccbb balenad[882]: time="2020-05-19T18:33:13.014384074Z" level=error msg="a36a9850d27b41886df6afae9afa4a2f6b78fc6479699e3530f06436921ab2ff cleanup: failed to delete container from containerd: no such container"
May 19 18:33:13 ea6ccbb balenad[882]: time="2020-05-19T18:33:13.215879164Z" level=error msg="Handler for POST /v1.39/containers/a36a9850d27b41886df6afae9afa4a2f6b78fc6479699e3530f06436921ab2ff/start returned error: OCI runtime start failed: container process is already dead: unknown"
May 19 18:35:26 ea6ccbb balenad[882]: error 0x0
May 19 18:35:32 ea6ccbb balenad[6914]: time="2020-05-19T18:35:32.670617638Z" level=warning msg="failed to load plugin io.containerd.internal.v1.opt" error="mkdir /opt: read-only file system"
May 19 18:35:38 ea6ccbb 3064e329d44e[6914]: [event] Event: Image download error {"error":{"message":"connect ECONNREFUSED /var/run/balena-engine.sock","stack":"Error: connect ECONNREFUSED /var/run/balena-engine.sock\n at PipeConnectWrap.afterConnect [as oncomplete] (net.js:1097:14)"},"image":{"name":"registry2.balena-cloud.com/v2/199149ff1febbecdbd25aa4fa3430f87@sha256:a918255d4d17079316e9cbc45af1441f0fbcf51618d2fea246701749aab42cc9","appId":1273333,"serviceId":239090,"serviceName":"solmon","imageId":2171655,"releaseId":1342867,"dependent":0,"dockerImageId":null}}
May 19 18:35:38 ea6ccbb 3064e329d44e[6914]: [event] Event: Device state report failure {"error":{"message":"connect ECONNREFUSED /var/run/balena-engine.sock","stack":"Error: connect ECONNREFUSED /var/run/balena-engine.sock\n at PipeConnectWrap.afterConnect [as oncomplete] (net.js:1097:14)"}}
May 19 18:35:38 ea6ccbb 3064e329d44e[6914]: [error] Scheduling another update attempt in 1000ms due to failure: Error: connect ECONNREFUSED /var/run/balena-engine.sock
May 19 18:35:38 ea6ccbb 3064e329d44e[6914]: [error] at PipeConnectWrap.afterConnect [as oncomplete] (net.js:1097:14)
May 19 18:35:38 ea6ccbb 3064e329d44e[6914]: [error] Device state apply error Error: connect ECONNREFUSED /var/run/balena-engine.sock
… repeated many times
and then later
May 19 18:36:26 ea6ccbb balenad[6914]: time="2020-05-19T18:36:26.030744760Z" level=warning msg="error locating sandbox id 5f7760cf3cc8d95da79f8bcbc0cee5e149cd2e2044740656eae172d2b3bdc3a7: sandbox 5f7760cf3cc8d95da79f8bcbc0cee5e149cd2e2044740656eae172d2b3bdc3a7 not found"
May 19 18:36:30 ea6ccbb balenad[6914]: time="2020-05-19T18:36:30.890081824Z" level=warning msg="unmount task rootfs" error="no such file or directory" id=3064e329d44ea8164f764b54f5bdc13e77ab7001a30357c9b862dece6e708f32 path=/var/run/balena-engine/containerd/daemon/io.containerd.runtime.v1.linux/moby/3064e329d44ea8164f764b54f5bdc13e77ab7001a30357c9b862dece6e708f32
May 19 18:36:30 ea6ccbb balenad[6914]: time="2020-05-19T18:36:30.909994734Z" level=error msg="delete bundle" error="Failed to remove both bundle and workdir locations: rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3064e329d44ea8164f764b54f5bdc13e77ab7001a30357c9b862dece6e708f32 /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/.3064e329d44ea8164f764b54f5bdc13e77ab7001a30357c9b862dece6e708f32: no such file or directory: rename /var/run/balena-engine/containerd/daemon/io.containerd.runtime.v1.linux/moby/3064e329d44ea8164f764b54f5bdc13e77ab7001a30357c9b862dece6e708f32 /var/run/balena-engine/containerd/daemon/io.containerd.runtime.v1.linux/moby/.3064e329d44ea8164f764b54f5bdc13e77ab7001a30357c9b862dece6e708f32: no such file or directory"
May 19 20:16:02 ea6ccbb balenad[6914]: time="2020-05-19T20:16:02.929304875Z" level=warning msg="error locating sandbox id e1a2203150018d902ad1824548959a6b1831d824049dba82854168eeddf60e59: sandbox e1a2203150018d902ad1824548959a6b1831d824049dba82854168eeddf60e59 not found"
May 19 20:16:06 ea6ccbb balenad[6914]: time="2020-05-19T20:16:06.660753031Z" level=warning msg="unmount task rootfs" error="no such file or directory" id=f8b46322ee3f003d9391429cceeec4d8482442a7c34b1eab9c10e629c91188a7 path=/var/run/balena-engine/containerd/daemon/io.containerd.runtime.v1.linux/moby/f8b46322ee3f003d9391429cceeec4d8482442a7c34b1eab9c10e629c91188a7
May 19 20:16:06 ea6ccbb balenad[6914]: time="2020-05-19T20:16:06.681130939Z" level=error msg="delete bundle" error="Failed to remove both bundle and workdir locations: rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/f8b46322ee3f003d9391429cceeec4d8482442a7c34b1eab9c10e629c91188a7 /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/.f8b46322ee3f003d9391429cceeec4d8482442a7c34b1eab9c10e629c91188a7: no such file or directory: rename /var/run/balena-engine/containerd/daemon/io.containerd.runtime.v1.linux/moby/f8b46322ee3f003d9391429cceeec4d8482442a7c34b1eab9c10e629c91188a7 /var/run/balena-engine/containerd/daemon/io.containerd.runtime.v1.linux/moby/.f8b46322ee3f003d9391429cceeec4d8482442a7c34b1eab9c10e629c91188a7: no such file or directory"
May 19 20:32:01 ea6ccbb balenad[6914]: time="2020-05-19T20:32:01.953071497Z" level=error msg="Can't add file /var/lib/docker/tmp/docker-export-610895789/1b4817bdbc01b72a913735288434a993f35c8cbc206a5632bdb20466cbb003fd/layer.tar to tar: io: read/write on closed pipe"
May 19 20:32:01 ea6ccbb balenad[6914]: time="2020-05-19T20:32:01.959058461Z" level=error msg="Can't close tar writer: io: read/write on closed pipe"
May 19 20:36:22 ea6ccbb balenad[6914]: time="2020-05-19T20:36:22.048232467Z" level=error msg="Can't add file /var/lib/docker/tmp/docker-export-473862662/1fb4db48c9d6acea64de14855f1cda9f0fef512bec80d785044f41e6e827017e/layer.tar to tar: io: read/write on closed pipe"
May 19 20:36:22 ea6ccbb balenad[6914]: time="2020-05-19T20:36:22.053704435Z" level=error msg="Can't close tar writer: io: read/write on closed pipe"
As far as I can tell, everything is working ok. and balena ps
seems to show that the containers are running. My guess is that there was an issue with the engine but somehow it recovered?
I’m interested in understanding these errors - what do they mean and are they indicative of a real problem?
- the container which is missing prior to delete
- the issue with connecting to the balena-engine socket
- the various files missing in the second set of errors
- errors in writing to one of the docker layers b/c of closed pipe
I see container engine issues quite often in our deployed states but despite the excellent masterclass documentation don’t have enough of a mental model of the system.