Container quit and won't restart

Device type: Raspberry Pi (v1 / Zero / Zero W)
OS version: balenaOS 2.46.1+rev1
Supervisor version: 10.6.27
Device id: 3bff1ddc7ac06d29cb9ad624e750921b

The main container quit. When we were finally able to log into the device host, we saw errors via journalctl for balena.service. I see another user had a similar error (Services are in a constant restart loop!). Errors below:

Apr 13 22:39:00 3bff1dd balenad[17710]: time="2020-04-13T22:39:00.182065030Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08/shim.sock debug=true pid=29066
Apr 13 22:39:00 3bff1dd balenad[17710]: time="2020-04-13T22:39:00.938978531Z" level=debug msg="registering ttrpc server"
Apr 13 22:39:00 3bff1dd balenad[17710]: time="2020-04-13T22:39:00.942138503Z" level=debug msg="serving api on unix socket" socket="[inherited from parent]"
Apr 13 22:39:02 3bff1dd balenad[17710]: time="2020-04-13T22:39:02.181608860Z" level=info msg="shim reaped" id=2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08
Apr 13 22:39:02 3bff1dd balenad[17710]: time="2020-04-13T22:39:02.200022702Z" level=error msg="stream copy error: reading from a closed fifo"
Apr 13 22:39:02 3bff1dd balenad[17710]: time="2020-04-13T22:39:02.245246314Z" level=debug msg="event published" ns=moby topic=/containers/delete type=containerd.events.ContainerDelete
Apr 13 22:39:02 3bff1dd balenad[17710]: time="2020-04-13T22:39:02.448485568Z" level=error msg="2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08 cleanup: failed to delete container from containerd: no such container"
Apr 13 22:39:02 3bff1dd balenad[17710]: time="2020-04-13T22:39:02.451322544Z" level=error msg="Handler for POST /containers/2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08/start returned error: OCI runtime create failed: container with id exists: 2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08: unknown"
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]   Scheduling another update attempt in 900000ms due to failure:  Error: Failed to apply state transition steps. (HTTP code 500) server error - OCI runtime create failed: container with id exists: 2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08: unknown  Steps:["start"]
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]         at /usr/src/app/dist/app.js:614:16375
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at c (/usr/src/app/dist/app.js:9:77523)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromiseFromHandler (/usr/src/app/dist/app.js:312:325441)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromise (/usr/src/app/dist/app.js:312:326241)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromise0 (/usr/src/app/dist/app.js:312:326940)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromises (/usr/src/app/dist/app.js:312:328181)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at d (/usr/src/app/dist/app.js:312:329886)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at p (/usr/src/app/dist/app.js:312:329825)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at s._drainQueues (/usr/src/app/dist/app.js:312:331345)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at Immediate.drainQueues [as _onImmediate] (/usr/src/app/dist/app.js:312:329567)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at runCallback (timers.js:705:18)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at tryOnImmediate (timers.js:676:5)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at processImmediate (timers.js:658:5)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]   Device state apply error Error: Failed to apply state transition steps. (HTTP code 500) server error - OCI runtime create failed: container with id exists: 2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08: unknown  Steps:["start"]
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]         at /usr/src/app/dist/app.js:614:16375
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at c (/usr/src/app/dist/app.js:9:77523)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromiseFromHandler (/usr/src/app/dist/app.js:312:325441)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromise (/usr/src/app/dist/app.js:312:326241)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromise0 (/usr/src/app/dist/app.js:312:326940)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at O._settlePromises (/usr/src/app/dist/app.js:312:328181)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at d (/usr/src/app/dist/app.js:312:329886)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at p (/usr/src/app/dist/app.js:312:329825)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at s._drainQueues (/usr/src/app/dist/app.js:312:331345)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at Immediate.drainQueues [as _onImmediate] (/usr/src/app/dist/app.js:312:329567)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at runCallback (timers.js:705:18)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at tryOnImmediate (timers.js:676:5)
Apr 13 22:39:02 3bff1dd c114c62dc8c3[17710]: [error]       at processImmediate (timers.js:658:5)
Apr 13 22:39:29 3bff1dd balenad[17710]: time="2020-04-13T22:39:29.221221672Z" level=debug msg="Running health check for container c114c62dc8c3f52dca0935771632403a196a7c301286d257d979a64c35b1ef71 ..."
Apr 13 22:39:29 3bff1dd balenad[17710]: time="2020-04-13T22:39:29.233273568Z" level=debug msg="starting exec command 40ad06a1e0faf2cb1faea71fe9eaaf9fb593d9dcf36e326afd8d129a98942134 in container c114c62dc8c3f52dca0935771632403a196a7c301286d257d979a64c35b1ef71"
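
For reference, the log excerpt above was collected from the host OS with something like the following; the -n/--no-pager flags are just for convenience:

# Tail the last 200 lines of the container engine's journal
journalctl -u balena.service -n 200 --no-pager

# Or follow the journal live while reproducing the failure
journalctl -u balena.service --follow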

I'm not sure why it is trying to update. Running diagnostics, I do see that the container_engine check has failed. I could try applying the resolution outlined in issue 93148 here, but perhaps you'd like to take a look first, since it was unclear in that case how the system got into this state. I have granted support access.
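
In case it's useful, my rough understanding of that kind of cleanup, run from the host OS, would be something like the below. I'm not certain this matches the exact prescription from issue 93148, so treat it as a sketch:

# List all containers, including stopped ones, to locate the stale entry
balena ps -a

# Force-remove the container ID named in the "container with id exists" error
balena rm -f 2c9026cbe671f8ee16a0e279bc4ea50cdb79290fc3c8e4e9438c75f4bae1ea08

# Restart the engine so the supervisor can recreate the service container
systemctl restart balena.service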

We have seen this happen on low-powered devices like the Pi Zero. Are your services running again?

It started running again, but I'm not sure why or how.
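
For anyone who lands on this thread later, I confirmed it from the host OS with something like:

# Check that the service container is up and its health check is passing again
balena ps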

Great to hear that it works now!