Application container does not start after code update

One of our deployed devices fails to start its container after receiving the latest code update. We have a single-container setup running on Raspberry Pi Zero W boards.
There are multiple other devices (same hardware) in this application that are running normally.

Host OS: balenaOS 2.29.2+rev1
Supervisor version: 9.0.1

The service is reported as running in the balena dashboard, but opening the terminal displays the following:
Spawning shell…
Application container must be running for a terminal to be started.
SSH session disconnected

The host OS terminal works normally.

Restarting, rebooting or stopping the service from the web dashboard gives:
request error: tunneling socket could not be established, cause=socket hang up

I've also tried the following:

  • pushing another code update (nothing happened)
  • rebooting from the terminal (the device did reboot, but the problem remains)

The balena version command gives the following response:
Client:
Version: 17.12.0-dev
API version: 1.35
Go version: go1.9.7
Git commit: dceb2fc48071b78a8a828e0468a15a479515385f
Built: Tue Jan 15 10:27:55 2019
OS/Arch: linux/arm
Experimental: false
Orchestrator: swarm
Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?

Output of systemctl --state=failed:
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
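Presumably nothing shows up as failed because systemd keeps auto-restarting balena.service rather than leaving it in a failed state, so the unit and its journal can be checked directly from the host OS terminal (standard systemd commands):

# Current state of the engine unit (here it cycles between restart attempts)
systemctl status balena.service --no-pager

# Last entries from the engine's journal, to catch the error on each restart
journalctl -u balena.service -n 50 --no-pager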

Part of the output (it repeats periodically) from journalctl --no-pager -u balena:
Jun 27 14:25:35 91c516a balenad[25242]: Error starting daemon: layer does not exist
Jun 27 14:25:35 91c516a systemd[1]: balena.service: Main process exited, code=exited, status=1/FAILURE
Jun 27 14:25:35 91c516a systemd[1]: balena.service: Failed with result 'exit-code'.
Jun 27 14:25:35 91c516a systemd[1]: Failed to start Balena Application Container Engine.
Jun 27 14:25:35 91c516a systemd[1]: balena.service: Service hold-off time over, scheduling restart.
Jun 27 14:25:35 91c516a systemd[1]: balena.service: Scheduled restart job, restart counter is at 957.
Jun 27 14:25:35 91c516a systemd[1]: Stopped Balena Application Container Engine.
Jun 27 14:25:35 91c516a systemd[1]: Starting Balena Application Container Engine…
Jun 27 14:25:36 91c516a balenad[25266]: time="2019-06-27T14:25:36Z" level=warning msg="Running experimental build"
Jun 27 14:25:37 91c516a balenad[25266]: time="2019-06-27T14:25:37.045765016Z" level=info msg="libcontainerd: started new balena-engine-containerd process" pid=25276
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="starting containerd" module=containerd revision= version=1.0.0+unknown
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="setting subreaper..." module=containerd
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="changing OOM score to -500" module=containerd
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." module=containerd type=io.containerd.content.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." module=containerd type=io.containerd.snapshotter.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." module=containerd type=io.containerd.metadata.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." module=containerd type=io.containerd.differ.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." module=containerd type=io.containerd.gc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:38 91c516a balenad[25266]: time="2019-06-27T14:25:38Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." module=containerd type=io.containerd.monitor.v1
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." module=containerd type=io.containerd.runtime.v1
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." module=containerd type=io.containerd.grpc.v1
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd-debug.sock module=containerd/debug
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd.sock module=containerd/grpc
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39Z" level=info msg="containerd successfully booted in 0.357201s" module=containerd
Jun 27 14:25:39 91c516a balenad[25266]: time="2019-06-27T14:25:39.540277957Z" level=warning msg="failed to rename /var/lib/docker/tmp for background deletion: rename /var/lib/docker/tmp /var/lib/docker/tmp-old: file exists. Deleting synchronously"
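
For reference, these are the kinds of checks that can be run from the host OS terminal before changing anything, to see what state the engine's on-disk data is in (the paths are my guess; the log above mentions /var/lib/docker, which may or may not be the same directory as /var/lib/balena on balenaOS):

# Make sure the data partition is not simply full
df -h /var/lib

# Check whether /var/lib/docker and /var/lib/balena both exist (one may be a symlink)
ls -ld /var/lib/docker /var/lib/balena

# Look at what the engine has on disk, including the leftover tmp
# directory mentioned in the last warning above
ls -la /var/lib/balena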

Hey @vid,

Sorry to hear you are having issues! Can you share the device UUID with us and enable support access? Looks like something might have gone wrong with Balena Engine, and having access to the device would help us find out what’s going on!

Thank you for your quick reply, I have just sent you a PM.

Hi @vid, welcome to the forums.

We've taken a look at your device and we think we've resolved the issue by clearing the contents of /var/lib/balena (not deleting the directory itself). We have preserved your data volumes, which live in there too.

The device is currently downloading your container again, so please let us know if it all works OK when this completes!
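
For anyone else hitting the same "layer does not exist" error, the rough shape of what we did is sketched below, run from the host OS terminal. Treat it as an outline rather than an exact recipe: it assumes the engine state lives directly under /var/lib/balena, that persistent application data sits in the volumes subdirectory there (so that one is kept), and that on this OS version the supervisor unit is called resin-supervisor (newer releases call it balena-supervisor). Clearing the engine state means the device has to download all of its images again.

# Stop the supervisor and the engine so nothing touches the state while we clear it
systemctl stop resin-supervisor
systemctl stop balena.service

# Remove everything under /var/lib/balena except the data volumes
# (keeps the directory itself; note that a plain * glob skips dotfiles)
cd /var/lib/balena
for entry in *; do
    if [ "$entry" != "volumes" ]; then
        rm -rf -- "$entry"
    fi
done

# Bring the engine and supervisor back up; the supervisor then
# re-downloads the application images and recreates the containers
systemctl start balena.service
systemctl start resin-supervisor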

Thank you @chrisys and @jviotti for your help, but our client turned the device off in the middle of the update.
I’ll check its status once it gets back online and write here if the problem remains.

Sure, let us know how it goes then!

I have a similar issue. Should I still try clearing out the contents of /var/lib/balena? Here is the UUID: 211b96c5f360bdb4eff82dee37e0f297

I cleared out that directory, except for the volumes folder, and that seems to have worked!