"Update successful, rebooting" persists through reboots, Balena Application Container Engine fails

I updated an existing device from 2.29.2+rev2.prod to 2.31.5+rev1.prod. The device appears to have rebooted itself after the update process, but it remains stuck in the "Update successful, rebooting" state. I’ve tried rebooting through the balenaCloud console, but I get a socket error. I’ve been able to force a reboot via the host OS CLI, but that doesn’t resolve it.

I followed the suggested checksum procedures as outlined in the following post, and the only check that fails is the one for config.txt.

From what I’ve read, it might be a supervisor upgrade issue. I don’t see any way to update the supervisor manually, so I’m at a loss at the moment. I’ve tried stopping and then starting services and daemons manually, with no change. I also attempted the engine cleanup mentioned in this next post, which errored out with:

Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?

Additional troubleshooting:

Results from systemctl list-unit-files | grep resin
 resin\x2ddata.mount                                disabled 
 bind-etc-resin-supervisor.service                  enabled  
 openvpn-resin.service                              enabled  
 resin-boot.service                                 enabled  
 resin-data.service                                 enabled  
 resin-filesystem-expand.service                    enabled  
 resin-hostname.service                             enabled  
 resin-info@.service                                disabled 
 resin-init.service                                 enabled  
 resin-net-config.service                           enabled  
 resin-persistent-logs.service                      enabled  
 resin-proxy-config.service                         enabled  
 resin-state-reset.service                          enabled  
 resin-state.service                                enabled  
 resin-supervisor.service                           enabled  
 update-resin-supervisor.service                    static   
 update-resin-supervisor.timer                      enabled 
Results from systemctl list-unit-files | grep balena
balena-device-uuid.service                         enabled  
balena-engine.service                              enabled  
balena-host.service                                static   
balena.service                                     enabled  
balena-engine.socket                               disabled 
balena-host.socket                                 enabled  
Results from journalctl --no-pager -u balena
 systemd[1]: balena.service: Main process exited, code=exited, status=1/FAILURE
 systemd[1]: balena.service: Failed with result 'exit-code'.
 systemd[1]: Failed to start Balena Application Container Engine.
 systemd[1]: balena.service: Service hold-off time over, scheduling restart.
 systemd[1]: balena.service: Scheduled restart job, restart counter is at 549.
 systemd[1]: Stopped Balena Application Container Engine.
 systemd[1]: Starting Balena Application Container Engine...
 balenad[20056]: time="2019-05-01T16:11:26Z" level=warning msg="Running experimental build"
 balenad[20056]: time="2019-05-01T16:11:26.166243081Z" level=info msg="libcontainerd: started new balena-engine-containerd process" pid=20070
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="starting containerd" module=containerd revision= version=1.0.0+unknown
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="setting subreaper..." module=containerd
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="changing OOM score to -500" module=containerd
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." module=containerd type=io.containerd.content.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." module=containerd type=io.containerd.snapshotter.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." module=containerd type=io.containerd.metadata.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." module=containerd type=io.containerd.differ.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." module=containerd type=io.containerd.gc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." module=containerd type=io.containerd.monitor.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." module=containerd type=io.containerd.runtime.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." module=containerd type=io.containerd.grpc.v1
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd-debug.sock module=containerd/debug
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd.sock module=containerd/grpc
 balenad[20056]: time="2019-05-01T16:11:26Z" level=info msg="containerd successfully booted in 0.009051s" module=containerd
 balenad[20056]: Error starting daemon: layer does not exist
 systemd[1]: balena.service: Main process exited, code=exited, status=1/FAILURE
 systemd[1]: balena.service: Failed with result 'exit-code'.
 systemd[1]: Failed to start Balena Application Container Engine.
 systemd[1]: balena.service: Service hold-off time over, scheduling restart.
 systemd[1]: balena.service: Scheduled restart job, restart counter is at 550.
 systemd[1]: Stopped Balena Application Container Engine.
 systemd[1]: Dependency failed for Balena Application Container Engine.
 systemd[1]: balena.service: Job balena.service/start failed with result 'dependency'
Results from journalctl -f -a -u resin-supervisor
-- Logs begin at Wed 2019-05-01 16:17:11 UTC. --
systemd[1]: resin-supervisor.service: Service hold-off time over, scheduling restart.
systemd[1]: resin-supervisor.service: Scheduled restart job, restart counter is at 185.
systemd[1]: Stopped Resin supervisor.
systemd[1]: Starting Resin supervisor...
resin-supervisor[32114]: Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?
resin-supervisor[32120]: activating
systemd[1]: resin-supervisor.service: Control process exited, code=exited status=3
systemd[1]: resin-supervisor.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Resin supervisor.
systemd[1]: Stopped Resin supervisor.
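
For anyone skimming the journal output above: the line that matters is the `Error starting daemon` one, and the rest is the restart loop around it. A minimal sketch for pulling just that out, assuming the journal text is piped in on stdin. The function name `engine_start_errors` is made up for illustration and is not a balena tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of balenaOS): reads journal text on stdin and
# prints each distinct "Error starting daemon" message once, prefixed with the
# number of times it appeared.
engine_start_errors() {
    grep 'Error starting daemon' \
        | sed 's/^.*Error starting daemon: //' \
        | sort \
        | uniq -c
}

# Possible usage on the device, assuming the unit is named "balena" as above:
#   journalctl --no-pager -u balena | engine_start_errors
```

If the same message shows up on every restart, the engine is failing the same way each time, which would match the climbing restart counter in the log.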

I’m perfectly content with fixing it myself if I can get some direction, but support access is enabled if anyone is so inclined to take a look: 43e523d69b01e1f0868a3ef3c2df4b18

Hi there, I wasn’t able to access your device to take a look (maybe support access expired), but a colleague of mine suggested that this problem can sometimes be resolved by clearing /var/lib/balena.

That fixed it! Thanks much @chrisys

I didn’t realize that was where the persistent volumes were, so I have to rebuild my Unifi controller.

FTA: it’s probably best to exclude /var/lib/docker/volumes from that wipe, or at least back up the data in them beforehand.
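
A rough sketch of what "back up first, then wipe everything except volumes" could look like. This is an assumption-laden illustration, not an official balena procedure: the function names are hypothetical, both directories are passed as arguments, and whether the volumes actually live under the directory being cleared should be checked on the device before running anything.

```shell
#!/bin/sh
# Sketch only: archive a volumes directory, then empty an engine data
# directory while keeping a top-level "volumes" entry. Function names are
# made up for illustration.

# backup_volumes VOLUMES_DIR ARCHIVE
# Packs the contents of VOLUMES_DIR into a gzip-compressed tarball at ARCHIVE.
backup_volumes() {
    volumes_dir=$1
    archive=$2
    if [ ! -d "$volumes_dir" ]; then
        echo "no such directory: $volumes_dir" >&2
        return 1
    fi
    tar -czf "$archive" -C "$volumes_dir" .
}

# clear_engine_dir ENGINE_DIR
# Removes every top-level entry in ENGINE_DIR except one named "volumes".
clear_engine_dir() {
    engine_dir=$1
    if [ ! -d "$engine_dir" ]; then
        echo "no such directory: $engine_dir" >&2
        return 1
    fi
    for entry in "$engine_dir"/* "$engine_dir"/.[!.]*; do
        # Skip glob patterns that matched nothing.
        if [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            continue
        fi
        # Leave the volumes directory alone.
        if [ "$(basename "$entry")" = volumes ]; then
            continue
        fi
        rm -rf -- "$entry" || return 1
    done
    return 0
}

# Possible usage on the device. Paths and unit names are taken from this
# thread and the backup path is a placeholder; verify all of them first:
#   systemctl stop resin-supervisor balena
#   backup_volumes /var/lib/docker/volumes /path/to/volumes-backup.tar.gz
#   clear_engine_dir /var/lib/balena
#   reboot
```

Keeping the backup step separate from the wipe means the archive can be copied off the device and checked before anything is deleted.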