Watchdog restarts the device during a release update

My Raspberry Pi is stuck in a restart loop and can't update to the newest release.
In the balenaCloud dashboard I can see the update reach about 7% before the device reboots.
The release currently on the device has an incorrect Docker CMD (my mistake), so the supervisor watchdog fails, but I can't update because of the reboots (which happen roughly every 8 minutes).

You could add an option in the dashboard to disable the watchdog reboots, or at least not to reboot the device while an update is in progress.

Running “journalctl -a” on the device gives the following (truncated):

Jan 14 21:04:06 6f58c4d fcf800c6d1da[830]: [info] Internet Connectivity: OK
Jan 14 21:04:06 6f58c4d resin-supervisor[1360]: [info] Internet Connectivity: OK
Jan 14 21:08:34 6f58c4d fcf800c6d1da[830]: [api] GET /v1/healthy 200 - 32.079 ms
Jan 14 21:08:34 6f58c4d resin-supervisor[1360]: [api] GET /v1/healthy 200 - 32.079 ms
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Watchdog timeout (limit 6min)!
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Killing process 830 (balenad) with signal SIGABRT.
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Killing process 837 (exe) with signal SIGABRT.
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Killing process 886 (balena-engine-c) with signal SIGABRT.
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Killing process 1532 (balena-engine-c) with signal SIGABRT.
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Killing process 1641 (exe) with signal SIGABRT.
Jan 14 21:09:34 6f58c4d balenad[830]: SIGABRT: abort
Jan 14 21:09:34 6f58c4d balenad[830]: PC=0x45a544 m=0 sigcode=0
Jan 14 21:09:34 6f58c4d balenad[830]: goroutine 0 [idle]:
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.futex(0x23fc528, 0x0, 0x0, 0x0, 0x4400000000, 0x4387dc, 0x442005c948, 0x0, 0x0, 0x4338d8, …)
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/sys_linux_arm64.s:321 +0x1c
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.futexsleep(0x23fc528, 0x0, 0xffffffffffffffff)
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Killing process 1673 (balena-healthch) with signal SIGABRT.
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/os_linux.go:45 +0x3c
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.notesleep(0x23fc528)
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/lock_futex.go:151 +0x84
Jan 14 21:09:34 6f58c4d systemd[1]: balena.service: Killing process 1701 (balena) with signal SIGABRT.
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.stoplockedm()
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/proc.go:2101 +0x60
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.schedule()
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/proc.go:2493 +0x274
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.park_m(0x4420093c80)
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/proc.go:2604 +0x90
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.mcall(0x0)
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/asm_arm64.s:169 +0x50
Jan 14 21:09:34 6f58c4d balenad[830]: goroutine 1 [chan receive, 8 minutes]:
Jan 14 21:09:34 6f58c4d balenad[830]: SIGABRT: abort
Jan 14 21:09:34 6f58c4d balenad[830]: PC=0x45a544 m=0 sigcode=0
Jan 14 21:09:34 6f58c4d balenad[830]: goroutine 0 [idle]:
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.futex(0x23fc528, 0x0, 0x0, 0x0, 0x4400000000, 0x200000000, 0x1, 0x0, 0x0, 0x4331c8, …)
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/sys_linux_arm64.s:321 +0x1c
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.futexsleep(0x23fc528, 0x0, 0xffffffffffffffff)
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/os_linux.go:45 +0x3c
Jan 14 21:09:34 6f58c4d balenad[830]: runtime.notesleep(0x23fc528)
Jan 14 21:09:34 6f58c4d balenad[830]: /usr/lib/go/src/runtime/lock_futex.go:151 +0x84

And also (more recent log):

Jan 14 22:23:40 6f58c4d resin-supervisor[9815]: time="2020-01-14T22:23:40.324761888Z" level=error msg="error waiting for container: unexpected EOF"
Jan 14 22:23:40 6f58c4d systemd[1]: balena.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 14 22:23:40 6f58c4d systemd[1]: balena.service: Failed with result 'watchdog'.
Jan 14 22:23:40 6f58c4d systemd[1]: resin-supervisor.service: Main process exited, code=exited, status=125/n/a
Jan 14 22:23:40 6f58c4d systemd[1]: resin-supervisor.service: Failed with result 'exit-code'.
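
To pick the watchdog-related entries out of the long `journalctl -a` output, one option is to restrict the journal to the two units involved (balena.service and resin-supervisor.service) and keep only the timeout and failure lines. A minimal POSIX shell sketch: the `watchdog_events` helper is hypothetical, while `-u` (which can be repeated) and `--no-pager` are standard journalctl options.

```shell
# Hypothetical helper: keep only journal lines that show a systemd watchdog
# timeout or a unit failure, e.g.
#   "balena.service: Watchdog timeout (limit 6min)!"
#   "balena.service: Failed with result 'watchdog'."
watchdog_events() {
  grep -E 'Watchdog timeout|Failed with result'
}

# On the device (host OS), something like:
#   journalctl -a -u balena.service -u resin-supervisor.service --no-pager | watchdog_events
```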

Terminal from the balenaCloud dashboard:

14.01.20 22:55:53 (+0200) Supervisor starting
14.01.20 22:56:02 (+0200) Downloading image ‘registry2.balena-cloud.com/v2/fc244d3e7c9552ed3184d0b87abcda2f@sha256:3c6977f8f0f4e7967947cdc7bf31052fe4b4525c30e61eed8e272c8c0b204021
14.01.20 22:58:12 (+0200) Rebooting
14.01.20 23:03:50 (+0200) Supervisor starting
14.01.20 23:04:03 (+0200) Downloading image ‘registry2.balena-cloud.com/v2/fc244d3e7c9552ed3184d0b87abcda2f@sha256:3c6977f8f0f4e7967947cdc7bf31052fe4b4525c30e61eed8e272c8c0b204021
14.01.20 23:11:12 (+0200) Supervisor starting
14.01.20 23:11:31 (+0200) Downloading image ‘registry2.balena-cloud.com/v2/fc244d3e7c9552ed3184d0b87abcda2f@sha256:3c6977f8f0f4e7967947cdc7bf31052fe4b4525c30e61eed8e272c8c0b204021
14.01.20 23:19:01 (+0200) Supervisor starting
14.01.20 23:19:18 (+0200) Downloading image ‘registry2.balena-cloud.com/v2/fc244d3e7c9552ed3184d0b87abcda2f@sha256:3c6977f8f0f4e7967947cdc7bf31052fe4b4525c30e61eed8e272c8c0b204021
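
The "Supervisor starting" timestamps above can be used to check the "roughly every 8 minutes" estimate. The sketch below (a hypothetical `restart_intervals` helper built on plain awk) prints the gap in seconds between consecutive starts. It assumes the dashboard log format shown above, with the time of day as the second field, and that all entries fall on the same day, so it does not handle a restart across midnight.

```shell
# Hypothetical helper: read dashboard log lines on stdin and print the number
# of seconds between consecutive "Supervisor starting" entries.
# Assumes "DD.MM.YY HH:MM:SS (+zone) message" and no midnight rollover.
restart_intervals() {
  awk '/Supervisor starting/ {
    split($2, t, ":")                       # "22:55:53" -> 22, 55, 53
    now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
    if (seen) print now - prev
    prev = now
    seen = 1
  }'
}
```

Saved to a file (for example a hypothetical `dashboard.log`) and run as `restart_intervals < dashboard.log`, the four starts above give 477, 442 and 469 seconds, i.e. restarts roughly 7.4 to 8 minutes apart.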

Hi @claudiu725,

Could you provide some extra info, please? I would like to know your device type (Raspberry Pi 3, 4 or other), which balenaOS version you are running and which Supervisor version you are running (you can find all of these in your balenaCloud dashboard).

Best Regards,
Marios

Hi @mbalamat

Type
Raspberry Pi 3 (using 64bit OS) (BETA)

Host OS Version
balenaOS 2.46.1+rev1

Supervisor Version
10.6.27

What you can do is SSH into the host OS, find the offending container and its image, forcefully kill the container, remove the image and let the Supervisor pull it again from scratch. You may have to fight a bit with the Supervisor, which keeps trying to do its own thing, so try to be quick. Alternatively, a less invasive approach that might work is the “kill-then-download” update strategy.
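
The manual recovery described above might be sketched as follows for the host OS shell. Assumptions: the Docker-compatible `balena-engine` CLI (`inspect`, `rm`, `rmi`) that ships with balenaOS; a hypothetical helper name `purge_container_and_image` and a hypothetical `ENGINE` override; and that you already know the offending container's name, for example from `balena-engine ps -a`. The sketch only removes the container and its image; pulling the release again is left to the Supervisor.

```shell
# Engine CLI to call (hypothetical override knob; defaults to balena-engine).
ENGINE="${ENGINE:-balena-engine}"

# Hypothetical helper: force-remove the offending container and the image it
# was created from, so the Supervisor has to pull the image again from scratch.
purge_container_and_image() {
  local container="$1"
  local image
  # Resolve the container's image ID before the container is gone.
  image="$("$ENGINE" inspect --type container --format '{{.Image}}' "$container")" || return 1
  # rm -f kills the container if it is still running, then removes it.
  "$ENGINE" rm -f "$container" || return 1
  # Force-remove the image the container was using.
  "$ENGINE" rmi -f "$image"
}

# Example (on the device, container name taken from `balena-engine ps -a`):
#   purge_container_and_image <container-name>
```

Using `rm -f` covers the "forcefully kill" step, since it kills a running container before removing it.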

Thanks!