SIGSEGV in balenad

#1

Every once in a while we see our Raspberry Pi with balenaOS restarting.

Today I was able to get the log from such a device and saw the error dumped below.

We are using balenaOS 2.29.2+rev1 (balena-engine 17.12.0-dev).

The containers are pretty busy outputting log messages, so I wouldn’t be surprised if it was related to this.

journalctl output
May 03 14:08:06 1c1e022 balenad[762]: fatal error: unexpected signal during runtime execution
May 03 14:08:06 1c1e022 balenad[762]: [signal SIGSEGV: segmentation violation code=0x2 addr=0x76ec8920 pc=0x76d59420]
May 03 14:08:06 1c1e022 balenad[762]: runtime stack:
May 03 14:08:06 1c1e022 balenad[762]: runtime.throw(0xf7b913, 0x2a)
May 03 14:08:06 1c1e022 balenad[762]:         /usr/lib/go/src/runtime/panic.go:605 +0x70
May 03 14:08:06 1c1e022 balenad[762]: runtime.sigpanic()
May 03 14:08:06 1c1e022 balenad[762]:         /usr/lib/go/src/runtime/signal_unix.go:351 +0x24c
May 03 14:08:06 1c1e022 balenad[762]: goroutine 642 [syscall, locked to thread]:
May 03 14:08:06 1c1e022 balenad[762]: runtime.cgocall(0xca7a80, 0x133e1760, 0xf78542)
May 03 14:08:06 1c1e022 balenad[762]:         /usr/lib/go/src/runtime/cgocall.go:132 +0xb8 fp=0x133e1740 sp=0x133e1724 pc=0x15298
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/daemon/logger/journald._Cfunc_wait_for_data_cancelable(0x728011f8, 0xb6, 0x0)
May 03 14:08:06 1c1e022 balenad[762]:         github.com/docker/docker/daemon/logger/journald/_obj/_cgo_gotypes.go:393 +0x38 fp=0x133e175c sp=0x133e1740 pc=0xbccb68
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/daemon/logger/journald.(*journald).followJournal.func1.1(0x728011f8, 0xb6, 0x728011f8)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/daemon/logger/journald.(*journald).followJournal.func1(0x728011f8, 0xb6, 0xb7, 0x13409a40, 0x12cb3bc8, 0x13151b40, 0x0, 0x0, 0x12ae1dc0)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: runtime.goexit()
May 03 14:08:06 1c1e022 balenad[762]:         /usr/lib/go/src/runtime/asm_arm.s:971 +0x4 fp=0x133e17c4 sp=0x133e17c4 pc=0x6dd14
May 03 14:08:06 1c1e022 balenad[762]: created by github.com/docker/docker/daemon/logger/journald.(*journald).followJournal
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: goroutine 1 [chan receive, 43 minutes]:
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/cmd/dockerd.(*DaemonCli).start(0x12c54c00, 0x12a64940, 0x0, 0x0)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/cmd/dockerd.runDaemon(0x12a64940, 0x12c78280, 0x0)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/cmd/dockerd.newDaemonCommand.func1(0x12c5c120, 0x12c585a0, 0x0, 0x11, 0x0, 0x0)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute(0x12c5c120, 0x1297e008, 0x11, 0x11, 0x12c5c120, 0x1297e008)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x12c5c120, 0x12c5c120, 0xfbd8ec, 0x1297c138)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute(0x12c5c120, 0x1296e0d0, 0xca13f0)
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: github.com/docker/docker/cmd/dockerd.Main()
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
May 03 14:08:06 1c1e022 balenad[762]: main.main()
May 03 14:08:06 1c1e022 balenad[762]:         /yocto/resin-board/build/tmp/work/cortexa7hf-neon-vfpv4-poky-linux-gnueabi/balena/17.12.0-dev+gitdceb2fc48071b78a8a828e0468a15a479515385f-r0/git/src/import/.gopath/src/githu>
#6

Hi, thanks for the report. We are looking into it with the balenaEngine maintainer.

Does it happen to multiple devices? After the device is rebooted, does it function correctly?

It would be interesting if you could run md5sum -c --quiet /resinos.fingerprint in the host OS on one of those devices. It checks the root file system against its fingerprint. There will likely be files that changed (ones that we know change), but balenad (or its variants) shouldn’t be among those that fail this check.
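For context, here is a rough sketch of what that check amounts to, written in Python for illustration only. It assumes /resinos.fingerprint uses md5sum's default line format ("<hex digest>  <path>"); the function names md5_of and failed_entries are made up for this example and are not part of balenaOS.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_entries(fingerprint_file: str) -> list[str]:
    """List the paths whose current MD5 differs from the recorded one.

    Assumption: each line looks like "<hex digest>  <path>", which is what
    md5sum writes by default. Unreadable or missing files count as failures.
    """
    failures = []
    for line in Path(fingerprint_file).read_text().splitlines():
        if not line.strip():
            continue
        recorded, _, rest = line.partition(" ")
        name = rest[1:]  # drop the mode character (space or '*') before the path
        try:
            current = md5_of(Path(name))
        except OSError:
            failures.append(name)
            continue
        if current != recorded.lower():
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Like --quiet, print only the entries that do not match.
    for name in failed_entries("/resinos.fingerprint"):
        print(f"{name}: FAILED")
```

On a device, what matters is whether any path containing "balena" shows up in that list of failures.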

If there’s a device that you can catch before it reboots, it would be good if we could take a look. I’m not entirely sure that this error should result in a device reboot. I think it might result in the balenaEngine healthcheck (or rather healthdog) restarting the engine, but not the whole device. By default, a device restart should only be triggered by the kernel healthcheck, so there might be more to unravel here.

We are also preparing a new balenaEngine for release in the upcoming 2.33.0 OS version (in progress, not released yet). It pulls in a lot of fixes, so trying it may be a way forward; the issue behind this crash may already be fixed there.

In the meantime, any other information is appreciated!

#8

Thanks for your response.

md5sum -c --quiet /resinos.fingerprint reported nothing; the contents of /resinos.fingerprint are the following:
https://pastebin.com/wCqNHwTy

Honestly, we’re not tracking those reboots, and I only stumbled upon the SIGSEGV while investigating something else, so I thought I would report it. I’ll keep my eyes open, and of course as soon as 2.33 is out I’ll update our devices to see if we still have the same issue.