We have recorded three crashes of balena lately; here is the journalctl output from the latest one.
Balena version:
Client:
 Version: unknown-version
 API version: 1.35
 Go version: go1.9.4
 Git commit: unknown-commit
 Built: unknown-buildtime
 OS/Arch: linux/arm
 Experimental: false
 Orchestrator: swarm
Server:
 Engine:
  Version: 17.12.0-dev
  API version: 1.35 (minimum version 1.12)
  Go version: go1.9.4
  Git commit: 2fe3ad1568c1b783a9201fc3082452ad79d7f396
  Built: Wed Aug 8 13:31:24 2018
  OS/Arch: linux/arm
  Experimental: true
This appears to be the crash:
Oct 09 07:14:24 5f14761 healthdog[887]: time="2018-10-09T07:14:24.511615460Z" level=warning msg="unknown container" container=557844a07200e6499ba605d7d8c649e3ed60c7926774d49997bd06c15a6b01b5 module=libcontainerd namespace=plugins.moby
Oct 09 07:14:35 5f14761 healthdog[887]: time="2018-10-09T07:14:35.669668778Z" level=warning msg="unknown container" container=557844a07200e6499ba605d7d8c649e3ed60c7926774d49997bd06c15a6b01b5 module=libcontainerd namespace=plugins.moby
Oct 09 07:15:01 5f14761 healthdog[887]: time="2018-10-09T07:15:00.975296678Z" level=info msg="killing and restarting containerd" module=libcontainerd pid=911
Oct 09 07:15:02 5f14761 healthdog[887]: time="2018-10-09T07:15:02.258496627Z" level=error msg="containerd did not exit successfully" error="signal: killed" module=libcontainerd
Oct 09 07:15:02 5f14761 healthdog[887]: time="2018-10-09T07:15:02.906617033Z" level=error msg="failed to get event" error="rpc error: code = Internal desc = transport is closing" module=libcontainerd namespace=moby
Oct 09 07:15:02 5f14761 healthdog[887]: time="2018-10-09T07:15:02.875516960Z" level=error msg="failed to get event" error="rpc error: code = Internal desc = transport is closing" module=libcontainerd namespace=plugins.moby
Oct 09 07:15:03 5f14761 healthdog[887]: time="2018-10-09T07:15:03.418238603Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:03 5f14761 healthdog[887]: time="2018-10-09T07:15:03.635437375Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:03 5f14761 healthdog[887]: time="2018-10-09T07:15:03.715753524Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:04 5f14761 healthdog[887]: time="2018-10-09T07:15:04.184153060Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:05 5f14761 healthdog[887]: time="2018-10-09T07:15:05.040319989Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:05 5f14761 healthdog[887]: time="2018-10-09T07:15:05.205719180Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:06 5f14761 healthdog[887]: time="2018-10-09T07:15:06.448325394Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:07 5f14761 healthdog[887]: time="2018-10-09T07:15:06.788386607Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:07 5f14761 healthdog[887]: time="2018-10-09T07:15:06.975459667Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:08 5f14761 healthdog[887]: time="2018-10-09T07:15:08.147785303Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:09 5f14761 healthdog[887]: time="2018-10-09T07:15:09.711485782Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:10 5f14761 healthdog[887]: time="2018-10-09T07:15:10.466691012Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:10 5f14761 healthdog[887]: time="2018-10-09T07:15:10.536724177Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:11 5f14761 healthdog[887]: time="2018-10-09T07:15:10.738530827Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:12 5f14761 healthdog[887]: time="2018-10-09T07:15:12.345340048Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:13 5f14761 healthdog[887]: time="2018-10-09T07:15:13.546465747Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:14 5f14761 healthdog[887]: time="2018-10-09T07:15:14.664733541Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:17 5f14761 healthdog[887]: time="2018-10-09T07:15:16.765043746Z" level=error msg="failed restarting containerd" error="fork/exec /usr/bin/balena-containerd: cannot allocate memory" module=libcontainerd
Oct 09 07:15:19 5f14761 healthdog[887]: SIGABRT: abort
Oct 09 07:15:19 5f14761 healthdog[887]: PC=0x6e9d0 m=0 sigcode=0
Oct 09 07:15:19 5f14761 healthdog[887]: goroutine 0 [idle]:
Oct 09 07:15:17 5f14761 systemd[1]: balena.service: Watchdog timeout (limit 1min)!
Oct 09 07:15:18 5f14761 systemd[1]: balena.service: Killing process 887 (balenad) with signal SIGABRT.
Oct 09 07:15:19 5f14761 systemd[1]: balena.service: Killing process 889 (exe) with signal SIGABRT.
Oct 09 07:15:19 5f14761 systemd[1]: balena.service: Killing process 3468 (balena-containe) with signal SIGABRT.
Oct 09 07:15:19 5f14761 systemd[1]: balena.service: Killing process 4795 (balena-containe) with signal SIGABRT.
Oct 09 07:15:19 5f14761 systemd[1]: balena.service: Killing process 14639 (balena-healthch) with signal SIGABRT.
Oct 09 07:15:19 5f14761 systemd[1]: balena.service: Killing process 14640 (balena) with signal SIGABRT.
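To compare this crash with the other two, a small script could tally the engine's log entries by level and message. The following is only a sketch: it assumes the `key=value` / `key="quoted value"` layout visible in the lines above, and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Matches key=value and key="quoted value" fields, as seen in the
# healthdog lines above. This pattern is an assumption based on that
# output, not a documented log format.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def summarize(lines):
    """Count log entries per (level, msg) pair.

    Lines without both a level and a msg field (systemd messages,
    the Go stack dump) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = {k: v.strip('"') for k, v in FIELD_RE.findall(line)}
        if "level" in fields and "msg" in fields:
            counts[(fields["level"], fields["msg"])] += 1
    return counts
```

Saving the journal to a file and calling `summarize(open("crash.log"))` (the file name is hypothetical) returns a `Counter`, so `.most_common()` lists the most frequent messages first.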
After it crashes there are many more log entries; here's a sample: https://paste.ee/p/ZuAIx
The hardware is a 512MB SBC that normally has around 90 - 120MB of free memory, plus a 64MB swap file.
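Because the restart attempts fail with "cannot allocate memory", it might help to record the memory headroom at regular intervals leading up to a crash. Below is a minimal sketch that parses `/proc/meminfo`-style text. The helper names `parse_meminfo` and `headroom_mb` are hypothetical, and treating available RAM plus free swap as "headroom" is a simplification, not how the kernel decides whether a fork succeeds.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Most entries are reported in kB; a few (e.g. HugePages_Total)
    are plain counts and are stored as-is.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def headroom_mb(info):
    """Available RAM plus free swap, in MB.

    Falls back to MemFree when MemAvailable is missing
    (MemAvailable was added in Linux 3.14).
    """
    avail = info.get("MemAvailable", info.get("MemFree", 0))
    return (avail + info.get("SwapFree", 0)) / 1024
```

On the device this could be called as `headroom_mb(parse_meminfo(open("/proc/meminfo").read()))` from a periodic job, with the result written to the journal alongside the engine logs.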
Why did it even try to kill and restart containerd?