Balena service dead

Hi,

Noticed that one of our devices was connected to VPN but the heartbeat was red. Logging into host and running balena-engine ps hangs.

Output of systemctl list-units --status=dead

   root@14d8bd6:~# systemctl list-units --state=dead
  UNIT                                               LOAD      ACTIVE   SUB  JOB   DESCRIPTION                                          
  dev-hugepages.mount                                loaded    inactive dead       Huge Pages File System                               
  resin\x2ddata.mount                                loaded    inactive dead start resin-data bind mount                                
  sys-fs-fuse-connections.mount                      loaded    inactive dead       FUSE Control File System                             
  var-lib-docker.mount                               loaded    inactive dead start Docker data mountpoint                               
  var-lib-machines.mount                             loaded    inactive dead       Virtual Machine and Container Storage                
  systemd-ask-password-console.path                  loaded    inactive dead       Dispatch Password Requests to Console Directory Watch
● auditd.service                                     not-found inactive dead       auditd.service                                       
  balena-host.service                                loaded    inactive dead       Balena Application Container Engine (host)           
  balena.service                                     loaded    inactive dead start Balena Application Container Engine                  
● bind-etc-systemd-system-resin.target.wants.service not-found inactive dead       bind-etc-systemd-system-resin.target.wants.service   
  config-json.service                                loaded    inactive dead       Config.json watcher service                          
● display-manager.service                            not-found inactive dead       display-manager.service                              
● dropbear.service                                   not-found inactive dead       dropbear.service                                     
  dropbearkey.service                                loaded    inactive dead       SSH Key Generation                                   
  emergency.service                                  loaded    inactive dead       Emergency Shell                                      
  initrd-cleanup.service                             loaded    inactive dead       Cleaning Up and Shutting Down Daemons                
  initrd-parse-etc.service                           loaded    inactive dead       Reload Configuration from the Real Root              
  initrd-switch-root.service                         loaded    inactive dead       Switch Root                                          
  initrd-udevadm-cleanup-db.service                  loaded    inactive dead       Cleanup udevd DB                                     
  ldconfig.service                                   loaded    inactive dead       Rebuild Dynamic Linker Cache                         
● ntpd.service                                       not-found inactive dead       ntpd.service                                         
● ntpdate.service                                    not-found inactive dead       ntpdate.service                                      
  os-config.service                                  loaded    inactive dead       OS configuration update service                      
  os-udevrules.service                               loaded    inactive dead       udev rules configuration from config.json            
  plymouth-read-write.service                        loaded    inactive dead       Tell Plymouth To Write Out Runtime Data              
  plymouth-switch-root.service                       loaded    inactive dead       Plymouth switch root service                         
  prepare-openvpn.service                            loaded    inactive dead       Prepare OpenVPN                                      
  redsocks.service                                   loaded    inactive dead       redsocks transparent proxy redirector                
  rescue.service                                     loaded    inactive dead       Rescue Shell                                         
  resin-state-reset.service                          loaded    inactive dead       Resin reset OS state                                 
  resin-supervisor.service                           loaded    inactive dead start Resin supervisor                                     
  rollback-altboot.service                           loaded    inactive dead       Balena rollback checks altboot                       
  rollback-health.service                            loaded    inactive dead start Balena rollback checks health                        
● sntp.service                                       not-found inactive dead       sntp.service                                         
  systemd-ask-password-console.service               loaded    inactive dead       Dispatch Password Requests to Console                
● systemd-ask-password-plymouth.service              masked    inactive dead       systemd-ask-password-plymouth.service                
  systemd-ask-password-wall.service                  loaded    inactive dead       Forward Password Requests to Wall                    
  systemd-firstboot.service                          loaded    inactive dead       First Boot Wizard                                    
  systemd-fsck-root.service                          loaded    inactive dead       File System Check on Root Device                     
  systemd-hwdb-update.service                        loaded    inactive dead       Rebuild Hardware Database                            
  systemd-initctl.service                            loaded    inactive dead       /dev/initctl Compatibility Daemon                    
  systemd-journal-catalog-update.service             loaded    inactive dead       Rebuild Journal Catalog                              
  systemd-machine-id-commit.service                  loaded    inactive dead       Commit a transient machine-id on disk                
  systemd-modules-load.service                       loaded    inactive dead       Load Kernel Modules                                  
  systemd-rfkill.service                             loaded    inactive dead       Load/Save RF Kill Switch Status                      
  systemd-sysusers.service                           loaded    inactive dead       Create System Users                                  
● systemd-timesyncd.service                          not-found inactive dead       systemd-timesyncd.service                            
  systemd-tmpfiles-clean.service                     loaded    inactive dead       Cleanup of Temporary Directories                     
  systemd-update-done.service                        loaded    inactive dead       Update is Completed                                  
  systemd-update-utmp-runlevel.service               loaded    inactive dead start Update UTMP about System Runlevel Changes            
● systemd-user-sessions.service                      not-found inactive dead       systemd-user-sessions.service                        
  systemd-vconsole-setup.service                     loaded    inactive dead       Setup Virtual Console                                
  timeinit-rtc.service                               loaded    inactive dead       Initialize system clock from RTC                     
  update-resin-supervisor.service                    loaded    inactive dead start Resin supervisor updater                             
  systemd-journald-audit.socket                      loaded    inactive dead       Journal Audit Socket                                 
● cryptsetup.target                                  not-found inactive dead       cryptsetup.target                                    
  emergency.target                                   loaded    inactive dead       Emergency Mode                                       
  graphical.target                                   loaded    inactive dead       Graphical Interface                                  
  initrd-fs.target                                   loaded    inactive dead       Initrd File Systems                                  
  initrd-root-device.target                          loaded    inactive dead       Initrd Root Device                                   
  initrd-root-fs.target                              loaded    inactive dead       Initrd Root File System                              
  initrd-switch-root.target                          loaded    inactive dead       Switch Root                                          
  initrd.target                                      loaded    inactive dead       Initrd Default Target                                
  multi-user.target                                  loaded    inactive dead start Multi-User System                                    
  network-pre.target                                 loaded    inactive dead       Network (Pre)                                        
  nss-user-lookup.target                             loaded    inactive dead       User and Group Name Lookups                          
  remote-fs-pre.target                               loaded    inactive dead       Remote File Systems (Pre)                            
  rescue.target                                      loaded    inactive dead       Rescue Mode                                          
  shutdown.target                                    loaded    inactive dead       Shutdown                                             
● syslog.target                                      not-found inactive dead       syslog.target                                        
  umount.target                                      loaded    inactive dead       Unmount All Filesystems       

So tried to restart balena using systemctl start balena but it hangs.

Looked at journalctl -f while I did that did not show any log information. When I try to launch a web terminal to any of our services it says spawning shell but hangs.

There is plenty of disk space and free memory:

root@14d8bd6:~# free -h
             total       used       free     shared    buffers     cached
Mem:        491452     133928     357524        988       5684      84932
-/+ buffers/cache:      43312     448140

So state seems unhealthy and figured this is a good opportunity to learn how to debug, asking for guidance.

For balena folks id 14d8bd6f53ca7086070aff5d6dd9d98b, support granted.

Thanks

Hey @0xff thanks for the question, the first thing to do always is to run the diagnostic checks and then I would usually pull the balenad logs using journalctl -u balena.service -t balenad --no-pager as well as dmesg

Right - the diagnostics are great and do indicate errors consistent with the errors seen and more:

Supervisor is NOT running (unmatched local and remote Supervisor versions)

Container engine balena is NOT running

Some networking issues detected: 
test_dockerhub: Could not query Docker Hub
test_balena_registry: Could not communicate with registry2.balena-cloud.com for authentication

Running the diagnostics also seems to have fixed the underlying issue whatever it was by rebooting the system.

output of journalctl -u balena.service -t balenad --no-pager:

root@14d8bd6:~# journalctl -u balena.service -t balenad --no-pager 
-- Logs begin at Sat 2020-04-25 18:37:56 UTC, end at Fri 2020-05-08 21:21:12 UTC. --
May 08 20:47:25 14d8bd6 balenad[858]: time="2020-05-08T20:47:25Z" level=warning msg="Running experimental build"
May 08 20:47:25 14d8bd6 balenad[858]: time="2020-05-08T20:47:25.755794273Z" level=info msg="libcontainerd: started new balena-engine-containerd process" pid=897
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="starting containerd" module=containerd revision= version=1.0.0+unknown
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="setting subreaper..." module=containerd
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="changing OOM score to -500" module=containerd
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." module=containerd type=io.containerd.content.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." module=containerd type=io.containerd.snapshotter.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." module=containerd type=io.containerd.metadata.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." module=containerd type=io.containerd.differ.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." module=containerd type=io.containerd.gc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." module=containerd type=io.containerd.monitor.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." module=containerd type=io.containerd.runtime.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." module=containerd type=io.containerd.grpc.v1
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd-debug.sock module=containerd/debug
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd.sock module=containerd/grpc
May 08 20:47:27 14d8bd6 balenad[858]: time="2020-05-08T20:47:27Z" level=info msg="containerd successfully booted in 0.257754s" module=containerd
May 08 20:47:29 14d8bd6 balenad[858]: time="2020-05-08T20:47:29.039115432Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
May 08 20:47:29 14d8bd6 balenad[858]: time="2020-05-08T20:47:29.064902409Z" level=warning msg="Your kernel does not support cgroup rt period"
May 08 20:47:29 14d8bd6 balenad[858]: time="2020-05-08T20:47:29.066525408Z" level=warning msg="Your kernel does not support cgroup rt runtime"
May 08 20:47:29 14d8bd6 balenad[858]: time="2020-05-08T20:47:29.068129407Z" level=warning msg="Unable to find cpuset cgroup in mounts"
May 08 20:47:29 14d8bd6 balenad[858]: time="2020-05-08T20:47:29.072594403Z" level=warning msg="mountpoint for pids not found"
May 08 20:47:29 14d8bd6 balenad[858]: time="2020-05-08T20:47:29.104383375Z" level=info msg="Loading containers: start."
May 08 20:47:30 14d8bd6 balenad[858]: time="2020-05-08T20:47:30.740542959Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/delete type="*events.ContainerDelete"
May 08 20:47:31 14d8bd6 balenad[858]: time="2020-05-08T20:47:31.070755673Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/delete type="*events.ContainerDelete"
May 08 20:47:31 14d8bd6 balenad[858]: time="2020-05-08T20:47:31.168049589Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/delete type="*events.ContainerDelete"
May 08 20:47:31 14d8bd6 balenad[858]: time="2020-05-08T20:47:31.440017354Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/delete type="*events.ContainerDelete"
May 08 21:03:31 14d8bd6 balenad[858]: time="2020-05-08T21:03:31.887557599Z" level=info msg="Removing stale sandbox 4c8c65161b15d49e3239121adbaa0472a0554393efd32063d8e3f8f8dadc1ecf (bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb)"
May 08 21:03:31 14d8bd6 balenad[858]: time="2020-05-08T21:03:31.926366720Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b19dc00647e8e09b6ef01460af8d27d56cabb87dbbace6b6a69e5617925d2839 1493784db70e54fea9a6058c92f8371a43d103e9961de8c51a1be9de8a62cc4c], retrying...."
May 08 21:03:32 14d8bd6 balenad[858]: time="2020-05-08T21:03:32.867245613Z" level=info msg="Removing stale sandbox c149466b673b25663c7fadcb6b4f60bba2ebeec92b8b663b042503fe64d61912 (4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced)"
May 08 21:03:33 14d8bd6 balenad[858]: time="2020-05-08T21:03:33.137394380Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint f538d0fbe9b4058ccc245c34ca2049c6c00f50e87f7f5f5ccd717041bfb357fe 54bd447d44ddeaec4d814ae4c68880f9d3a143ed5b0799ed31810408a570691e], retrying...."
May 08 21:03:33 14d8bd6 balenad[858]: time="2020-05-08T21:03:33.208621318Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 6cf26136eb75e4b20533f02a7d27aab02eeddc5892a23b03880e0579e1164dc0 68d2ea2b31d4b4dba255cc0b24dc2d2f3abb3e3257e6117495545ea0a2d57966], retrying...."
May 08 21:03:34 14d8bd6 balenad[858]: time="2020-05-08T21:03:34.152647503Z" level=info msg="Removing stale sandbox c36828a6805f6fd04ce4db51a6bae310ea59e7d9580d9740e62b03395d3f579d (708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02)"
May 08 21:03:34 14d8bd6 balenad[858]: time="2020-05-08T21:03:34.186606474Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b19dc00647e8e09b6ef01460af8d27d56cabb87dbbace6b6a69e5617925d2839 73ca4661b14fd39e39949a65099f83ad41b4da1886c9b5af9798957efee4f7d7], retrying...."
May 08 21:03:35 14d8bd6 balenad[858]: time="2020-05-08T21:03:35.170137624Z" level=info msg="Removing stale sandbox 1f5bf3e8f46cde2c885f774cad62033b6fb5bae0eb25c2eccd5508adb8536250 (66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790)"
May 08 21:03:35 14d8bd6 balenad[858]: time="2020-05-08T21:03:35.193651604Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b19dc00647e8e09b6ef01460af8d27d56cabb87dbbace6b6a69e5617925d2839 547131139dbb152c97d7a400b9c9f5357226dd5c19930480be049a0b757a2bc1], retrying...."
May 08 21:03:41 14d8bd6 balenad[858]: time="2020-05-08T21:03:41.000208590Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:03:41 14d8bd6 balenad[858]: time="2020-05-08T21:03:41.022758570Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:03:41 14d8bd6 balenad[858]: time="2020-05-08T21:03:41.024388569Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:03:41 14d8bd6 balenad[858]: time="2020-05-08T21:03:41Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb/shim.sock debug=false module=containerd/tasks pid=1254
May 08 21:03:41 14d8bd6 balenad[858]: time="2020-05-08T21:03:41Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced/shim.sock debug=false module=containerd/tasks pid=1259
May 08 21:03:41 14d8bd6 balenad[858]: time="2020-05-08T21:03:41Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02/shim.sock debug=false module=containerd/tasks pid=1260
May 08 21:03:47 14d8bd6 balenad[858]: time="2020-05-08T21:03:47.399993063Z" level=warning msg="unknown container" container=bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb module=libcontainerd namespace=plugins.moby
May 08 21:03:49 14d8bd6 balenad[858]: time="2020-05-08T21:03:49.972487842Z" level=warning msg="unknown container" container=bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb module=libcontainerd namespace=plugins.moby
May 08 21:03:50 14d8bd6 balenad[858]: time="2020-05-08T21:03:50.065137762Z" level=warning msg="unknown container" container=708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02 module=libcontainerd namespace=plugins.moby
May 08 21:03:54 14d8bd6 balenad[858]: time="2020-05-08T21:03:54.248571149Z" level=warning msg="unknown container" container=708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02 module=libcontainerd namespace=plugins.moby
May 08 21:03:56 14d8bd6 balenad[858]: time="2020-05-08T21:03:56.071371575Z" level=warning msg="unknown container" container=4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced module=libcontainerd namespace=plugins.moby
May 08 21:03:57 14d8bd6 balenad[858]: time="2020-05-08T21:03:57.227285577Z" level=warning msg="unknown container" container=4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced module=libcontainerd namespace=plugins.moby
May 08 21:03:58 14d8bd6 balenad[858]: time="2020-05-08T21:03:58.540066443Z" level=info msg="Loading containers: done."
May 08 21:03:59 14d8bd6 balenad[858]: time="2020-05-08T21:03:59.725887419Z" level=info msg="Docker daemon" commit=fe78e2c9a69313007c53c83fff4b5525fbc2ba45 graphdriver(s)=aufs version=17.13.3-dev
May 08 21:03:59 14d8bd6 balenad[858]: time="2020-05-08T21:03:59.730914415Z" level=info msg="Daemon has completed initialization"
May 08 21:04:00 14d8bd6 balenad[858]: time="2020-05-08T21:04:00.525561729Z" level=info msg="API listen on /var/run/balena-engine.sock"
May 08 21:04:00 14d8bd6 balenad[858]: time="2020-05-08T21:04:00.664180609Z" level=info msg="API listen on /var/run/balena-engine.sock"
May 08 21:04:00 14d8bd6 balenad[858]: time="2020-05-08T21:04:00.680297595Z" level=info msg="API listen on /var/run/balena.sock"
May 08 21:04:18 14d8bd6 balenad[858]: time="2020-05-08T21:04:18.330740353Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:04:18 14d8bd6 balenad[858]: time="2020-05-08T21:04:18Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790/shim.sock debug=false module=containerd/tasks pid=1594
May 08 21:04:23 14d8bd6 balenad[858]: time="2020-05-08T21:04:23.815411617Z" level=warning msg="unknown container" container=66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790 module=libcontainerd namespace=plugins.moby
May 08 21:04:25 14d8bd6 balenad[858]: time="2020-05-08T21:04:25.956926767Z" level=warning msg="unknown container" container=66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790 module=libcontainerd namespace=plugins.moby
-- Reboot --
May 08 21:10:45 14d8bd6 balenad[2782]: time="2020-05-08T21:10:45Z" level=warning msg="Running experimental build"
May 08 21:10:45 14d8bd6 balenad[2782]: time="2020-05-08T21:10:45.908477765Z" level=info msg="libcontainerd: started new balena-engine-containerd process" pid=2792
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="starting containerd" module=containerd revision= version=1.0.0+unknown
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="setting subreaper..." module=containerd
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="changing OOM score to -500" module=containerd
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." module=containerd type=io.containerd.content.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." module=containerd type=io.containerd.snapshotter.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." module=containerd type=io.containerd.metadata.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." module=containerd type=io.containerd.differ.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." module=containerd type=io.containerd.gc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." module=containerd type=io.containerd.monitor.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." module=containerd type=io.containerd.runtime.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." module=containerd type=io.containerd.grpc.v1
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd-debug.sock module=containerd/debug
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg=serving... address=/var/run/balena-engine/containerd/balena-engine-containerd.sock module=containerd/grpc
May 08 21:10:46 14d8bd6 balenad[2782]: time="2020-05-08T21:10:46Z" level=info msg="containerd successfully booted in 0.164454s" module=containerd
May 08 21:10:47 14d8bd6 balenad[2782]: time="2020-05-08T21:10:47.242087340Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
May 08 21:10:47 14d8bd6 balenad[2782]: time="2020-05-08T21:10:47.249466238Z" level=warning msg="Your kernel does not support cgroup rt period"
May 08 21:10:47 14d8bd6 balenad[2782]: time="2020-05-08T21:10:47.250842219Z" level=warning msg="Your kernel does not support cgroup rt runtime"
May 08 21:10:47 14d8bd6 balenad[2782]: time="2020-05-08T21:10:47.252739193Z" level=warning msg="Unable to find cpuset cgroup in mounts"
May 08 21:10:47 14d8bd6 balenad[2782]: time="2020-05-08T21:10:47.255365157Z" level=warning msg="mountpoint for pids not found"
May 08 21:10:47 14d8bd6 balenad[2782]: time="2020-05-08T21:10:47.268991968Z" level=info msg="Loading containers: start."
May 08 21:10:54 14d8bd6 balenad[2782]: time="2020-05-08T21:10:54.548852389Z" level=info msg="Removing stale sandbox 1ae5b643fabcb2dbd04bb315d875fc47f8ea837017755e8f77943e53db9a7887 (708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02)"
May 08 21:10:54 14d8bd6 balenad[2782]: time="2020-05-08T21:10:54.581383940Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b19dc00647e8e09b6ef01460af8d27d56cabb87dbbace6b6a69e5617925d2839 74b94c740eae1d61a2995f2a2466413f62a3f2f4d4806f636b213d46ad2dba5c], retrying...."
May 08 21:10:55 14d8bd6 balenad[2782]: time="2020-05-08T21:10:55.531179817Z" level=info msg="Removing stale sandbox 33911e05f495848345a8a30f0a94a8e841a604ddf5a01b61c0e219460c06bb90 (4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced)"
May 08 21:10:55 14d8bd6 balenad[2782]: time="2020-05-08T21:10:55.823014785Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint f538d0fbe9b4058ccc245c34ca2049c6c00f50e87f7f5f5ccd717041bfb357fe 72299fa5df3517fe2f8946ae9f6116182a6d9ed478e6b3176900cb525acd5334], retrying...."
May 08 21:10:55 14d8bd6 balenad[2782]: time="2020-05-08T21:10:55.891725836Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 6cf26136eb75e4b20533f02a7d27aab02eeddc5892a23b03880e0579e1164dc0 534580f55fe034f411bd1598b77851d39fa9e84fbced691b2e115670b22683b8], retrying...."
May 08 21:10:56 14d8bd6 balenad[2782]: time="2020-05-08T21:10:56.854318537Z" level=info msg="Removing stale sandbox 4105c63a4f63804e1dae337f16edfe66c85b550f9121a7de9503517702a0036b (66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790)"
May 08 21:10:56 14d8bd6 balenad[2782]: time="2020-05-08T21:10:56.884408121Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b19dc00647e8e09b6ef01460af8d27d56cabb87dbbace6b6a69e5617925d2839 9a43f56f98f43cd4bcb611d84db9b1129f1bc5f67a356869c917188239ff0c03], retrying...."
May 08 21:10:58 14d8bd6 balenad[2782]: time="2020-05-08T21:10:58.088502485Z" level=info msg="Removing stale sandbox 79c74997f802afb068ded080e4c665f56bf4b3910cd629fbbc99dc40211e96e7 (bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb)"
May 08 21:10:58 14d8bd6 balenad[2782]: time="2020-05-08T21:10:58.111626166Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b19dc00647e8e09b6ef01460af8d27d56cabb87dbbace6b6a69e5617925d2839 5df599ce57f2897f1171fe72ed3d4390150fe1cf67d51f5d8e9bdc56d26a478a], retrying...."
May 08 21:11:03 14d8bd6 balenad[2782]: time="2020-05-08T21:11:03.116090024Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:11:03 14d8bd6 balenad[2782]: time="2020-05-08T21:11:03.122376937Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:11:03 14d8bd6 balenad[2782]: time="2020-05-08T21:11:03.432584651Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:11:03 14d8bd6 balenad[2782]: time="2020-05-08T21:11:03Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02/shim.sock debug=false module=containerd/tasks pid=3054
May 08 21:11:03 14d8bd6 balenad[2782]: time="2020-05-08T21:11:03Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb/shim.sock debug=false module=containerd/tasks pid=3055
May 08 21:11:03 14d8bd6 balenad[2782]: time="2020-05-08T21:11:03Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced/shim.sock debug=false module=containerd/tasks pid=3063
May 08 21:11:10 14d8bd6 balenad[2782]: time="2020-05-08T21:11:10.180306424Z" level=warning msg="unknown container" container=708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02 module=libcontainerd namespace=plugins.moby
May 08 21:11:10 14d8bd6 balenad[2782]: time="2020-05-08T21:11:10.607205526Z" level=warning msg="unknown container" container=bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb module=libcontainerd namespace=plugins.moby
May 08 21:11:14 14d8bd6 balenad[2782]: time="2020-05-08T21:11:14.413096944Z" level=warning msg="unknown container" container=708a520b5b04a2014adc4f8ddcaac77ea79e3a4a3362141715fa0fc1c8f4de02 module=libcontainerd namespace=plugins.moby
May 08 21:11:15 14d8bd6 balenad[2782]: time="2020-05-08T21:11:15.001250818Z" level=warning msg="unknown container" container=bc95564f21246dcf9402daa1c401f2d861d659b8d8d5773b44bbd69b31f732fb module=libcontainerd namespace=plugins.moby
May 08 21:11:16 14d8bd6 balenad[2782]: time="2020-05-08T21:11:16.738807811Z" level=warning msg="unknown container" container=4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced module=libcontainerd namespace=plugins.moby
May 08 21:11:19 14d8bd6 balenad[2782]: time="2020-05-08T21:11:19.011430413Z" level=warning msg="unknown container" container=4db401c6ece3404162e35b968e68526ddcbf17e4108ac0cf248554d08da6aced module=libcontainerd namespace=plugins.moby
May 08 21:11:20 14d8bd6 balenad[2782]: time="2020-05-08T21:11:20.181056253Z" level=info msg="Loading containers: done."
May 08 21:11:21 14d8bd6 balenad[2782]: time="2020-05-08T21:11:21.342763203Z" level=info msg="Docker daemon" commit=fe78e2c9a69313007c53c83fff4b5525fbc2ba45 graphdriver(s)=aufs version=17.13.3-dev
May 08 21:11:21 14d8bd6 balenad[2782]: time="2020-05-08T21:11:21.354774037Z" level=info msg="Daemon has completed initialization"
May 08 21:11:21 14d8bd6 balenad[2782]: time="2020-05-08T21:11:21.603836596Z" level=info msg="API listen on /var/run/balena-engine.sock"
May 08 21:11:21 14d8bd6 balenad[2782]: time="2020-05-08T21:11:21.625027303Z" level=info msg="API listen on /var/run/balena-engine.sock"
May 08 21:11:21 14d8bd6 balenad[2782]: time="2020-05-08T21:11:21.629704238Z" level=info msg="API listen on /var/run/balena.sock"
May 08 21:11:36 14d8bd6 balenad[2782]: time="2020-05-08T21:11:36.885077469Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
May 08 21:11:37 14d8bd6 balenad[2782]: time="2020-05-08T21:11:37Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790/shim.sock debug=false module=containerd/tasks pid=3344
May 08 21:11:42 14d8bd6 balenad[2782]: time="2020-05-08T21:11:42.138027894Z" level=warning msg="unknown container" container=66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790 module=libcontainerd namespace=plugins.moby
May 08 21:11:44 14d8bd6 balenad[2782]: time="2020-05-08T21:11:44.369482064Z" level=warning msg="unknown container" container=66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790 module=libcontainerd namespace=plugins.moby
May 08 21:16:46 14d8bd6 balenad[2782]: time="2020-05-08T21:16:46.988318038Z" level=warning msg="unknown container" container=66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790 module=libcontainerd namespace=plugins.moby
May 08 21:16:48 14d8bd6 balenad[2782]: time="2020-05-08T21:16:48.195029951Z" level=warning msg="unknown container" container=66bd12d0be5444b153d35d9a05531da7839989561287600755942ee92d8f0790 module=libcontainerd namespace=plugins.moby

errors in dmesg output after the reboot:

[  326.093938] systemd-udevd[24]: seq 1518 '/devices/platform/soc/20202000.mmc/mmc_host/mmc0/mmc0:e624/block/mmcblk0/mmcblk0p3' is taking a long time
[  446.094357] systemd-udevd[24]: seq 1518 '/devices/platform/soc/20202000.mmc/mmc_host/mmc0/mmc0:e624/block/mmcblk0/mmcblk0p3' killed
[  446.106100] systemd-udevd[24]: worker [32] terminated by signal 9 (Killed)
[  446.106199] systemd-udevd[24]: worker [32] failed while handling '/devices/platform/soc/20202000.mmc/mmc_host/mmc0/mmc0:e624/block/mmcblk0/mmcblk0p3'

Historically found that rebooting does fix issues but am curious to see how to get to the root of the issues and figure out how to avoid them because sometimes the devices go offline and then we have to send a technician to these remote sites in Columbia to power cycle them.

Hi there,

I am the maintainer of the diagnostics/checks interfaces, and I am glad to hear you are finding them useful! For what it’s worth, they have been designed to be as low-footprint as possible, so they definitely do not issue a reboot directly (though perhaps something else timed out and caused the reboot). Please let me know if you have any further feedback on those features!

As a note, all of those failing checks depend on the engine being up (we’ve documented these dependencies in most places, like here: https://www.balena.io/docs/reference/diagnostics/#test_dockerhub).

In terms of the errors you’ve noted, the stale sandbox messages have been noted here: https://github.com/balena-os/balena-engine/issues/153. I have not seen those systemd-udevd errors before, though if the SD card is that slow to respond I would suspect there are underlying hardware issues. I would recommend provisioning a different SD card with that same board to further narrow down the issue.