Device is stuck trying to reboot with "Failed to kill service"

I just installed a new device with the same image as two other devices. It worked for several hours until the db failed to write to a segment (weird already).

I tried to reboot remotely, and the OS seems stuck trying to kill the docker containers.

I am running Host OS version “balenaOS 2.31.2+rev1” and Supervisor “9.11.1”.

I can successfully ssh to the host, but it doesn’t seem I can do much from there. Any ideas how to at least force a reboot from the Balena Cloud?

   sha256:c002dce31b4e29c62a71df04308835684d6ab2688951516c482efbaf712752dd'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'uploader sha256:a0cfd45830e5d57c62b0e0bc651b2cdaf04988a597fd1db11692398cd6057ecf'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'nginx sha256:d7759089796fa13ed339c4c744778c41d4d7ced3053a2927a649056fe1eb75ae'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'web sha256:89dd3f4d7a1bf67080c859c3583f52c5d12e32c3478396db727139c446226ae0'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'db1 sha256:223e6f3c728f860d5f8b74415811cde5efb26fc8e8c773ab024fbe6318a6162b'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'redis sha256:e690306fc5b706fbf2d213e8be08050b815bb3ddf16bcac51898ee7195ef35e0'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'red sha256:adf82ae558de8e1d56e5ae07d0310a49ea8efdced81c17f22fd567439a848416'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'kapacitor sha256:b460e51acae702a470fb53e43e3034e2b7bc86cd297384e9d5ad360388d2fb10'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'grafana sha256:c1c986c5220e640bb1cc94c7a92f56f4c2b444f67594cb1bc10b6c4384036b26'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'agent1 sha256:cebc2700a35eccbae50574bf2bdc35227d254e0acd901fe992f8798c4dedb734'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'chronograf sha256:b847bb93c34152e89519f95735c8c752d286bcb2cfbd7c5e57c6e7a3a5fb69f3'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'ingress sha256:e9f192ec0cf6589e8e221a11c0be9b587cf954dc4850f2928b9005919f0c1297'
03.04.19 13:50:01 (-0600) Service is already stopped, removing container 'mqtt sha256:c002dce31b4e29c62a71df04308835684d6ab2688951516c482efbaf712752dd'
03.04.19 13:50:01 (-0600) Failed to kill service 'uploader sha256:a0cfd45830e5d57c62b0e0bc651b2cdaf04988a597fd1db11692398cd6057ecf' due to '(HTTP code 500) server error - driver "aufs" failed to remove root filesystem for dd27dfe1255070e64ddac9dae0848a7480fa451d64857f4bb83f576d52145cc3: error removing layers dir for 74cffc091648bf558a5b2c1444fa1bcf3533d8100d97de47918af4ef52fbf37a: remove /var/lib/docker/aufs/layers/74cffc091648bf558a5b2c1444fa1bcf3533d8100d97de47918af4ef52fbf37a: read-only file system '
03.04.19 13:50:01 (-0600) Failed to kill service 'nginx sha256:d7759089796fa13ed339c4c744778c41d4d7ced3053a2927a649056fe1eb75ae' due to '(HTTP code 500) server error - driver "aufs" failed to remove root filesystem for c676f8d64b22224f7e4b802cd0c770494aa3046b47c7049e6a539baf5e3b2a51: error removing layers dir for c603039f5e93bd3d345bf5270dfc634541a86478687da1c984bc104aa6d13ae5: remove /var/lib/docker/aufs/layers/c603039f5e93bd3d345bf5270dfc634541a86478687da1c984bc104aa6d13ae5: read-only file system '
03.04.19 13:50:01 (-0600) Failed to kill service 'web sha256:89dd3f4d7a1bf67080c859c3583f52c5d12e32c3478396db727139c446226ae0' due to '(HTTP code 500) server error - driver "aufs" failed to remove root filesystem for 8c046458574af331c12baa127bde6a91d347919dc9ce6884d4a39cf395972808: error removing layers dir for 63c1a5f368064e2264d069cc54f0ed117231641ded774bd826849357dc914afa: remove /var/lib/docker/aufs/layers/63c1a5f368064e2264d069cc54f0ed117231641ded774bd826849357dc914afa: read-only file system '
03.04.19 13:50:01 (-0600) Failed to kill service 'redis sha256:e690306fc5b706fbf2d213e8be08050b815bb3ddf16bcac51898ee7195ef35e0' due to '(HTTP code 500) server error - driver "aufs" failed to remove root filesystem for ce5b9a1135a3ab730351e9b4229bb21fb50c29827202ebf9653708a43e00ae17: error removing layers dir for 6873cf06108dd5916655455693a4694f1f77c6f4c0c5d1627ee5359f79f93012: remove /var/lib/docker/aufs/layers/6873cf06108dd5916655455693a4694f1f77c6f4c0c5d1627ee5359f79f93012: read-only file system '
03.04.19 13:50:01 (-0600) Failed to kill service 'db1 sha256:223e6f3c728f860d5f8b74415811cde5efb26fc8e8c773ab024fbe6318a6162b' due to '(HTTP code 500) server error - driver "aufs" failed to remove root filesystem for b4a6becb8836b6a9c7493dd387308406050f5a7cba991c83b8938cdf730bc39b: error removing layers dir for 39a2dbce39281d9ed4f2301bcea24b87e26b8c6a7da6b7ce5e3e40141848de07: remove /var/lib/docker/aufs/layers/39a2dbce39281d9ed4f2301bcea24b87e26b8c6a7da6b7ce5e3e40141848de07: read-only file system '
03.04.19 13:50:01 (-0600) Failed to kill service 'mqtt sha256:c002dce31b4e29c62a71df04308835684d6ab2688951516c482efbaf712752dd' due to '(HTTP code 500) server error - driver "aufs" failed to remove root filesystem for 6847da0139c598ba87b303679658a5edd6cbfb4858dc62230943897596486349: error removing layers dir for 35ebfec21cd297cec3b0523d2f38472a550196c763c751d0534671b0dc02de12: remove /var/lib/docker/aufs/layers/35ebfec21cd297cec3b0523d2f38472a550196c763c751d0534671b0dc02de12: read-only file system '
03.04.19 13:50:01 (-0600) Failed to kill service 'chronograf sha256:b847bb93c34152e89519f95735c8c752d286bcb2cfbd7c5e57c6e7a3a5fb69f3' due to '(HTTP code 500) server error - driver "aufs" failed to remove root filesystem for d197ae3a3b0ef99ca77853eb23c96dec901596a6dcda4b40db0a3b8b2b2be705: error removing layers dir for 7d6b6b2f621d81a99794a18c733c31383e7301038d1261f53f5c3fd3aa7cc3a2: remove /var/lib/docker/aufs/layers/7d6b6b2f621d81a99794a18c733c31383e7301038d1261f53f5c3fd3aa7cc3a2: read-only file system '
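Every failure above ends in "read-only file system", so a quick first check from the host shell is whether the partition backing /var/lib/docker has been remounted read-only. Here is a minimal sketch of that check, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options). The function name and the sample mount points are made up for illustration and are not taken from this device.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        if "ro" in options.split(","):
            result.append(mount_point)
    return result


# Hypothetical sample, for illustration only (not captured from the device).
SAMPLE = """\
/dev/mmcblk0p6 /mnt/data ext4 ro,relatime 0 0
/dev/mmcblk0p1 /mnt/boot vfat rw,relatime 0 0
"""

print(read_only_mounts(SAMPLE))
```

On the device itself you would pass the contents of /proc/mounts instead of the sample string. Splitting the options on commas avoids false matches on options that merely contain the letters "ro".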

Hey @david-archsys, this looks like SD card corruption to me, but if you provide the dashboard link and enable support access, I may be able to confirm.

The device does not have an SD card, but it does have flash memory:

https://dashboard.balena-cloud.com/devices/40937b98d02fb664da651f30fed59f1a/summary

I looked at dmesg on-device, and saw a lot of errors related to the mmc:

[28341.943856] mmc0: mmc_hs400_to_hs200 failed, error -110
[32181.614109] mmc0: mmc_hs400_to_hs200 failed, error -110
[35495.352143] mmc0: mmc_hs400_to_hs200 failed, error -110
[36214.216303] mmc0: mmc_hs400_to_hs200 failed, error -110
[36342.219243] mmc0: mmc_hs400_to_hs200 failed, error -110
[39006.214797] mmc0: mmc_hs400_to_hs200 failed, error -110
[39056.343837] mmc0: mmc_hs400_to_hs200 failed, error -110
[39215.251399] mmc0: mmc_hs400_to_hs200 failed, error -110
[39492.625682] mmc0: mmc_hs400_to_hs200 failed, error -110
[40734.101336] mmc0: mmc_hs400_to_hs200 failed, error -110
[42367.433895] mmc0: mmc_hs400_to_hs200 failed, error -110
[43000.416103] mmc0: mmc_hs400_to_hs200 failed, error -110
[44603.523983] mmc0: mmc_hs400_to_hs200 failed, error -110
[48476.451397] mmc0: mmc_hs400_to_hs200 failed, error -110
[49094.881587] mmc0: mmc_hs400_to_hs200 failed, error -110
[49412.848903] mmc0: mmc_hs400_to_hs200 failed, error -110
[49412.849144] mmc0: cache flush error -110
[49412.849154] print_req_error: I/O error, dev mmcblk0, sector 2484064
[49412.849246] Aborting journal on device mmcblk0p6-8.
[49412.850402] mmc0: mmc_hs400_to_hs200 failed, error -110
[49414.288064] EXT4-fs error (device mmcblk0p6): ext4_journal_check_start:61: Detected aborted journal
[49414.288070] EXT4-fs (mmcblk0p6): Remounting filesystem read-only
[49414.289096] EXT4-fs error (device mmcblk0p6): ext4_journal_check_start:61: Detected aborted journal
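The excerpt above reads as a chain: repeated mmc errors with code -110 (which I believe is -ETIMEDOUT on most Linux architectures), then an I/O error on mmcblk0, then the ext4 journal abort, then the remount as read-only. Here is a rough sketch for tallying those stages from a saved dmesg dump. The category names and regular expressions are my own guesses based only on the lines in this thread, not an official list of kernel messages.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this thread (assumption:
# other kernels or drivers may word these messages differently).
PATTERNS = {
    "mmc_timeout": re.compile(r"mmc\d+: .*error -110"),
    "io_error": re.compile(r"I/O error, dev \S+"),
    "journal_abort": re.compile(r"Aborting journal on device"),
    "remount_ro": re.compile(r"Remounting filesystem read-only"),
}


def summarize(dmesg_text):
    """Count how many lines of dmesg_text match each pattern."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Lines copied from the dmesg output above.
SAMPLE = """\
[49412.848903] mmc0: mmc_hs400_to_hs200 failed, error -110
[49412.849144] mmc0: cache flush error -110
[49412.849154] print_req_error: I/O error, dev mmcblk0, sector 2484064
[49412.849246] Aborting journal on device mmcblk0p6-8.
[49414.288070] EXT4-fs (mmcblk0p6): Remounting filesystem read-only
"""

for name, count in summarize(SAMPLE).items():
    print(name, count)
```

If the timeouts show up long before the first I/O error, as they do here, that suggests the storage or its controller link was misbehaving before the filesystem was damaged, rather than the filesystem being the root cause.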

It does look like corruption, but some sources say that running fsck on the drive may fix this particular error (corrupted journal).

OK, thanks for your help, @CameronDiver. I will try to reimage the whole disk.