SD Corrupted - Can Access Terminal - Can anything be done?

Hi,

TLDR; I think my SD card has corrupted files in /var/lib/docker/aufs/diff/*, and I can’t physically access the device for about a month. Any tips for recovery?

I have pushed multiple containers. All was going swimmingly at first, but they now end up as dead services.

Diagnostics shows that the container engine and supervisor are not running.

Using balenaOS 2.83.21+rev1.prod, supervisor version 12.10.3, Raspberry Pi Zero W v1.2

Power-wise, no issues or interruptions that I know of (Samsung charger, can’t remember the rating, but no warnings from the diagnostic tool).

SD card is SanDisk Ultra 64GB and new.

I CAN terminal into the Host OS from the balena cloud dashboard absolutely fine, but any calls to balena ps etc. hang.

As the supervisor isn’t running, I get ‘tunneling socket could not be established, statusCode=500’ errors, as would be expected.

Logs show many messages like this when trying to kill the services:

Failed to kill service 'tailscale sha256:d8f1ae2598c186c0009d1bf07ff8ffb8c920ce76d7bfb67454bd520f1ea77372' due to '(HTTP code 500) server error - container e94bf56dac0f005192fd1515e39ff05100647a02b9b802871d6fb1dac42e8d3c: driver "aufs" failed to remove root filesystem: could not remove diff path for id c695ddf7d28d33927d7a765fd663adc8e9f1247205e8edbf412a99834b2bd95f: readdirnames /var/lib/docker/aufs/diff/c695ddf7d28d33927d7a765fd663adc8e9f1247205e8edbf412a99834b2bd95f-removing/tmp: readdirent: bad message '

Listing the problem files with ls -l /var/lib/docker/aufs/diff | head -4 shows:

ls: cannot access '/var/lib/docker/aufs/diff/56ce83d2044164d914d33bde896233d47403b342fa25f1f8c4f2902e31e542ca': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/c4f80f936e77e330c994577a0416a4398843943682e7f9696a52c15bce26b074': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/f67668d68f51d46c1f4c8fd0c47d84b295fea41f135f21e03ab7d307f9bf6226': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/1489bc83298e4b905506df5389f55355445a51801f1a0c707d64b063859d75de': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/d1c69150330cba35e6f903fce70a3fb06f49ad01f104e1205243e1af85c623ee': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/77257753d168fbcef9d0b1f976bd89e190eb1e97b52c5b237640a02d41784f2d': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/19ed2e8fd34d83ba8ce3b09c470a498472d6ab6255b5558f74f596bca0956958': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/b32e3f2244da23577d203c5556969ef1762dfd1b1da6e93337b11d9e0520596a': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/c137619238106860180d0fdd5bd57957760fba8fd6c832e59ce61b96126e2925': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/1cd76bf3d3fc36936f72fddc3e85529dcae48d429344833625b033a75469a623': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/a804e4f9ccb971b7f9a2b2e9e742d08b73e0d5aef3fe83870b471303ee8d164a': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/d749f16457fef9ec6910df1c90801277cf34d02db437e6762ebca48ed63be6ac-init': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/35e47667665c3ecc804af7c3f16115bc81916ee90bccad1434efc658a9611aae': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/796c14322a5bc73f0d62ae4c1cdb5d6806512c0c2040333c5437064a1f024fd0': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/51af42086e8c4c13139e6c18e3fc353ee9b4153818aa426169b72686c2157aa2': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/e165d279874c18e621a3ed8dc459baca1b3d995c4d29f6f1003d07a6793b458f': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/23a2cdcc1c498114bf8ed8eba56ae9791001d3e8571666a00f3fc24443d2b028': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/6632be58824c4b297411e44794d62ca37ee638f70c1924a8b5a427c3fa009e2f': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/f67b0a342bfbbc95d489a5fe7bc5ddf988c949cf76d7720e5111f5915c5ec65c': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/19371eb5702d2e81120af31d23ae58e947e22a616ee28f555ebc54b531c22728': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/9c8b21fbc7417fd12116bedd91eada331cf9adf90c7c678f406306e6df53f0bb': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/9990d6d81264871fabe3b007932177e4fd1126e7eea24c3752ca162869b2bfe8': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/0343c5937810e3c3f5e5a498313e750e999b8cb94af594f5fcba65df3fdbe2bc': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/820d81797c5def482e801bf34f8eb04d2a450b0953e05dfde00df6f9d14ee622': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/60d8e5b199b6174b961d1816f28b8f57e51fb804801ae575a072236cf8c4b6bd': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/c695ddf7d28d33927d7a765fd663adc8e9f1247205e8edbf412a99834b2bd95f-init': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/1c3189fc671ceda01cbbb6f69638ca4fac0e21ced11f408344800ed958d051ad': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/e8f6890b9a13e038a889a97d8b69ba91be4cb0e3fdd957a4a2fbdd1022fa0f70': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/f326cec447d2b79e33f6fc025e0602b27bff6b015e15c1954ba44f2e9d78e940': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/7d01c245a89938ed1195e1758f9f69590867a032aafa18aaa24aaa70d803993f': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/15c3a18e8d31883cb877c4d007b471fef1a2a952c6cf6b0b43fde9da36a2eabd': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/35f41dd0146ce5dbfb91b070e4b7bf3b0370a2cbdbb33174d9f293c39b4c71eb': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/e15e0606ceed2619139f73f749eb309eb54470ae646805815c39335274e3ab6d': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/c695ddf7d28d33927d7a765fd663adc8e9f1247205e8edbf412a99834b2bd95f-removing': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/db3f9d89606a04bb46814e5dbda576ec84e21549166dcc7c42eb9396a867af89': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/e1883908d2ed3c9ad16b3bf7f68967ae04215a4228db7d04d6932aed0140476c': Structure needs cleaning
ls: cannot access '/var/lib/docker/aufs/diff/d749f16457fef9ec6910df1c90801277cf34d02db437e6762ebca48ed63be6ac-removing': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/0e62068e083a8c64e4fb9781840a5ff397a484264dfd11852be70f0b2ccf5936': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/24c8e79fa80a18dab50eda561528f09c4b7c6222a3c29b9b5a139d77e81dcbf0-init': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/0879bae84c69656ca0fef026957df52282ef7bcd5ea1b8412675e03ba1427ac4': Bad message
ls: cannot access '/var/lib/docker/aufs/diff/24c8e79fa80a18dab50eda561528f09c4b7c6222a3c29b9b5a139d77e81dcbf0': Bad message
total 4
d????????? ? ? ? ? ? 0343c5937810e3c3f5e5a498313e750e999b8cb94af594f5fcba65df3fdbe2bc
d????????? ? ? ? ? ? 0879bae84c69656ca0fef026957df52282ef7bcd5ea1b8412675e03ba1427ac4
d????????? ? ? ? ? ? 0e62068e083a8c64e4fb9781840a5ff397a484264dfd11852be70f0b2ccf5936

Which looks like a corrupted disk to me.

I didn’t capture the journalctl -u resin-supervisor logs yesterday and it says ‘no entries’ today, however dmesg | tail -n 100 shows:

[ 4595.537299] EXT4-fs error: 14 callbacks suppressed
[ 4595.537327] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971789: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 4595.545825] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971789: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 4595.562323] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1775154: comm rm: iget: checksum invalid
[ 4595.570535] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1775154: comm rm: iget: checksum invalid
[ 4595.585164] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971784: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 4595.593490] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971784: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 4595.619478] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974417: comm rm: iget: bad extra_isize 24888 (inode size 256)
[ 4595.627759] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974417: comm rm: iget: bad extra_isize 24888 (inode size 256)
[ 4595.651246] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1970712: comm rm: iget: checksum invalid
[ 4595.659687] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1970712: comm rm: iget: checksum invalid
[ 7809.887697] EXT4-fs error: 18 callbacks suppressed
[ 7809.887729] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971131: comm bash: iget: checksum invalid
[ 7809.896505] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm bash: iget: bad extra_isize 65535 (inode size 256)
[ 7809.905196] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974103: comm bash: iget: checksum invalid
[ 7820.823687] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm bash: iget: bad extra_isize 65535 (inode size 256)
[ 7820.832034] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974103: comm bash: iget: checksum invalid
[ 7823.700484] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm bash: iget: bad extra_isize 65535 (inode size 256)
[ 7823.708830] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm bash: iget: bad extra_isize 65535 (inode size 256)
[ 7824.933208] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 7824.941521] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 8323.759237] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974401: comm rm: iget: bad extra_isize 24888 (inode size 256)
[ 8323.767602] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974401: comm rm: iget: bad extra_isize 24888 (inode size 256)
[ 8323.782224] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971779: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 8323.791712] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971779: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 8323.808182] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 8323.816573] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 8323.827673] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974949: comm rm: iget: checksum invalid
[ 8323.836617] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974949: comm rm: iget: checksum invalid
[ 8323.847216] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769473: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 8323.855402] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769473: comm rm: iget: bad extra_isize 65535 (inode size 256)
[ 8998.662998] EXT4-fs error: 468 callbacks suppressed
[ 8998.663029] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974401: comm ls: iget: bad extra_isize 24888 (inode size 256)
[ 8998.680042] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971779: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 8998.695131] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 8998.705327] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974949: comm ls: iget: checksum invalid
[ 8998.715033] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769473: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 8998.724456] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974427: comm ls: iget: bad extra_isize 24888 (inode size 256)
[ 8998.735220] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971773: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 8998.746437] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1970717: comm ls: iget: checksum invalid
[ 8998.755999] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971769: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 8998.768329] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974089: comm ls: iget: checksum invalid
[ 9011.569438] EXT4-fs error: 31 callbacks suppressed
[ 9011.569470] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971131: comm bash: iget: checksum invalid
[ 9011.577984] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971131: comm bash: iget: checksum invalid
[ 9011.589495] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm bash: iget: bad extra_isize 65535 (inode size 256)
[ 9011.597729] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm bash: iget: bad extra_isize 65535 (inode size 256)
[ 9011.609618] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974103: comm bash: iget: checksum invalid
[ 9011.618107] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974103: comm bash: iget: checksum invalid
[ 9020.271049] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971131: comm ls: iget: checksum invalid
[ 9020.285660] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 9020.300183] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974103: comm ls: iget: checksum invalid
[ 9031.159067] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974401: comm ls: iget: bad extra_isize 24888 (inode size 256)
[ 9031.173948] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971779: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 9031.188556] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 9031.199091] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974949: comm ls: iget: checksum invalid
[ 9031.207938] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769473: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 9031.217281] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974427: comm ls: iget: bad extra_isize 24888 (inode size 256)
[ 9031.230999] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971773: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 9031.240614] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1970717: comm ls: iget: checksum invalid
[ 9031.250467] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971769: comm ls: iget: bad extra_isize 65535 (inode size 256)
[ 9031.260751] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974089: comm ls: iget: checksum invalid
[13591.251872] ICMPv6: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.default.base_reachable_time - use net.ipv6.neigh.default.base_reachable_time_ms instead
[13597.892238] audit: type=1334 audit(1645045278.960:14): prog-id=13 op=LOAD
[13597.894034] audit: type=1334 audit(1645045278.960:15): prog-id=14 op=LOAD
[13628.783717] audit: type=1334 audit(1645045309.850:16): prog-id=14 op=UNLOAD
[13628.787034] audit: type=1334 audit(1645045309.850:17): prog-id=13 op=UNLOAD
[57363.712802] EXT4-fs error: 31 callbacks suppressed
[57363.712829] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974401: comm ls: iget: bad extra_isize 24888 (inode size 256)
[57363.727275] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971779: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57363.742149] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57363.752571] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974949: comm ls: iget: checksum invalid
[57363.762420] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769473: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57363.771944] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974427: comm ls: iget: bad extra_isize 24888 (inode size 256)
[57363.785281] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971773: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57363.794561] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1970717: comm ls: iget: checksum invalid
[57363.803869] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971769: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57363.813871] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974089: comm ls: iget: checksum invalid
[57382.512715] EXT4-fs error: 31 callbacks suppressed
[57382.512741] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974401: comm ls: iget: bad extra_isize 24888 (inode size 256)
[57382.527815] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971779: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57382.543003] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57382.552844] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974949: comm ls: iget: checksum invalid
[57382.562054] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769473: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57382.571443] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974427: comm ls: iget: bad extra_isize 24888 (inode size 256)
[57382.582199] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971773: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57382.591495] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1970717: comm ls: iget: checksum invalid
[57382.601376] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971769: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57382.612596] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974089: comm ls: iget: checksum invalid
[57392.778703] EXT4-fs error: 31 callbacks suppressed
[57392.778728] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974401: comm ls: iget: bad extra_isize 24888 (inode size 256)
[57392.793614] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971779: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57392.808690] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769478: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57392.821124] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974949: comm ls: iget: checksum invalid
[57392.833643] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1769473: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57392.842517] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974427: comm ls: iget: bad extra_isize 24888 (inode size 256)
[57392.851661] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971773: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57392.861162] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1970717: comm ls: iget: checksum invalid
[57392.870108] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1971769: comm ls: iget: bad extra_isize 65535 (inode size 256)
[57392.884595] EXT4-fs error (device mmcblk0p6): ext4_lookup:1706: inode #1974089: comm ls: iget: checksum invalid

Which again points to corruption.
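
To get a feel for how widespread the damage is, the kernel log can be condensed to the distinct inode numbers it complains about. This is only a minimal sketch, assuming the messages keep the format shown above; the function name corrupt_inodes is made up for illustration.

```shell
# Print each distinct inode number named in EXT4-fs error lines, one per line.
# Reads kernel log text on stdin; "callbacks suppressed" lines carry no inode
# number and are skipped by the sed pattern.
corrupt_inodes() {
    grep 'EXT4-fs error' \
        | sed -n 's/.*inode #\([0-9][0-9]*\).*/\1/p' \
        | sort -un
}

# Usage on the device:
#   dmesg | corrupt_inodes | wc -l
```

A handful of inodes repeated many times (as in the log above) is a different picture from thousands of distinct ones, which may help when deciding whether a repair attempt is worth the risk.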

I’ve read that balenaOS automatically runs e2fsck on the data partition on boot, but I’ve tried rebooting a few times with no luck.

I believe that I have to unmount /mnt/data, e2fsck /mnt/data, mount /mnt/data in an attempt to clean this (caveat, I’m not a Linux expert), but I can’t do this remotely and can’t access the location for a few weeks.

Is there anything that can be done?

Hello,

There is one last-ditch effort to try, but it can also cause you to completely lose access to the device and require a physical re-flashing of the card.

Stop the supervisor and engine as they use the data partition

$ systemctl stop balena-supervisor
$ systemctl stop balena

Check where it is mounted, take note of the mount options too

$ mount | grep mmcblk0p6
/dev/mmcblk0p6 on /mnt/data type ext4 (rw,relatime)
/dev/mmcblk0p6 on /resin-data type ext4 (rw,relatime)
/dev/mmcblk0p6 on /var/lib/docker type ext4 (rw,relatime)
/dev/mmcblk0p6 on /var/volatile/lib/docker type ext4 (rw,relatime)

Unmount all of them (some of them are aliases so unmounting one may take another one down)

$ umount /mnt/data /resin-data /var/lib/docker /var/volatile/lib/docker

If umount fails with “/mnt/data: target is busy”, kill the processes using it

$ ps -Af | grep '[h]ealthcheck' | awk '{print $2}' | xargs kill -9
$ lsof | grep /mnt/data
bash 516736 root cwd DIR 179,6 4096 2 /mnt/data
$ kill -9 516736 # just an example!
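
When lsof lists more than a couple of processes, picking the PIDs out by hand gets tedious. The helper below is a rough sketch, not part of balenaOS: the name pids_under is hypothetical, and it assumes the lsof column layout shown above (PID in the second column, path in the last one) and paths without spaces.

```shell
# Print the distinct PIDs from lsof-style output (on stdin) whose last field
# is the given path or a path beneath it.
pids_under() {
    awk -v p="$1" '$NF == p || index($NF, p "/") == 1 { print $2 }' | sort -un
}

# Usage on the device (review the list before killing anything):
#   lsof | pids_under /mnt/data
```

Matching on "path or path plus slash" keeps unrelated locations that merely share a prefix, such as /mnt/database, out of the result.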

Re-check that nothing uses the data partition, and run fsck

$ mount | grep mmcblk0p6
$ fsck -y /dev/mmcblk0p6
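
Running fsck on a partition that is still mounted read-write can make things worse, so it may be worth guarding the call. A minimal sketch, assuming the partition shows up under the same device name in /proc/mounts as it does in the mount output above; is_mounted is a hypothetical helper name.

```shell
# Exit 0 if the device given as $1 appears as a mount source in mount-table
# text read from stdin (the /proc/mounts format), exit 1 otherwise.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Usage on the device: only run fsck once nothing is mounted from the partition.
#   if is_mounted /dev/mmcblk0p6 < /proc/mounts; then
#       echo "still mounted, not running fsck" >&2
#   else
#       fsck -y /dev/mmcblk0p6
#   fi
```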

Remount the partitions

$ mount -o rw,relatime /dev/mmcblk0p6 /mnt/data
$ mount -o rw,relatime /dev/mmcblk0p6 /resin-data
$ mount -o rw,relatime /dev/mmcblk0p6 /var/lib/docker
$ mount -o rw,relatime /dev/mmcblk0p6 /var/volatile/lib/docker
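
The four mount commands can also be written as a loop, which makes it easier to reuse whatever mount options and mount points you noted earlier. This is just a sketch; remount_all and the DRY_RUN switch are made-up names, not balenaOS tooling.

```shell
# Mount one device on several mount points with the same options.
# $1 = device, $2 = mount options, remaining arguments = mount points.
# With DRY_RUN=1 the commands are printed instead of executed.
remount_all() {
    dev=$1
    opts=$2
    shift 2
    for mp in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mount -o $opts $dev $mp"
        else
            mount -o "$opts" "$dev" "$mp" || return 1
        fi
    done
}

# Usage on the device:
#   remount_all /dev/mmcblk0p6 rw,relatime \
#       /mnt/data /resin-data /var/lib/docker /var/volatile/lib/docker
```

Stopping at the first failed mount avoids carrying on to the engine cleanup with only part of the data partition back in place.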

Clean up balenaEngine folders

$ rm -rf /var/lib/docker/{aufs,overlay2,containers,image,tmp}

Restart balenaEngine and re-download the balena supervisor

$ systemctl start balena
$ update-balena-supervisor

If everything is looking good, a final, risky test is to reboot.
Depending on how harsh the fsck recovery was, the device may go offline
and may require physical recovery (SD card replacement).

$ reboot

On the off chance it’s not a corrupted card issue, there is this Ubuntu forum post which might be helpful to you: How to fix 'Structure Needs Cleaning' Error

Thank you very much. Got everything working again. This really helped. Going to install a better power supply in the future, as this one was probably the cause of the data corruption.

Thanks! This likewise helped me recover from a non-starting supervisor. update-balena-supervisor kept failing because Open Balena reported a malformed URL on an image query, but the supervisor image comes from Balena Cloud. If anyone runs into that, it is possible to manually run the relevant part of /usr/bin/update-balena-supervisor, based on the settings in /etc/balena-supervisor/supervisor.conf. With no change in supervisor version and no supervisor running, that may be as little as:

image_name=registry2.balena-cloud.com/v2/7ab790063e94d7eb3f5e240c281131fb
balena pull "$image_name"
image_id=$(balena images --filter=reference="${image_name%@*}" --format "{{.ID}}")
balena tag "${image_id}" "balena_supervisor:v12.11.38"
systemctl start balena-supervisor

After that, the supervisor takes care of reinstalling all the regular services.
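
Rather than hard-coding the image name and tag as in the commands above, the values could be read from supervisor.conf. The following is only a sketch under assumptions: it treats the file as plain KEY=value lines, conf_get is a hypothetical helper, and the key names in the usage comment are guesses that should be checked against your own file.

```shell
# Print the value of a key from a KEY=value style file.
# $1 = key, $2 = file. Surrounding double quotes are stripped; if the key
# occurs more than once, the last occurrence wins.
conf_get() {
    awk -F= -v k="$1" '
        $1 == k {
            v = substr($0, length(k) + 2)   # everything after "KEY="
            gsub(/^"|"$/, "", v)            # drop surrounding quotes
            print v
        }' "$2" | tail -n 1
}

# Usage on the device (key names are assumptions, inspect the file first):
#   image_name=$(conf_get SUPERVISOR_IMAGE /etc/balena-supervisor/supervisor.conf)
#   version=$(conf_get SUPERVISOR_VERSION /etc/balena-supervisor/supervisor.conf)
```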