Balena Management Engine Stopped

Hello!

I have a balena device that appears to be stuck. After logging in, it looks like all containers have stopped, including the balena management engine.

I’ve tried restarting that container, but it fails to start. Can anyone suggest a debugging sequence?

I’ve granted support access to the device: d3eabe8d833e9dd9228db40b4e882aea

Hi @keenanbedrock – thanks for posting! The best way to get started debugging your device is to run Device Health Checks; this can be found on the Diagnostics tab of your device page. Can you give that a try and report back on what it finds?

All the best,
Hugh

Ah interesting. Here is the output:

It looks like it reports a disk space issue. Here’s the output of df.

What would you suggest I do next?

root@d3eabe8:~# du -a /mnt/data/docker | sort -nr | head -10
28929743 /mnt/data/docker
28920757 /mnt/data/docker/overlay2
27366409 /mnt/data/docker/overlay2/7ce064edb21bd99d89119f720539707e0228563d874cbf27a3a62662b3de458e
27366402 /mnt/data/docker/overlay2/7ce064edb21bd99d89119f720539707e0228563d874cbf27a3a62662b3de458e/diff
27364927 /mnt/data/docker/overlay2/7ce064edb21bd99d89119f720539707e0228563d874cbf27a3a62662b3de458e/diff/root
27364925 /mnt/data/docker/overlay2/7ce064edb21bd99d89119f720539707e0228563d874cbf27a3a62662b3de458e/diff/root/.ros
1094144 /mnt/data/docker/overlay2/41556798be7d80fe37106a8cc076b6b0a5d046a71017eb0164113f27ed10fdb2
1094140 /mnt/data/docker/overlay2/41556798be7d80fe37106a8cc076b6b0a5d046a71017eb0164113f27ed10fdb2/diff
1049960 /mnt/data/docker/overlay2/41556798be7d80fe37106a8cc076b6b0a5d046a71017eb0164113f27ed10fdb2/diff/usr
756129 /mnt/data/docker/overlay2/41556798be7d80fe37106a8cc076b6b0a5d046a71017eb0164113f27ed10fdb2/diff/usr/lib
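
To put the numbers in the listing above in perspective, here is a minimal sketch that converts du block counts to GiB. It assumes du reported 1 KiB blocks (the usual default, but not confirmed for this device), and kib_to_gib is a hypothetical helper name. Under that assumption, the .ros directory inside one container layer accounts for roughly 26 GiB of the roughly 27.6 GiB used under /mnt/data/docker.

```shell
# Convert a du block count to GiB, assuming du reported 1 KiB blocks
# (an assumption about this device; check with `du --help` if unsure).
kib_to_gib() {
  awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / (1024 * 1024) }'
}

kib_to_gib 27364925   # .../diff/root/.ros from the listing above
kib_to_gib 28929743   # total for /mnt/data/docker
```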

Hi @keenanbedrock – yep, disk space definitely looks like it’s a problem. Clicking on the test name should take you to our triage page for that test, which in turn has a link to our Debugging Masterclass; that will have a bunch of different suggestions for cleaning up the problem and how to prevent it in the future. Can you take a look and let us know if you run into problems?

All the best,
Hugh

It looks like the /mnt/data directory is full.

I’ve run the prune commands from the masterclass, but that didn’t free up any space. What files are safe to delete in /mnt/data?
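
A possible reason the prune commands had no effect: the du listing shows nearly all of the space under overlay2/<id>/diff/root/.ros, which looks like data written inside a container rather than unused images, and pruning may not reclaim that while the container still exists. The sketch below is one way to work out which container owns that layer. The layer_owner helper is hypothetical, and the commented usage assumes the engine is running and that the CLI is available as balena-engine on the host OS.

```shell
# Reads "<container-name> <upper-dir>" lines on stdin and prints the names of
# containers whose writable layer lives under the given overlay2 directory ID.
layer_owner() {
  awk -v id="$1" 'index($2, "/overlay2/" id "/") { print $1 }'
}

# Hypothetical usage on the device (requires a running engine):
#   LAYER=7ce064edb21bd99d89119f720539707e0228563d874cbf27a3a62662b3de458e
#   balena-engine inspect \
#     --format '{{.Name}} {{.GraphDriver.Data.UpperDir}}' \
#     $(balena-engine ps -aq) | layer_owner "$LAYER"
```

If a container turns up, clearing the .ros contents from inside that container (or recreating the container) is likely safer than deleting files under /mnt/data/docker by hand.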

I’ve cleaned up the disk space, and now I see a thermal warning and a warning that the management engine isn’t running.

I don’t understand how to restart the supervisor. Please advise.

Thanks!
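
For anyone following along, restarting the supervisor from the host OS can be sketched roughly as below. The unit names are assumptions: the supervisor service has been named resin-supervisor on older balenaOS releases and balena-supervisor on newer ones, and the engine service is assumed to be balena. The pick_supervisor_unit helper is hypothetical and only selects whichever name is present.

```shell
# Reads `systemctl list-unit-files` style lines on stdin and prints the first
# supervisor unit name found. Both unit names are assumptions about the OS version.
pick_supervisor_unit() {
  awk '$1 == "balena-supervisor.service" || $1 == "resin-supervisor.service" { print $1; exit }'
}

# Hypothetical usage on the device host OS:
#   systemctl status balena            # the engine should be running first
#   unit="$(systemctl list-unit-files --type=service | pick_supervisor_unit)"
#   systemctl restart "$unit"
#   journalctl -u "$unit" -n 50 --no-pager   # check why it failed, if it did
```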

OK, I’ve managed to get the supervisor running again, but now I’m getting a "no such image" error when it tries to download the image.

07.08.20 12:43:37 (-0700) Downloading image ‘registry2.balena-cloud.com/v2/1e07ffa6585b9f811183010b29e86f22@sha256:8c57b55247e8d1832524a84f02123278931e3a44694bf29abad4acd6a421fadd
07.08.20 12:43:40 (-0700) Failed to download image ‘registry2.balena-cloud.com/v2/1e07ffa6585b9f811183010b29e86f22@sha256:8c57b55247e8d1832524a84f02123278931e3a44694bf29abad4acd6a421fadd’ due to '(HTTP code 404) no such image - no such image: registry2.balena-cloud.com/v2/1e07ffa6585b9f811183010b29e86f22@sha256:8c57b55247e8d1832524a84f02123278931e3a44694bf29abad4acd6a421fadd: No such image: registry2.balena-cloud.com/v2/1e07ffa6585b9f811183010b29e86f22@sha256:8c57b55247e8d1832524a84f02123278931e3a44694bf29abad4acd6a421fadd

How can I resolve this?

It looks like your image is over 1.5 GB and may take a while to download. Let us know if those messages do not go away soon.

I’d also suggest addressing the temperature issue at some point, since overheating can cause a variety of problems on the device.