One container vanished on multiple devices?

Our three fleets each run 2 containers. On about 5 devices in one fleet and 2 in another, one of those two containers shut down cleanly (based on our service logs) and then simply vanished. It was gone from the cloud inventory for those devices, balena ps -a didn't even show it as exited, and balena images ls no longer listed the image, but the volumes remained (thankfully!). There was no trace of the missing container having ever existed via balena CLI commands (while SSH'd into the devices) or in any config or OS-level logs we could find on the box. Rebooting, restarting the balena services, etc. made no difference.

Rolling out a new release of our compose file with new container versions brought it back.

If this happens again, what can we do to troubleshoot further? What commands can we run, or what logs should we capture?

Hello @scscsc, thanks for sharing.

Did you perform any action before experiencing this issue? Any software or host OS update?

What device type do you use? What balenaOS and supervisor versions are you using?

Could you please share anything that would help us try to reproduce it?

There was no common thread: the affected devices were on different OS versions (5.24-prod, 6.0.13+rev1, 6.3.12+rev4) and different supervisor versions (16.1.0, 16.4.6, 17.0.1). No host OS or supervisor updates had been performed.

Perhaps the host's OOM killer fired, and that somehow caused the engine (moby) to remove the container entirely instead of just restarting it?
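
If that's what happened, the kernel log should show it. Here's a quick check I'm planning to run from the host OS next time; this is just a sketch assuming standard journalctl/dmesg on balenaOS and that the engine logs to balena.service (the service/container name is a placeholder):

```
# Look for OOM-killer activity in the kernel log
dmesg | grep -iE 'oom|out of memory'
journalctl -k --no-pager | grep -iE 'oom|killed process'

# Check whether the engine logged anything about the missing container
# (replace <container-name> with the vanished service's name)
journalctl -u balena.service --no-pager | grep -i '<container-name>'
```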

We really have no idea; by the time we got into these devices to investigate, there was no trace of the missing container, which is why I'm asking for tips on additional commands to run or logs/places to check if we see the issue again.

@scscsc could you please grant support access to the device(s) and share the full UUID(s) (via DM if you prefer)?

We would like to explore this further!

Our fleets on balenaCloud haven't seen this happen (yet). The fleets where this happened are on our openBalena side, so I can't grant support access… but even there, switching to a newer version of our app restored the missing container, so there doesn't seem to be anything left to investigate on those devices.

Still just looking for advice on what to capture for self-investigation the next time it happens. Obviously, when it happens we're working against the clock to get the device fixed as quickly as possible, so capturing everything we can and then deploying a new release is about the best we can do.

@scscsc did you capture any supervisor errors in the logs? Maybe check that next time!
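
For example, something along these lines from the host OS should surface them, assuming a recent balenaOS where the supervisor runs under the balena-supervisor unit (older releases used resin-supervisor):

```
# Last 500 lines of the supervisor and engine journals, all severities
journalctl -u balena-supervisor.service --no-pager -n 500
journalctl -u balena.service --no-pager -n 500
```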

Let us know if this happens again!

There didn't appear to be any errors in the supervisor or host OS logs. We'll capture them all next time, regardless of level (info/warn/error).
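
For the record, this is roughly the capture I plan to run from the host OS before redeploying a fix, so we have a snapshot to dig through afterwards. It's only a sketch: the unit names (balena.service, balena-supervisor.service) and the /mnt/data output path are assumptions for a recent balenaOS.

```
#!/bin/sh
# Snapshot engine/supervisor state and journals before redeploying.
# Assumes recent balenaOS: engine unit balena.service, supervisor unit
# balena-supervisor.service, writable data partition at /mnt/data.
set -eu

out="/mnt/data/vanished-container-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$out"

# Engine view of containers, images, volumes and the last 24h of events
balena ps -a    > "$out/ps.txt"
balena images   > "$out/images.txt"
balena volume ls > "$out/volumes.txt"
balena events --since 24h --until "$(date +%s)" > "$out/events.txt" || true

# Engine, supervisor and kernel journals, all severities
journalctl -u balena.service --no-pager            > "$out/engine-journal.txt"
journalctl -u balena-supervisor.service --no-pager > "$out/supervisor-journal.txt"
journalctl -k --no-pager                           > "$out/kernel-journal.txt"

tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
echo "Capture written to $out.tar.gz"
```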