Strange message: "Error cleaning up: (HTTP code 409) conflict - conflict: unable to delete (cannot be forced)"

We’re getting the following message on one of our staging devices, which we’ve never seen before.

04.02.19 16:39:10 (-0800) Error cleaning up sha256:1652338c2fe7f7286bd15317a7005322f0c69510011361e2dbb147ad59e3889e: (HTTP code 409) conflict - conflict: unable to delete 1652338c2fe7 (cannot be forced) - image has dependent child images - will ignore for 1 hour

The container seems to be running fine.

Not as critical as the other issues I’ve posted about recently, but let me know if you’d like support access to potentially debug what’s happening.

Or let me know if you have any other info about the above message.

Hi @troyvsvs, it appears the image above has dependent child images so it can’t be deleted. Would you mind granting support access for taking a look? Thanks!

Sure, we’ll enable support access shortly.

Out of curiosity, what exactly does “dependent child images” mean in this context? To my knowledge, we just have one image “main” that we run.

Granted

Docker images are layered and reference one another hierarchically, so an image can be a child of another image (the parent being the image specified in the “FROM” directive of the Dockerfile); see the documentation glossary pages on “parent image” and “image” [1] [2]. The parent image of your main container will probably be some kind of “base image” from our Docker repositories [3].
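
For illustration, here is a rough sketch of how you could inspect this from the host OS terminal; “my-app:latest” is just a hypothetical placeholder for one of the entries listed by balena-engine images:

balena-engine history my-app:latest # lists the layers the image was built from
balena-engine inspect --format '{{.Parent}}' my-app:latest # prints the locally recorded parent image ID, if any

Note that the Parent field is usually only populated for images built locally; for images pulled from a registry it is often empty.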

There is also a container running on your device called the “supervisor”, which is responsible for interacting with the platform, and this may also have parent images [4].

This all said, could you provide a link to the dashboard for the device you’ve granted access to? We can then inspect what might be going on.

Cheers,

Nick

[1] https://docs.docker.com/glossary/?term=parent%20image
[2] https://docs.docker.com/glossary/?term=image
[3] https://www.balena.io/docs/reference/base-images/base-images/
[4] https://www.balena.io/docs/reference/OS/overview/2.x/#balena-supervisor

@troyvsvs, for the purpose of sharing your device UUID or device web URL, I’ve sent you an e-mail to the address registered in your forum account. You may reply to that e-mail instead of replying here. Note that if the UUID of your device is shared publicly, it allows anyone to visit the device’s public URL (if it is enabled). Thanks!

Hi @troyvsvs, thanks for sharing the device UUID with our support team. On investigation, I found that the device had two balena supervisor images on it, an old one and a new one. Only the new image was running, which is correct, and the error message you were seeing was the supervisor failing to remove its old image. We recently had an automated background cloud task patching the balena supervisor on devices, and I believe this extra supervisor image was a leftover from that task – on every other device the extra image was automatically deleted, but for some reason this did not happen on this device. I have manually deleted the extra image and I expect you will not see that error message again. I’ll share my findings with some team members and I hope we will figure out what went wrong. Thanks for reporting it!

Hello @pdcastro

We are running our devices via GSM and I’ve seen this message on a few of them as well. Even though it doesn’t seem to be causing any obvious problems for us right now, could you share how you arrived at the solution, if you can spare the time? In slightly more technical terms, what could one do to solve this by SSHing into the Host OS, for example?

Where is the image located that you have to delete? I am assuming it is a file findable by the SHA hash from the message? Should one delete other files as well (other layers or something? Sorry, not the biggest Docker professional yet). How do I make sure that it is indeed a stale supervisor image and not some different image?
Sadly, giving support access is only an option for a few of our devices, not all of them, so we would like to do this ourselves if possible.

Additionally, could you shed some more light on how these cloud task patches work? Such patches could be a source of unforeseeable changes for us if the supervisor is patched without us knowing. First of all, there is the obvious problem that our devices only have internet access via GSM, so bandwidth is more precious than when connected via a flat-rate router, although I’m not assuming these are big patches.
Furthermore, we would generally like to avoid patching something that might not be relevant to us; it could break more than it fixes, after all, and we are responsible for our devices.

So what exactly is being automatically patched, how often, and how important are these changes? If there is even the possibility of breaking something, how can we turn this functionality off and instead opt in to an update once we have identified that we need it? How does this differ from the bigger balenaOS updates?

Thanks for your time and thanks for solving issues so quickly!
Greetings, Tarek

Hi @Tschebbischeff, I understand your concern and the need for clarification. I have asked our fleet operations team to address the matter and we will get back to you soon. As for the commands used to locate and delete the old/unused balena supervisor image on a host OS terminal, they were:

balena-engine images - lists the images
balena-engine ps - lists running containers
balena-engine rmi <imageName:Tag> - deletes the specified image

I had observed that there were two images named ‘balena/armv7hf-supervisor’ (with different tags), but only one balena supervisor container running off one of the images. The fix was then to delete the unused balena supervisor image.
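
If you want to double-check which image the running supervisor container actually uses before deleting anything, a rough sketch from the host OS terminal would be (the supervisor container is typically named resin_supervisor or balena_supervisor, depending on the OS version):

balena-engine ps --filter name=supervisor --format '{{.Names}} {{.Image}}' # shows the image the running supervisor container was started from
balena-engine images | grep supervisor # lists all supervisor images; any not referenced above are candidates for removal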

This has not worked for me, because the “image has dependent child images”, so it cannot be deleted, even with force. Maybe the supervisor has to be stopped so that the image can be deleted.
What actually worked was simply upgrading to a newer release, then removing the image as suggested.

Hey @daghemo, one small change: the delete has to be done not with the <imageID> but with the tag, then it will work. Here’s the exact way of doing that in these 4 lines, and the cleanup should succeed:

IMG="$(balena images | grep 'resin/\|balena/' | grep supervisor | grep none | awk '{print $3}')"
REPO="$(balena images | grep 'resin/\|balena/' | grep supervisor | grep none | awk '{print $1}')"
balena tag $IMG $REPO:foobar
balena rmi $REPO:foobar

Hi @Tschebbischeff, some more info regarding your questions

That message shouldn’t cause any issues if you are using supervisor 7.0.0 or newer, but you can remove the offending image as described in the example above. The version below is more generic: it works across all OS versions and cleans up the image:

# Use the 'balena' CLI if present (newer OS versions), otherwise fall back to 'docker'
if [ -x "$(command -v balena)" ]; then
   CMD=balena;
else
   CMD=docker;
fi;

# Find the supervisor image whose tag shows as <none>, and its repository name
IMG="$($CMD images | grep 'resin/\|balena/' | grep supervisor | grep none | awk '{print $3}')"
REPO="$($CMD images | grep 'resin/\|balena/' | grep supervisor | grep none | awk '{print $1}')"
# Give it a temporary tag and remove it by that tag, which works around the "cannot be forced" error
$CMD tag $IMG $REPO:foobar
$CMD rmi $REPO:foobar

The image is located inside the device’s balena Engine; there is no separate file to find, and the commands detailed above should do what’s needed. This is the only issue that we know of that can cause the error message you mentioned, which is why we think this is the solution.
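
As a rough sanity check, reusing the $CMD variable from the snippet above, you can list the supervisor images and containers again after the cleanup:

$CMD images | grep supervisor # only the supervisor image that is in use should remain
$CMD ps | grep supervisor # the running supervisor container should be unaffected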

Let us know if you run into any issues, and we can help further then.

Yes, it was a tiny patch: a small script run on the device that needed to make some changes to the supervisor currently on the device.

These changes were in a part of the supervisor that is needed for our backend operations. They don’t affect user-facing behaviour, but they are relevant to the behaviour of the platform overall.

It was a special case and we are not running these events on a regular basis. The changes are important for the platform’s stability: they let the supervisors report when they encounter issues with their own operations (for example, a supervisor restarting excessively, or otherwise unable to operate as designed, which we’ve sometimes seen). This information is relevant solely to the supervisor, and it allows us to perform preventive maintenance, such as notifying the relevant user that their device is misbehaving even when they haven’t realized it yet, or catching issues with a released supervisor version so that we can release properly fixed new supervisors based on the issues captured.

We apologize for the inconvenience caused. We are very mindful that a lot of our users are on very slow or very expensive networks, and we take that into account when issuing tasks such as these operational patches. They are not run automatically at the moment; our Fleet Operations team takes care of such issues, keeping an eye on the devices across the platform, and only when it is absolutely needed.

Hope this helps to explain a few more things, and please let us know if you have any further questions! Thanks a lot!