/mnt/data/docker/aufs/diff is full

We’re seeing this as well. /mnt/data/docker/aufs filled the disk with diffs, now updates aren’t able to download.


Hi @SplitIce, is your application writing a lot of data in the container?

Can you provide the dashboard link for the device please?

@floion There would be temporary data written, yes. Data that we fully expect to lose on reboot.

edec7670a3f5e81134d6602eee189bb3c7b2f6fe is currently one of the devices that we have not yet cleared.

What do you mean by not yet cleared? How do you clear the data?

On some devices I’ve rm’ed them using a really scrappy bash script that looks for dangling containers that contain /run.sh.

A couple were just testing devices and were entirely cleared.
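For the benefit of the thread, a minimal sketch of that kind of host-side cleanup might look like the following. This is an assumption-laden illustration, not the actual script: the engine CLI name (balena vs docker) and the /run.sh marker are taken from this conversation, and it only prints what it would remove so you can dry-run it first.

```shell
#!/bin/sh
# Hypothetical host-side cleanup sketch: list exited containers and flag
# the ones whose command references /run.sh. Adjust the marker and the
# engine CLI name for your fleet; this prints rather than removes.

# match a container command that references /run.sh
is_runsh() { case "$1" in *run.sh*) return 0 ;; *) return 1 ;; esac; }

ENGINE=$(command -v balena || command -v docker || true)
if [ -n "$ENGINE" ]; then
    "$ENGINE" ps -a --filter status=exited --format '{{.ID}} {{.Command}}' 2>/dev/null |
    while read -r id cmd; do
        if is_runsh "$cmd"; then
            echo "would remove $id"   # replace with: "$ENGINE" rm "$id"
        fi
    done
fi
```

Once you're happy with what it flags, swap the echo for the actual rm.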

Did you run that cleanup script from the host OS?

Yes, of course.

AFAIK it’s the only way to access the large build-up of non-cleared containers.

Can you also make sure your application itself does the cleanup?

  1. It’s not the application’s job to clean up after resin-supervisor.
  2. It’s not possible for the container to access the host OS’s root filesystem and the other resources needed to perform these operations without major hacks.
  3. The bash script approach is horrifically hacky.

What I meant was: if your application is writing a lot of data in the user container, then it should also do some housekeeping. There’s no need for the user container to do anything in the host OS; just make sure your application is not filling up the space in its own container.

I don’t think you understand the problem. See the ls output provided, which shows there are over 500 excess layers. All of these are previous instances of the container’s layers.

This has nothing to do with data written from the container.

I understand the issue. Let me ask the supervisor team whether it’s possible for the /var/lib/balena/aufs/diff folder to contain this many layers of the user application container. This location is supposed to hold all of the layers your application container contains, so it’s normal for there to be more than one; I’m just not sure it can reach that many. Also, did you do many updates to your application? That may explain the large number of writable layers in that directory.

It’s entirely possible for this device to have had a lot of releases, probably around 50-100. It’s been around since near the start of our project.

Our container had 5 layers last I checked. Two layers are ours (RUN & CMD), with the RUN layer invalidated on every build (resin-nocache) and delta-updated.

Hey @SplitIce, looking at the output of ls, it appears that all of the leftover diffs are from July 4th and 5th. It could be that the supervisor or balena got into a weird state with a release from around that time frame.

Can you remember any weirdness going on at around that time? Perhaps a bug in application code created more data than usual. Regardless, this is something that should have been handled automatically, and we’re going to investigate.

In the meantime, a fix for this is to clear out the docker directory. Unfortunately, without knowing which diff is for which container, you’d have to remove the user images and containers too. I can do this for you if you like, with a dashboard link and support access enabled. Alternatively, and for the benefit of the thread, the commands would be:

systemctl stop resin-supervisor
systemctl stop balena
rm -rf /var/lib/docker/{aufs,diff,overlay,containers,image,tmp}
systemctl start balena

All our devices have some degree of wastage in this folder (or /var/lib/docker/overlay for those running overlayfs). Skimming some of those, I can see this hub was alone in storing diffs from July 4th and 5th. It’s possible the internet was unstable that day or something.

If I were to guess, I’d say the supervisor is not cleaning up old container layers.
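To skim that wastage yourself, something like the following sketch can rank the layer directories by size so excess diffs stand out. The path is an assumption that varies by storage driver and OS (aufs vs overlay2, /var/lib/docker vs /mnt/data/docker), so pass in whichever applies to your device.

```shell
#!/bin/sh
# Hypothetical inspection sketch: rank layer directories by size so the
# excess diffs stand out. The directory to pass in depends on your
# storage driver and OS layout.
biggest_layers() {
    dir=$1
    [ -d "$dir" ] || { echo "no layer directory at $dir" >&2; return 1; }
    # size in KiB per layer directory, largest last
    du -sk "$dir"/*/ 2>/dev/null | sort -n | tail -n 20
}

# usage, e.g. on a device using the overlay2 driver:
#   biggest_layers /var/lib/docker/overlay2
```

Cross-referencing the largest entries against their mtimes (ls -lt) is what surfaced the July 4th/5th cluster here.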

We are still seeing this with Resin OS 2.14.3+rev5 / supervisor 7.19.4 (although the storage utilisation has moved to /mnt/data/docker/overlay2).

Had a device with a 330MB application and 600MB of data hit 100% (8GB storage) today.

Tracked it down to excess overlays in /mnt/data/docker/overlay2. At a guess, it’s failed updates not being cleared, as this device has had periods of instability (it’s a staging device).

Hello @SplitIce

Can you please enable support access and provide us the device uuid in a private message?

@SplitIce I’d recommend using the latest balenaOS version. v2.14.3 was taken down from production and shouldn’t be used. There was an issue where balena-engine would keep trying to download updates and fail…