Pruning balena registry stored images

@MBer you should see a little delete button (trash can icon) on the right-hand side of each release. Is this not appearing for you?

@drcnyc No, but I found that the UI container was still running code from before the latest commit. Running open-balena-admin/scripts/compose pull was required to get the latest images before launching the containers again. Everything works now, and the changes are all useful for deletion management:


The bulk-select is nice too:

Deletion confirmed working (clicking Confirm in the pop-up twice per batch). Thank you very much!

@MBer no problem, glad you were able to make use of it! Remember to run garbage-collect on your registry periodically to free up the storage after deleting releases.
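For anyone wanting to script that periodic run, here is a minimal sketch. It assumes the registry runs in a Docker container; the container name `registry` is hypothetical and needs adjusting. The binary path, config path and `--delete-untagged` are taken from the command quoted further down in this thread, and `--dry-run` is the upstream registry flag for previewing what would be removed. The upstream registry documentation recommends that the registry be read-only or stopped while garbage collection runs.

```shell
#!/bin/bash
# Sketch: build and run the registry garbage-collect command.
# REGISTRY_CONTAINER is a hypothetical container name; adjust to your setup.
REGISTRY_CONTAINER="${REGISTRY_CONTAINER:-registry}"

# Print the garbage-collect command line, one argument per line.
# Pass "dry" as the first argument to append --dry-run.
gc_args() {
    printf '%s\n' /usr/local/bin/docker-registry garbage-collect \
        /etc/docker-registry.yml --delete-untagged
    if [ "${1:-}" = "dry" ]; then
        printf '%s\n' --dry-run
    fi
}

# Run garbage-collect inside the registry container.
# Word splitting of gc_args output is intentional: no argument contains spaces.
run_gc() {
    # shellcheck disable=SC2046
    docker exec "$REGISTRY_CONTAINER" $(gc_args "${1:-}")
}
```

Calling `run_gc dry` first and `run_gc` afterwards gives a preview before anything is deleted.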

It seems like a major flaw in balena-api that running:

curl -s -H "Authorization: Bearer $(cat ./token)" -X 'DELETE' "https://${hostname}/v6/release?\$filter=id%20eq%20$id"

deletes balena’s knowledge of a release but doesn’t delete the underlying tags, env variables, images, or Docker repos. That seems absolutely crazy.

It's fantastic that balena-admin implemented all of this, but that isn’t exactly something I can automate from a CI/CD pipeline to delete old/unused releases.

Here is the script I wrote, expecting it to free up space on the MinIO/S3 instance. Even after running garbage-collect, no space is freed, and garbage-collect reported "10868 blobs marked, 0 blobs and 0 manifests eligible for deletion":

#!/bin/bash

# Default values
days_old=14
recent_to_keep=10
hostname="api.balena-cloud.com"

# Help message
print_help() {
    echo "Usage: $0 [options]"
    echo "Options:"
    echo "  --days DAYS       Set the number of days a release must be old to be eligible for deletion. Default is 14."
    echo "  --keep KEEP       Set the number of the most recent releases to keep. Default is 10."
    echo "  --host HOSTNAME   Set the hostname of the server. Default is api.balena-cloud.com."
    echo "  --help, --?       Show this help message."
}

# Parse named arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        --days)
            days_old="$2"
            shift # past argument
            shift # past value
            ;;
        --keep)
            recent_to_keep="$2"
            shift # past argument
            shift # past value
            ;;
        --host)
            hostname="$2"
            shift # past argument
            shift # past value
            ;;
        --help|'--?')    # quoted so that ? is literal rather than a glob matching any --x
            print_help
            exit 0
            ;;
        *)    # unknown option
            echo "Unknown option: $1"
            print_help
            exit 1
            ;;
    esac
done

echo "Parameters set: --days $days_old, --keep $recent_to_keep, --host $hostname"

# Step 1: Get a list of releases
echo "Fetching release data..."
# --fail makes curl exit non-zero on HTTP errors, so the check below catches them
curl -s --fail -H "Authorization: Bearer $(cat ./token)" "https://${hostname}"'/v6/release?$orderby=created_at%20desc&$select=id,status,commit,source,created_at,start_timestamp,update_timestamp,end_timestamp,raw_version,is_invalidated,is_final&$expand=is_running_on__device/$count,should_be_running_on__application($select=id),should_be_running_on__device($select=id)' > releases.json

if [ $? -ne 0 ]; then
    echo "Error fetching release data."
    exit 1
fi

# Step 2: Process releases.json to get a list of IDs to delete
ids_to_delete=$(jq --arg days "$days_old" --argjson keep "$recent_to_keep" '.d | .[$keep:] | map(select(.is_running_on__device == 0 and .should_be_running_on__application == [] and .should_be_running_on__device == [] and ((.created_at | rtrimstr("Z") | split(".")[0] + "Z" | fromdateiso8601) < (now - 86400 * ($days | tonumber))))) | .[].id' releases.json)


if [ $? -ne 0 ]; then
    echo "Error processing release data."
    exit 1
fi

if [ -z "$ids_to_delete" ] || [ "$ids_to_delete" == "[]" ]; then
    echo "No releases to clean up."
    exit 0
fi

# Convert the jq output into an array
readarray -t ids <<<"$ids_to_delete"

# Step 3: Delete each release
for id in "${ids[@]}"; do
    # Fetch release details for printing
    details=$(jq -r --arg id "$id" '.d[] |
    select(.id == ($id | tonumber)) |
    {
        id, created_at, raw_version,
        ago: (
            ((now - (.created_at | rtrimstr("Z") | split(".")[0] + "Z" | fromdateiso8601)) / 86400) | floor
        ) | (
            if . > 1 then 
                "~\(.) days ago" 
            elif . == 1 then 
                "1 day ago" 
            else 
                "less than a day ago" 
            end
        )
    } |
    "Deleting release ID \(.id) (version \(.raw_version)) from \(.created_at) \(.ago)"' releases.json)
    echo "$details"
    
    # Actual deletion command (--fail makes HTTP errors produce a non-zero exit status)
    curl -s --fail -H "Authorization: Bearer $(cat ./token)" -X 'DELETE' "https://${hostname}/v6/release?\$filter=id%20eq%20$id"

    if [ $? -ne 0 ]; then
        echo "  ...Error deleting release with ID $id."
    else
        echo "  ...Successfully deleted release with ID $id."
    fi
done

… to add to that fun, I now have about 100 releases that are deleted from balena-api but not from the registry or S3, so finding those is going to be painful 🤦‍♂️😅
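One way to narrow those down, as a rough sketch: compare a list of repositories that exist in the registry with a list of repositories the API still references, and print the ones that only appear on the registry side. Both input files are assumptions of mine (one repository name per line, e.g. `v2/<hash>`); the registry list could come from the registry's `/v2/_catalog` endpoint if you can authenticate to it, and how to export the API side is left open here.

```shell
#!/bin/bash
# Sketch: list registry repositories that the API no longer references.
# Both input files hold one repository name per line.

# usage: find_orphans REGISTRY_LIST API_LIST
find_orphans() {
    local registry_sorted api_sorted
    registry_sorted="$(mktemp)"
    api_sorted="$(mktemp)"
    # comm needs sorted input; use the same locale for sort and comm.
    LC_ALL=C sort -u "$1" > "$registry_sorted"
    LC_ALL=C sort -u "$2" > "$api_sorted"
    # -23 keeps only the lines unique to the first file (the registry list).
    LC_ALL=C comm -23 "$registry_sorted" "$api_sorted"
    rm -f "$registry_sorted" "$api_sorted"
}
```

Running `find_orphans registry.txt api.txt` then prints the candidate repositories to inspect by hand before removing anything.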

@drcnyc I left this alone for a while as the garbage collection seemed to be running, but a closer look shows that /usr/local/bin/docker-registry garbage-collect /etc/docker-registry.yml --delete-untagged does not actually free any space. After deleting dozens of releases, the output is:

WARN[0000] Ignoring unrecognized environment variable REGISTRY_SHA256 
v2/0047ba0b739587fb494ad1e4d6d80685
v2/0047ba0b739587fb494ad1e4d6d80685: marking manifest sha256:ffed3ef4eaa58cfc2378687734559be8ccb17b601a9e369655ea539ab020fba0 
v2/0047ba0b739587fb494ad1e4d6d80685: marking blob sha256:9cc5d0d01e8a99aefe40a9fbf50eb8e23bd37ddd373b4de2b4a088e5c140ddc7
[wall of similar lines]
1235 blobs marked, 0 blobs and 0 manifests eligible for deletion
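Since the output is a wall of lines, a small filter can pull out the parts that matter. This is only a sketch that assumes the log format shown above (the `marking manifest` lines and the final summary line); the function names are my own.

```shell
#!/bin/bash
# Sketch: summarize saved garbage-collect output read from stdin.

# Prints "marked=<n> blobs=<n> manifests=<n>" from the final summary line.
gc_summary() {
    sed -n 's/^\([0-9][0-9]*\) blobs marked, \([0-9][0-9]*\) blobs and \([0-9][0-9]*\) manifests eligible for deletion *$/marked=\1 blobs=\2 manifests=\3/p'
}

# Prints the number of distinct repositories that had a manifest marked,
# based on the "<repo>: marking manifest sha256:..." lines.
gc_repos_marked() {
    sed -n 's/^\(.*\): marking manifest sha256:.*$/\1/p' | sort -u | wc -l | tr -d ' '
}
```

For example, `gc_summary < gc.log` and `gc_repos_marked < gc.log`, where `gc.log` is a saved copy of the output. A manifest that is marked is one the collector still considers referenced, so a high repository count with zero eligible manifests is consistent with the leftover-tag explanation.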

This problem seems to be common to other systems using docker-registry (see "How to cleanup container registry blobs in Kubernetes with garbage collection" on The Linux Notes and "Garbage Collection doesn't work properly" on the GitLab Forum). Most solutions attribute it to leftover tags, but they rely on external tools that I have not found a way to use with open-balena-registry. Does anyone have suggestions for how I can get more information about the state of the registry and whether there are low-level tags in the way?
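One low-level check that needs no external tools, sketched under assumptions: the upstream registry keeps tags under `docker/registry/v2/repositories/<repo>/_manifests/tags/<tag>/` in its storage backend. If that tree is reachable as a directory (for example a local copy or mirror of the bucket, which is an assumption about your setup), the tags can be listed directly. `STORAGE_ROOT` and the function name are hypothetical.

```shell
#!/bin/bash
# Sketch: list "<repository> <tag>" pairs from a registry storage tree.

# usage: list_registry_tags STORAGE_ROOT
list_registry_tags() {
    local repos_dir="$1/docker/registry/v2/repositories"
    # Match each tags/<tag> directory and do not descend into it.
    find "$repos_dir" -type d -path '*/_manifests/tags/*' -prune 2>/dev/null |
    while IFS= read -r tag_dir; do
        tag="${tag_dir##*/}"                    # last path component
        repo="${tag_dir%/_manifests/tags/*}"    # strip the tag suffix
        repo="${repo#"$repos_dir"/}"            # strip the storage prefix
        printf '%s %s\n' "$repo" "$tag"
    done | sort
}
```

If a repository belonging to a deleted release still shows up with a tag here, that would explain why `--delete-untagged` finds nothing eligible for it.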