Container stuck in status downloaded/installed

We have a multi-container setup that is running fine on a lot of devices. On one device one container in the status “Installed” and the containers which depend on that one in “Downloaded” (2 other containers are running fine). I’m not exactly sure when this happened but the device is out in the field connected via LTE. The device is running on version balenaOS 2.45.1+rev4 of the UpBoard Image. A restart of the device does not solve the issue.

When checking the logs, I get this output:

Sep 15 07:34:10 af1a482 12d07c71a571[782]: [info]    Applying target state
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn]    Ignoring unsupported or unknown compose fields: links
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn]    Ignoring unsupported or unknown compose fields: containerName
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn]    Ignoring unsupported or unknown compose fields: containerName
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn]    Ignoring unsupported or unknown compose fields: links
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn]    Ignoring unsupported or unknown compose fields: containerName, links
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn]    Ignoring unsupported or unknown compose fields: containerName
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [event]   Event: Service start {"service":{"appId":1762257,"serviceId":777636,"serviceName":"zookeeper","releaseId":1593546}}
Sep 15 07:34:10 af1a482 balenad[782]: time="2021-09-15T07:34:10.661781481Z" level=warning msg="Failed to allocate and map port 32181-32181: Bind for 0.0.0.0:32181 failed: port is already allocated"
Sep 15 07:34:10 af1a482 balenad[782]: time="2021-09-15T07:34:10.728397992Z" level=error msg="6fcd28581c4039af460a3fc831ae13b5f02e457469f5167cdc9b6c8b61df3336 cleanup: failed to delete container from containerd: no such container"
Sep 15 07:34:10 af1a482 balenad[782]: time="2021-09-15T07:34:10.728500331Z" level=error msg="Handler for POST /containers/6fcd28581c4039af460a3fc831ae13b5f02e457469f5167cdc9b6c8b61df3336/start returned error: driver failed programming external connectivity on endpoint zookeeper_2931171_1593546 (a10cb57b0d7db21c5b1b2a9d2cd22c885458ae5c5ecea3c46cc6a8faf8300b79): Bind for 0.0.0.0:32181 failed: port is already allocated"
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]   Scheduling another update attempt in 900000ms due to failure:  Error: (HTTP code 500) server error - driver failed programming external connectivity on endpoint zookeeper_2931171_1593546 (a10cb57b0d7db21c5b1b2a9d2cd22c885458ae5c5ecea3c46cc6a8faf8300b79): Bind for 0.0.0.0:32181 failed: port is already allocated 
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]         at /usr/src/app/dist/app.js:596:108431
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at /usr/src/app/dist/app.js:596:108394
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at m.buildPayload (/usr/src/app/dist/app.js:596:108404)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:596:107904)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at IncomingMessage.emit (events.js:194:15)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at endReadableNT (_stream_readable.js:1125:12)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at process._tickCallback (internal/process/next_tick.js:63:19)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]   Device state apply error Error: (HTTP code 500) server error - driver failed programming external connectivity on endpoint zookeeper_2931171_1593546 (a10cb57b0d7db21c5b1b2a9d2cd22c885458ae5c5ecea3c46cc6a8faf8300b79): Bind for 0.0.0.0:32181 failed: port is already allocated 
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]         at /usr/src/app/dist/app.js:596:108431
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at /usr/src/app/dist/app.js:596:108394
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at m.buildPayload (/usr/src/app/dist/app.js:596:108404)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:596:107904)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at IncomingMessage.emit (events.js:194:15)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at endReadableNT (_stream_readable.js:1125:12)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error]       at process._tickCallback (internal/process/next_tick.js:63:19)
Sep 15 07:34:57 af1a482 balenad[782]: time="2021-09-15T07:34:57.626176324Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/303b4949d34d377e38ff949db9b26ca3fa2f62a9b69221a28f641b8742fa22c9/shim.sock debug=false pid=27925
Sep 15 07:34:58 af1a482 balenad[782]: time="2021-09-15T07:34:58.687460213Z" level=info msg="shim reaped" id=303b4949d34d377e38ff949db9b26ca3fa2f62a9b69221a28f641b8742fa22c9
Sep 15 07:34:58 af1a482 balenad[782]: time="2021-09-15T07:34:58.695717331Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 15 07:35:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:37:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:37:59 af1a482 balenad[782]: time="2021-09-15T07:37:59.475858027Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/ae60a16271ee5bfb59d3ed5e7d3cee1cf322c4cb15fda772bb254393f60645e5/shim.sock debug=false pid=28072
Sep 15 07:38:00 af1a482 balenad[782]: time="2021-09-15T07:38:00.534786559Z" level=info msg="shim reaped" id=ae60a16271ee5bfb59d3ed5e7d3cee1cf322c4cb15fda772bb254393f60645e5
Sep 15 07:38:00 af1a482 balenad[782]: time="2021-09-15T07:38:00.543598115Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 15 07:38:08 af1a482 12d07c71a571[782]: [api]     GET /v1/healthy 200 - 2.649 ms
Sep 15 07:39:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:41:01 af1a482 balenad[782]: time="2021-09-15T07:41:01.318956737Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/534a250b8c9f0361f3e72e147a77c26633897ae4b95e6aed6ccbf5dfad023e53/shim.sock debug=false pid=28277
Sep 15 07:41:02 af1a482 balenad[782]: time="2021-09-15T07:41:02.387840307Z" level=info msg="shim reaped" id=534a250b8c9f0361f3e72e147a77c26633897ae4b95e6aed6ccbf5dfad023e53
Sep 15 07:41:02 af1a482 balenad[782]: time="2021-09-15T07:41:02.396218195Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 15 07:41:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:41:36 af1a482 12d07c71a571[782]: [debug]   Attempting container log timestamp flush...
Sep 15 07:41:36 af1a482 12d07c71a571[782]: [debug]   Container log timestamp flush complete

In another thread I found the advice to run netstat -lauten which returns

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 127.0.0.2:53            0.0.0.0:*               LISTEN      
tcp        0      0 10.114.102.1:53         0.0.0.0:*               LISTEN      
tcp        0    493 <LTE_IP>:58368    3.227.28.93:443         ESTABLISHED 
tcp        0      0 127.0.0.1:59448         127.0.0.1:48484         TIME_WAIT   
tcp        0      0 :::80                   :::*                    LISTEN      
tcp        0      0 :::8081                 :::*                    LISTEN      
tcp        0      0 :::8082                 :::*                    LISTEN      
tcp        0      0 :::8181                 :::*                    LISTEN      
tcp        0      0 :::32181                :::*                    LISTEN      
tcp        0      0 :::502                  :::*                    LISTEN      
tcp        0      0 :::1880                 :::*                    LISTEN      
tcp        0      0 :::8442                 :::*                    LISTEN      
tcp        0      0 :::8123                 :::*                    LISTEN      
tcp        0      0 :::8000                 :::*                    LISTEN      
tcp        0      0 :::48484                :::*                    LISTEN      
tcp        0      0 :::62662                :::*                    LISTEN      
tcp        0      0 :::22222                :::*                    LISTEN      
tcp        0   2080 ::ffff:10.240.2.193:22222 ::ffff:52.4.252.97:38174 ESTABLISHED 
udp        0      0 127.0.0.1:323           0.0.0.0:*                           
udp        0      0 0.0.0.0:58494           0.0.0.0:*                           
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           
udp        0      0 0.0.0.0:51558           0.0.0.0:*                           
udp        0      0 127.0.0.2:53            0.0.0.0:*                           
udp        0      0 10.114.102.1:53         0.0.0.0:*                           
udp        0      0 ::1:323                 :::*                                
udp        0      0 :::57683                :::*                                
udp        0      0 :::5353                 :::*                                
udp        0      0 :::5353                 :::*                                
udp        0      0 :::60221                :::*     

Further more I check the disc and when comparing with other devices I’d assume /mnt/state shouldn’t be full?!

Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                        3.9G     0  3.9G   0% /dev
/dev/disk/by-label/resin-rootA  300M  294M     0 100% /mnt/sysroot/active
/dev/disk/by-label/resin-state   19M   18M     0 100% /mnt/state
none                            300M  294M     0 100% /
tmpfs                           3.9G  360K  3.9G   1% /dev/shm
tmpfs                           3.9G   18M  3.9G   1% /run
tmpfs                           3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                           3.9G     0  3.9G   0% /tmp
/dev/mmcblk0p1                   40M  650K   40M   2% /mnt/boot
tmpfs                           3.9G  156K  3.9G   1% /var/volatile
/dev/mmcblk0p6                   57G  5.5G   49G  11% /mnt/data

Basically it looks to me like something is already occupying the port 32181 - I don’t know what and why this happens in this case though.

Apparently NetworkManager spammed the /mnt/state partition with temporary timestamp files (seems to be an old issue. I fixed that as well as some temporary aufs files filling up the root partition. I then rebooted the device. The issue however is still persisting.
Maybe something got stuck when the partitions were full. Is there any further way to reset things than a reboot?