We have a multi-container setup that is running fine on a lot of devices. On one device one container in the status “Installed” and the containers which depend on that one in “Downloaded” (2 other containers are running fine). I’m not exactly sure when this happened but the device is out in the field connected via LTE. The device is running on version balenaOS 2.45.1+rev4 of the UpBoard Image. A restart of the device does not solve the issue.
When checking the logs, I get this output:
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [info] Applying target state
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn] Ignoring unsupported or unknown compose fields: links
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn] Ignoring unsupported or unknown compose fields: containerName
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn] Ignoring unsupported or unknown compose fields: containerName
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn] Ignoring unsupported or unknown compose fields: links
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn] Ignoring unsupported or unknown compose fields: containerName, links
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [warn] Ignoring unsupported or unknown compose fields: containerName
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [event] Event: Service start {"service":{"appId":1762257,"serviceId":777636,"serviceName":"zookeeper","releaseId":1593546}}
Sep 15 07:34:10 af1a482 balenad[782]: time="2021-09-15T07:34:10.661781481Z" level=warning msg="Failed to allocate and map port 32181-32181: Bind for 0.0.0.0:32181 failed: port is already allocated"
Sep 15 07:34:10 af1a482 balenad[782]: time="2021-09-15T07:34:10.728397992Z" level=error msg="6fcd28581c4039af460a3fc831ae13b5f02e457469f5167cdc9b6c8b61df3336 cleanup: failed to delete container from containerd: no such container"
Sep 15 07:34:10 af1a482 balenad[782]: time="2021-09-15T07:34:10.728500331Z" level=error msg="Handler for POST /containers/6fcd28581c4039af460a3fc831ae13b5f02e457469f5167cdc9b6c8b61df3336/start returned error: driver failed programming external connectivity on endpoint zookeeper_2931171_1593546 (a10cb57b0d7db21c5b1b2a9d2cd22c885458ae5c5ecea3c46cc6a8faf8300b79): Bind for 0.0.0.0:32181 failed: port is already allocated"
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] Scheduling another update attempt in 900000ms due to failure: Error: (HTTP code 500) server error - driver failed programming external connectivity on endpoint zookeeper_2931171_1593546 (a10cb57b0d7db21c5b1b2a9d2cd22c885458ae5c5ecea3c46cc6a8faf8300b79): Bind for 0.0.0.0:32181 failed: port is already allocated
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at /usr/src/app/dist/app.js:596:108431
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at /usr/src/app/dist/app.js:596:108394
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at m.buildPayload (/usr/src/app/dist/app.js:596:108404)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:596:107904)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at IncomingMessage.emit (events.js:194:15)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at endReadableNT (_stream_readable.js:1125:12)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at process._tickCallback (internal/process/next_tick.js:63:19)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] Device state apply error Error: (HTTP code 500) server error - driver failed programming external connectivity on endpoint zookeeper_2931171_1593546 (a10cb57b0d7db21c5b1b2a9d2cd22c885458ae5c5ecea3c46cc6a8faf8300b79): Bind for 0.0.0.0:32181 failed: port is already allocated
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at /usr/src/app/dist/app.js:596:108431
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at /usr/src/app/dist/app.js:596:108394
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at m.buildPayload (/usr/src/app/dist/app.js:596:108404)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:596:107904)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at IncomingMessage.emit (events.js:194:15)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at endReadableNT (_stream_readable.js:1125:12)
Sep 15 07:34:10 af1a482 12d07c71a571[782]: [error] at process._tickCallback (internal/process/next_tick.js:63:19)
Sep 15 07:34:57 af1a482 balenad[782]: time="2021-09-15T07:34:57.626176324Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/303b4949d34d377e38ff949db9b26ca3fa2f62a9b69221a28f641b8742fa22c9/shim.sock debug=false pid=27925
Sep 15 07:34:58 af1a482 balenad[782]: time="2021-09-15T07:34:58.687460213Z" level=info msg="shim reaped" id=303b4949d34d377e38ff949db9b26ca3fa2f62a9b69221a28f641b8742fa22c9
Sep 15 07:34:58 af1a482 balenad[782]: time="2021-09-15T07:34:58.695717331Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 15 07:35:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:37:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:37:59 af1a482 balenad[782]: time="2021-09-15T07:37:59.475858027Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/ae60a16271ee5bfb59d3ed5e7d3cee1cf322c4cb15fda772bb254393f60645e5/shim.sock debug=false pid=28072
Sep 15 07:38:00 af1a482 balenad[782]: time="2021-09-15T07:38:00.534786559Z" level=info msg="shim reaped" id=ae60a16271ee5bfb59d3ed5e7d3cee1cf322c4cb15fda772bb254393f60645e5
Sep 15 07:38:00 af1a482 balenad[782]: time="2021-09-15T07:38:00.543598115Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 15 07:38:08 af1a482 12d07c71a571[782]: [api] GET /v1/healthy 200 - 2.649 ms
Sep 15 07:39:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:41:01 af1a482 balenad[782]: time="2021-09-15T07:41:01.318956737Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/moby/534a250b8c9f0361f3e72e147a77c26633897ae4b95e6aed6ccbf5dfad023e53/shim.sock debug=false pid=28277
Sep 15 07:41:02 af1a482 balenad[782]: time="2021-09-15T07:41:02.387840307Z" level=info msg="shim reaped" id=534a250b8c9f0361f3e72e147a77c26633897ae4b95e6aed6ccbf5dfad023e53
Sep 15 07:41:02 af1a482 balenad[782]: time="2021-09-15T07:41:02.396218195Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 15 07:41:09 af1a482 deeb61aacf0e[782]: waiting
Sep 15 07:41:36 af1a482 12d07c71a571[782]: [debug] Attempting container log timestamp flush...
Sep 15 07:41:36 af1a482 12d07c71a571[782]: [debug] Container log timestamp flush complete
In another thread I found the advice to run netstat -lauten
which returns
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.2:53 0.0.0.0:* LISTEN
tcp 0 0 10.114.102.1:53 0.0.0.0:* LISTEN
tcp 0 493 <LTE_IP>:58368 3.227.28.93:443 ESTABLISHED
tcp 0 0 127.0.0.1:59448 127.0.0.1:48484 TIME_WAIT
tcp 0 0 :::80 :::* LISTEN
tcp 0 0 :::8081 :::* LISTEN
tcp 0 0 :::8082 :::* LISTEN
tcp 0 0 :::8181 :::* LISTEN
tcp 0 0 :::32181 :::* LISTEN
tcp 0 0 :::502 :::* LISTEN
tcp 0 0 :::1880 :::* LISTEN
tcp 0 0 :::8442 :::* LISTEN
tcp 0 0 :::8123 :::* LISTEN
tcp 0 0 :::8000 :::* LISTEN
tcp 0 0 :::48484 :::* LISTEN
tcp 0 0 :::62662 :::* LISTEN
tcp 0 0 :::22222 :::* LISTEN
tcp 0 2080 ::ffff:10.240.2.193:22222 ::ffff:52.4.252.97:38174 ESTABLISHED
udp 0 0 127.0.0.1:323 0.0.0.0:*
udp 0 0 0.0.0.0:58494 0.0.0.0:*
udp 0 0 0.0.0.0:5353 0.0.0.0:*
udp 0 0 0.0.0.0:5353 0.0.0.0:*
udp 0 0 0.0.0.0:51558 0.0.0.0:*
udp 0 0 127.0.0.2:53 0.0.0.0:*
udp 0 0 10.114.102.1:53 0.0.0.0:*
udp 0 0 ::1:323 :::*
udp 0 0 :::57683 :::*
udp 0 0 :::5353 :::*
udp 0 0 :::5353 :::*
udp 0 0 :::60221 :::*
Further more I check the disc and when comparing with other devices I’d assume /mnt/state shouldn’t be full?!
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
/dev/disk/by-label/resin-rootA 300M 294M 0 100% /mnt/sysroot/active
/dev/disk/by-label/resin-state 19M 18M 0 100% /mnt/state
none 300M 294M 0 100% /
tmpfs 3.9G 360K 3.9G 1% /dev/shm
tmpfs 3.9G 18M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 3.9G 0 3.9G 0% /tmp
/dev/mmcblk0p1 40M 650K 40M 2% /mnt/boot
tmpfs 3.9G 156K 3.9G 1% /var/volatile
/dev/mmcblk0p6 57G 5.5G 49G 11% /mnt/data
Basically it looks to me like something is already occupying the port 32181 - I don’t know what and why this happens in this case though.