Supervisor container suddenly stopped

I have a fleet of devices running balenaOS 2.46.1+rev1 (managed by openBalena). The supervisor container suddenly stopped working on one of my devices. The device shows as offline all the time, but the services keep running normally (I can tell because they constantly send data). I accessed the device over SSH on the local network and extracted some supervisor logs. The same warnings and errors keep appearing:

[info] Supervisor v10.6.27 starting up…
(node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
[debug] Starting event tracker
[debug] Starting api binder
[debug] Starting logging infrastructure
[event] Event: Supervisor start {}
[debug] Performing database cleanup for container log timestamps
[info] Previous engine snapshot was not stored. Skipping cleanup.
[debug] Handling of local mode switch is completed
[debug] Connectivity check enabled: true
[debug] Starting periodic check for IP addresses
[info] Reporting initial state, supervisor version and API info
[debug] VPN status path exists.
[info] Waiting for connectivity…
[debug] Skipping preloading
[info] Starting API server
[info] Applying target state
[error] LogBackend: server responded with status code: 404
[debug] Ensuring device is provisioned
[debug] Starting current state report
[debug] Supervisor API listening on allowed interfaces only
[debug] Finished applying target state
[success] Device state apply success
(node:1) UnhandledPromiseRejectionWarning: t
at /usr/src/app/dist/app.js:484:8806
at c (/usr/src/app/dist/app.js:9:77523)
at O._settlePromiseFromHandler (/usr/src/app/dist/app.js:312:325441)
at O._settlePromise (/usr/src/app/dist/app.js:312:326241)
at O._settlePromise0 (/usr/src/app/dist/app.js:312:326940)
at O._settlePromises (/usr/src/app/dist/app.js:312:328292)
at O._fulfill (/usr/src/app/dist/app.js:312:327310)
at q._callback (/usr/src/app/dist/app.js:122:15684)
at q.t._callback.t.callback.t.callback (/usr/src/app/dist/app.js:544:5327)
at q.emit (events.js:189:13)
at q.<anonymous> (/usr/src/app/dist/app.js:544:18140)
at q.emit (events.js:189:13)
at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:544:16982)
at Object.onceWrapper (events.js:277:13)
at IncomingMessage.emit (events.js:194:15)
at endReadableNT (_stream_readable.js:1125:12)
at process._tickCallback (internal/process/next_tick.js:63:19)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[error] LogBackend: server responded with status code: 404
[api] GET /v1/healthy 200 - 30.605 ms

Hey, supervisor maintainer here.

I’m currently working through the codebase to try to eliminate all instances of unhandled rejections. They’re mostly gone, but there are still a couple of places missing error handling, and I think that is what you’re seeing.

Unfortunately, in this case I’m not sure what exactly is going wrong (and in fact, I’ve never seen the

[error] LogBackend: server responded with status code: 404

error).

Could you try restarting the supervisor please?

systemctl restart resin-supervisor
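
For anyone following along, here is a rough sketch of restarting the supervisor and then polling its local health endpoint. The unit name, port 48484 and the /v1/healthy path are taken from the logs in this thread. The function name, the retry count and delay, and the assumption that the API is reachable on 127.0.0.1 from the host OS are hypothetical, so adjust as needed.

```shell
# Sketch only: restart the supervisor, then poll its health endpoint.
# Assumptions (not confirmed in this thread): the API answers on 127.0.0.1
# from the host OS; retry count and delay are arbitrary defaults.
restart_and_check_supervisor() {
    retries="${1:-10}"   # how many times to poll
    delay="${2:-5}"      # seconds to wait between polls

    systemctl restart resin-supervisor || return 1

    i=0
    while [ "$i" -lt "$retries" ]; do
        # -f: non-zero exit on HTTP errors, -s: no progress output
        if curl -fs --max-time 5 http://127.0.0.1:48484/v1/healthy >/dev/null; then
            echo "supervisor healthy"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done

    echo "supervisor not healthy after $retries attempts" >&2
    return 1
}

# Usage on the device:
#   restart_and_check_supervisor 10 5
```

Note that in the log above /v1/healthy returns 200 while the 404 errors continue, so a passing check only shows that the supervisor process is up, not that the device is back online.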

Hi, thanks for your help.

I extracted the logs and replaced the device's SD card with one running a fresh balenaOS image. Maybe I can still run some tests using the old SD card in another device.

But beyond that, how can I restart the supervisor container remotely if the device shows as offline because it is not connected to the VPN and is not communicating with the openBalena server?

If the supervisor fails like this, the only way to fix it is to get access to the device's local network, right?

You can access the device through SSH locally. Please check the documentation on that, particularly the part about adding an SSH key for production images.

Thanks,
Zahari

Forgot to post the link: https://www.balena.io/docs/learn/manage/ssh-access/

I’ve been facing the same issue; it started happening on my NUC. I tried your supervisor restart:

systemctl restart resin-supervisor

But no dice. Here are the logs since the restart; it ultimately comes back to the 404:

Sep 11 02:50:47 99e4e1a resin-supervisor[11058]: [debug]   Starting current state report
Sep 11 02:50:47 99e4e1a resin-supervisor[11058]: [debug]   Starting target state poll
Sep 11 02:50:47 99e4e1a resin-supervisor[11058]: [debug]   Supervisor API listening on allowed interfaces only
Sep 11 02:50:47 99e4e1a resin-supervisor[11058]: [info]    Supervisor API successfully started on port 48484
Sep 11 02:50:47 99e4e1a resin-supervisor[11058]: [event]   Event: Service start {"service":{"appId":1393389,"serviceId":214629,"serviceName":"config","releaseId":1216436}}
Sep 11 02:50:47 99e4e1a resin-supervisor[11058]: [error]   Failed to get target state for device: StatusError
Sep 11 02:50:47 99e4e1a resin-supervisor[11058]: [event]   Event: Service started {"service":{"appId":1393389,"serviceId":214629,"serviceName":"config","releaseId":1216436}}
Sep 11 02:50:48 99e4e1a resin-supervisor[11058]: [debug]   Finished applying target state
Sep 11 02:50:48 99e4e1a resin-supervisor[11058]: [success] Device state apply success
Sep 11 02:50:48 99e4e1a resin-supervisor[11058]: [event]   Event: Service exit {"service":{"appId":1393389,"serviceId":214629,"serviceName":"config","releaseId":1216436}}
Sep 11 02:50:52 99e4e1a resin-supervisor[11058]: [error]   LogBackend: server responded with status code: 404
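
In case it helps compare devices, here is a small sketch that counts how many error lines, and how many of the LogBackend 404 lines specifically, appear in a supervisor log. The function name is hypothetical; the matched strings are copied from the logs above, and the journald unit name is the one shown in the log prefix.

```shell
# Sketch only: summarize supervisor log lines read from stdin.
# Counts every line tagged [error] and, separately, the LogBackend 404 lines.
summarize_supervisor_log() {
    awk '
        /\[error\]/ { errors++ }
        /LogBackend: server responded with status code: 404/ { log404++ }
        END { printf "errors=%d logbackend_404=%d\n", errors, log404 }
    '
}

# Usage on the device:
#   journalctl -u resin-supervisor --no-pager | summarize_supervisor_log
```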