Lost control of Balena device

After my local internet connection dropped during an image build on my computer, I completely lost control of my balena device in Balena Cloud. I can’t reboot it, restart or stop services, and I can’t see the logs (the past logs are still there, but the new ones that should appear after an image upload aren’t).

The release hash from the build on my computer matches the one shown in Balena Cloud, though.

After unplugging the device from the power source and attempting another reboot, I found this message:

Request error: tunneling socket could not be established, statusCode=500

54c6867fa9bc69f114eca422cd269edc_diagnostics_2021.05.25_18.48.24+0000.txt (352.9 KB)

The diagnostics logs are attached above.

Hi there, what kind of device is it? Do you have the ability to ssh into the device, either via the CLI or via the terminal in the dashboard?

Hey @rcooke-warwick, it is a Raspberry Pi Zero W v1.1. Yes, I can SSH into the device and run the journal commands in the host OS.

Hi Arthur, what version of the OS and supervisor does your device have?

Hi!

BalenaOS Host version: 2.54.2+rev1
Supervisor: 12.8.0

The supervisor seems to be having problems connecting to its own database:

2021-05-25 18:48:05.735531129+00:00
-- Logs begin at Tue 2021-05-25 18:45:07 UTC, end at Tue 2021-05-25 18:48:06 UTC. --
May 25 18:46:13 resin-supervisor[22730]: (node:1) UnhandledPromiseRejectionWarning: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
May 25 18:46:13 resin-supervisor[22730]:     at Client_SQLite3.acquireConnection (/usr/src/app/dist/app.js:6:267325)
May 25 18:46:13 resin-supervisor[22730]:     at runNextTicks (internal/process/task_queues.js:62:5)
May 25 18:46:13 resin-supervisor[22730]:     at listOnTimeout (internal/timers.js:518:9)
May 25 18:46:13 resin-supervisor[22730]:     at processTimers (internal/timers.js:492:7)
May 25 18:46:13 resin-supervisor[22730]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
May 25 18:46:13 resin-supervisor[22730]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
May 25 18:47:17 resin-supervisor[22730]: [error]   LogBackend: unexpected error: Error: Client network socket disconnected before secure TLS connection was established
May 25 18:47:17 resin-supervisor[22730]: [error]         at connResetException (internal/errors.js:608:14)
May 25 18:47:17 resin-supervisor[22730]: [error]       at TLSSocket.onConnectEnd (_tls_wrap.js:1514:19)
May 25 18:47:17 resin-supervisor[22730]: [error]       at Object.onceWrapper (events.js:416:28)
May 25 18:47:17 resin-supervisor[22730]: [error]       at TLSSocket.emit (events.js:322:22)
May 25 18:47:17 resin-supervisor[22730]: [error]       at endReadableNT (_stream_readable.js:1187:12)
May 25 18:47:17 resin-supervisor[22730]: [error]       at processTicksAndRejections (internal/process/task_queues.js:84:21)

Hi, we got the logs above from the diagnostics file that you sent. Can you get more recent logs from the supervisor by running the command journalctl -u resin-supervisor from the HostOS?

Sure, but look:

root@54c6867:~# journalctl -u resin-supervisor
-- Logs begin at Fri 2021-05-28 13:50:07 UTC, end at Fri 2021-05-28 13:51:59 UTC. --
-- No entries --
root@54c6867:~#

Hi, can you try restarting the supervisor using the following command from the HostOS: systemctl restart resin-supervisor. Let’s see if we still get some errors from the supervisor after the restart.

Nice, thank you for the support! Now I got this:

May 28 14:06:25 54c6867 resin-supervisor[24896]: [info]    Supervisor v12.8.0 starting up...
May 28 14:06:34 54c6867 resin-supervisor[24896]: [info]    Setting host to discoverable
May 28 14:06:34 54c6867 resin-supervisor[24896]: [warn]    Invalid firewall mode: . Reverting to state: off
May 28 14:06:34 54c6867 resin-supervisor[24896]: [info]    🔥 Applying firewall mode: off
May 28 14:06:36 54c6867 resin-supervisor[24896]: [debug]   Starting logging infrastructure
May 28 14:06:38 54c6867 resin-supervisor[24896]: [info]    Starting firewall
May 28 14:06:38 54c6867 resin-supervisor[24896]: [debug]   Performing database cleanup for container log timestamps
May 28 14:06:41 54c6867 resin-supervisor[24896]: [info]    Previous engine snapshot was not stored. Skipping cleanup.
May 28 14:06:41 54c6867 resin-supervisor[24896]: [debug]   Handling of local mode switch is completed
May 28 14:06:42 54c6867 resin-supervisor[24896]: [success] 🔥 Firewall mode applied
May 28 14:06:42 54c6867 resin-supervisor[24896]: [debug]   Starting api binder
May 28 14:06:43 54c6867 resin-supervisor[24896]: (node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
May 28 14:06:44 54c6867 resin-supervisor[24896]: [info]    API Binder bound to: https://api.balena-cloud.com/v6/
May 28 14:06:44 54c6867 resin-supervisor[24896]: [event]   Event: Supervisor start {}
May 28 14:06:45 54c6867 resin-supervisor[24896]: [debug]   Spawning journald with: chroot  /mnt/root journalctl -a -S 2021-05-28 13:45:22 -o json CONTAINER_ID_FULL=9e32a7509a27ef64e4dda303e44aeec0e1b2ba452b2d4b961588b57e87d03c54
May 28 14:07:51 54c6867 resin-supervisor[24896]: (node:1) UnhandledPromiseRejectionWarning: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at Client_SQLite3.acquireConnection (/usr/src/app/dist/app.js:6:267325)
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at runNextTicks (internal/process/task_queues.js:62:5)
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at listOnTimeout (internal/timers.js:518:9)
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at processTimers (internal/timers.js:492:7)
May 28 14:07:51 54c6867 resin-supervisor[24896]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
May 28 14:07:51 54c6867 resin-supervisor[24896]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I’m not sure, but I have a couple of possible culprits in mind:

  • A never-ending, very fast loop in my container application that logs a single pin state (see the sketch below)
  • My router not allowing port 443 to be opened (maybe a reason for the VPN Only error?)
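
For context on the first point, this is roughly the shape of the loop I mean. It is a simplified sketch only, not my exact code; read_pin() here just stands in for the real GPIO read in my app:

import random

def read_pin() -> int:
    # Placeholder for the real GPIO read in my application
    return random.randint(0, 1)

while True:
    # No sleep and no change detection: stdout is flooded as fast as the loop can run
    print(f"pin state: {read_pin()}", flush=True)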

Hi Arthur,

Thanks for sharing those logs. We have observed the error “Knex: Timeout acquiring a connection. The pool is probably full” on devices doing very fast logging from a container, which is consistent with your first hypothesis.

Could you try pushing a new release with reduced logging on your container to see if that is the culprit here too?
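
For example, something along these lines (a rough sketch only, assuming a Python service polling a GPIO pin; read_pin() is a placeholder, not taken from your code) logs only when the state changes, plus an occasional heartbeat, which usually cuts the log volume dramatically:

import random
import time

def read_pin() -> int:
    # Placeholder for the real GPIO read in your application
    return random.randint(0, 1)

last_state = None
last_log = 0.0
HEARTBEAT = 60.0  # seconds between periodic logs when nothing changes

while True:
    state = read_pin()
    now = time.monotonic()
    # Log only on a state change, or once per heartbeat interval
    if state != last_state or now - last_log >= HEARTBEAT:
        print(f"pin state: {state}", flush=True)
        last_state = state
        last_log = now
    time.sleep(0.05)  # keep the polling loop itself from running hot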

Thanks