Lost control of Balena device

After my local internet connection dropped during an image build on my computer, I completely lost control of my balena device in Balena Cloud. I can’t reboot it, restart or stop services, and I can’t see the logs (the past logs are still there, but the new ones that should appear after an image upload aren’t).

The release hash from the build on my computer matches the one shown in Balena Cloud, though.

After unplugging the device from the power source and attempting another reboot, I found this message:

Request error: tunneling socket could not be established, statusCode=500

54c6867fa9bc69f114eca422cd269edc_diagnostics_2021.05.25_18.48.24+0000.txt (352.9 KB)

The diagnostics logs are attached above.

Hi there, what kind of device is it? Do you have the ability to ssh into the device, either via the CLI or via the terminal in the dashboard?

Hey @rcooke-warwick, it is a Raspberry Pi Zero W v1.1. Yes, I can SSH into the device and run the journal commands in the host OS.

Hi Arthur, what version of the OS and supervisor does your device have?

Hi!

BalenaOS Host version: 2.54.2+rev1
Supervisor: 12.8.0

The supervisor seems to be having problems connecting to its own database:

2021-05-25 18:48:05.735531129+00:00
-- Logs begin at Tue 2021-05-25 18:45:07 UTC, end at Tue 2021-05-25 18:48:06 UTC. --
May 25 18:46:13 resin-supervisor[22730]: (node:1) UnhandledPromiseRejectionWarning: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
May 25 18:46:13 resin-supervisor[22730]:     at Client_SQLite3.acquireConnection (/usr/src/app/dist/app.js:6:267325)
May 25 18:46:13 resin-supervisor[22730]:     at runNextTicks (internal/process/task_queues.js:62:5)
May 25 18:46:13 resin-supervisor[22730]:     at listOnTimeout (internal/timers.js:518:9)
May 25 18:46:13 resin-supervisor[22730]:     at processTimers (internal/timers.js:492:7)
May 25 18:46:13 resin-supervisor[22730]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
May 25 18:46:13 resin-supervisor[22730]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
May 25 18:47:17 resin-supervisor[22730]: [error]   LogBackend: unexpected error: Error: Client network socket disconnected before secure TLS connection was established
May 25 18:47:17 resin-supervisor[22730]: [error]         at connResetException (internal/errors.js:608:14)
May 25 18:47:17 resin-supervisor[22730]: [error]       at TLSSocket.onConnectEnd (_tls_wrap.js:1514:19)
May 25 18:47:17 resin-supervisor[22730]: [error]       at Object.onceWrapper (events.js:416:28)
May 25 18:47:17 resin-supervisor[22730]: [error]       at TLSSocket.emit (events.js:322:22)
May 25 18:47:17 resin-supervisor[22730]: [error]       at endReadableNT (_stream_readable.js:1187:12)
May 25 18:47:17 resin-supervisor[22730]: [error]       at processTicksAndRejections (internal/process/task_queues.js:84:21)

Hi, we got the logs above from the diagnostics file that you sent. Can you get more recent logs from the supervisor by running the command journalctl -u resin-supervisor from the HostOS?

Sure, but look:

root@54c6867:~# journalctl -u resin-supervisor
-- Logs begin at Fri 2021-05-28 13:50:07 UTC, end at Fri 2021-05-28 13:51:59 UTC. --
-- No entries --
root@54c6867:~#

Hi, can you try restarting the supervisor using the following command from the HostOS: systemctl restart resin-supervisor. Let’s see if we still get some errors from the supervisor after the restart.

Nice, thank you for the support! Now I got this:

May 28 14:06:25 54c6867 resin-supervisor[24896]: [info]    Supervisor v12.8.0 starting up...
May 28 14:06:34 54c6867 resin-supervisor[24896]: [info]    Setting host to discoverable
May 28 14:06:34 54c6867 resin-supervisor[24896]: [warn]    Invalid firewall mode: . Reverting to state: off
May 28 14:06:34 54c6867 resin-supervisor[24896]: [info]    🔥 Applying firewall mode: off
May 28 14:06:36 54c6867 resin-supervisor[24896]: [debug]   Starting logging infrastructure
May 28 14:06:38 54c6867 resin-supervisor[24896]: [info]    Starting firewall
May 28 14:06:38 54c6867 resin-supervisor[24896]: [debug]   Performing database cleanup for container log timestamps
May 28 14:06:41 54c6867 resin-supervisor[24896]: [info]    Previous engine snapshot was not stored. Skipping cleanup.
May 28 14:06:41 54c6867 resin-supervisor[24896]: [debug]   Handling of local mode switch is completed
May 28 14:06:42 54c6867 resin-supervisor[24896]: [success] 🔥 Firewall mode applied
May 28 14:06:42 54c6867 resin-supervisor[24896]: [debug]   Starting api binder
May 28 14:06:43 54c6867 resin-supervisor[24896]: (node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
May 28 14:06:44 54c6867 resin-supervisor[24896]: [info]    API Binder bound to: https://api.balena-cloud.com/v6/
May 28 14:06:44 54c6867 resin-supervisor[24896]: [event]   Event: Supervisor start {}
May 28 14:06:45 54c6867 resin-supervisor[24896]: [debug]   Spawning journald with: chroot  /mnt/root journalctl -a -S 2021-05-28 13:45:22 -o json CONTAINER_ID_FULL=9e32a7509a27ef64e4dda303e44aeec0e1b2ba452b2d4b961588b57e87d03c54
May 28 14:07:51 54c6867 resin-supervisor[24896]: (node:1) UnhandledPromiseRejectionWarning: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at Client_SQLite3.acquireConnection (/usr/src/app/dist/app.js:6:267325)
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at runNextTicks (internal/process/task_queues.js:62:5)
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at listOnTimeout (internal/timers.js:518:9)
May 28 14:07:51 54c6867 resin-supervisor[24896]:     at processTimers (internal/timers.js:492:7)
May 28 14:07:51 54c6867 resin-supervisor[24896]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
May 28 14:07:51 54c6867 resin-supervisor[24896]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I’m not sure, but I have a couple of possible culprits in mind:

  • A never-ending, very fast loop in my container application that logs a single pin state (see the sketch below)
  • My router not allowing port 443 to be opened (maybe a reason for the VPN Only error?)
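
For context on the first point, this is roughly the shape of the loop I mean. It is a simplified sketch only, not my exact code; read_pin() here just stands in for the real GPIO read in my app:

import random

def read_pin() -> int:
    # Placeholder for the real GPIO read in my application
    return random.randint(0, 1)

while True:
    # No sleep and no change detection: stdout is flooded as fast as the loop can run
    print(f"pin state: {read_pin()}", flush=True)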

Hi Arthur,

Thanks for sharing those logs. We have observed the error “Knex: Timeout acquiring a connection. The pool is probably full” on devices doing very fast logging from a container, which is consistent with your first hypothesis.

Could you try pushing a new release with reduced logging on your container to see if that is the culprit here too?
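
For example, something along these lines (a rough sketch only, assuming a Python service polling a GPIO pin; read_pin() is a placeholder, not taken from your code) logs only when the state changes, plus an occasional heartbeat, which usually cuts the log volume dramatically:

import random
import time

def read_pin() -> int:
    # Placeholder for the real GPIO read in your application
    return random.randint(0, 1)

last_state = None
last_log = 0.0
HEARTBEAT = 60.0  # seconds between periodic logs when nothing changes

while True:
    state = read_pin()
    now = time.monotonic()
    # Log only on a state change, or once per heartbeat interval
    if state != last_state or now - last_log >= HEARTBEAT:
        print(f"pin state: {state}", flush=True)
        last_state = state
        last_log = now
    time.sleep(0.05)  # keep the polling loop itself from running hot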

Thanks