Online (VPN only) with diagnostics saying check_supervisor

Hi,

We’ve built a project based on a Raspberry Pi 4 (64-bit). We have a number of devices in the field running this fine (and have been for months), but one device is stuck at Online (VPN only) and shows the following in diagnostics:

I’ve run the diagnostics a number of times and the result is always the same, over multiple days with power cycles between them.

The application is written in Python, accepts data from an iBeacon-like device and reports it back to base every 15 minutes. This device is unable to find the Bluetooth iBeacon device (but a mobile phone nearby can pick it up OK).

The device is with a customer at the moment, so we can’t access it ourselves. We have since had them order a new replacement Raspberry Pi and SD card and this has also not resolved the problem. The SD card has been reflashed multiple times too (with the hope of this resolving the problem).

Do you have any suggestions of what steps we can try next?

Thanks,

Luke

Hi :wave: @violuke were you able to connect to this device or the previous device via the dashboard? Just asking in order to be sure that the Online (VPN only) message is true.

Also, as extra info, which versions of the OS and supervisor was/is the device running? The most helpful thing to see would be the supervisor’s logs, but you won’t be able to get them if the device is unreachable.

Hi @mbalamat ,

Thanks for your help. Yes, I can connect to the device and here’s some more info.

I opened a terminal session as an example… and in fact in testing this might have made some progress… is the device out of disk space? This is a 16GB SD card, with an image from the balena API (with WiFi details baked in), burnt using Etcher.

Please also find attached here the logs from Diagnostics > Device Diagnostics, is this what you wanted?

If not, let me know how to get what you need, and I’ll send them over.

Thanks for your help.

Luke

It’s good that there is a way to reach the device, and the device is OK with respect to disk space. You can run journalctl -au resin-supervisor --no-pager on the host OS to get the supervisor’s logs.

This might shed some light into what’s going on. Are there any errors?

Hi, yes lots!

It scrolled through loads of lines of logs, so I can’t get it all, but it looks like it’s repeating the following snippet over and over again:

Jan 15 12:50:52 1b45699 resin-supervisor[27178]: [debug]   Attempting container log timestamp flush...
Jan 15 12:50:52 1b45699 resin-supervisor[27178]: [debug]   Container log timestamp flush complete
Jan 15 12:56:03 1b45699 systemd[1]: resin-supervisor.service: Main process exited, code=exited, status=137/n/a
Jan 15 12:56:03 1b45699 systemd[1]: resin-supervisor.service: Failed with result 'exit-code'.
Jan 15 12:56:25 1b45699 resin-supervisor[28498]: resin_supervisor
Jan 15 12:56:25 1b45699 resin-supervisor[28562]: active
Jan 15 12:56:26 1b45699 resin-supervisor[28563]: Container config has not changed
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: [info]    Supervisor v11.14.0 starting up...
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 4)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 5)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 6)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 7)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Error: The migration directory is corrupt, the following files are missing: M00005.js, M00006.js
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at validateMigrationList (/usr/src/app/dist/app.js:6:1129303)
Jan 15 12:56:28 1b45699 resin-supervisor[28563]:     at /usr/src/app/dist/app.js:6:1130462
Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 8)
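When the journal scrolls past too quickly to read, one option is to save the output to a file and collapse the repeats on a workstation. The following is only a rough sketch, assuming the line layout shown in the snippet above; the names summarize and decode_exit_status are made up for illustration and are not part of any balena tooling. It also decodes the exit status using the common shell convention that values above 128 mean "killed by signal N", so status=137 would usually correspond to signal 9 (SIGKILL).

```python
import re
from collections import Counter

# Matches journal lines shaped like the ones pasted above, e.g.
# "Jan 15 12:56:28 1b45699 resin-supervisor[28563]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<unit>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?:\s(?P<msg>.*)$"
)


def summarize(lines):
    """Count distinct messages, ignoring timestamps, host and PIDs."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # skip anything that is not in the expected layout
        # Drop the "(rejection id: N)" suffix so repeated warnings collapse.
        msg = re.sub(r"\s*\(rejection id: \d+\)$", "", match.group("msg"))
        counts[msg.strip()] += 1
    return counts


def decode_exit_status(status):
    """Return the signal number if status follows the 128+N convention."""
    return status - 128 if status > 128 else None


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "Jan 15 12:56:03 1b45699 systemd[1]: resin-supervisor.service: "
    "Main process exited, code=exited, status=137/n/a",
    "Jan 15 12:56:28 1b45699 resin-supervisor[28563]: [info]    "
    "Supervisor v11.14.0 starting up...",
    "Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) "
    "UnhandledPromiseRejectionWarning: Error: The migration directory is "
    "corrupt, the following files are missing: M00005.js, M00006.js",
    "Jan 15 12:56:28 1b45699 resin-supervisor[28563]: (node:1) "
    "UnhandledPromiseRejectionWarning: Error: The migration directory is "
    "corrupt, the following files are missing: M00005.js, M00006.js",
]

if __name__ == "__main__":
    # Print the most frequent messages first.
    for message, count in summarize(SAMPLE).most_common():
        print(f"{count:>4}  {message}")
    print("signal:", decode_exit_status(137))
```

In practice the SAMPLE list would be replaced with the lines of the saved journal file, which makes it easier to see that a single error accounts for nearly all of the output.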

Oh, so the error is: The migration directory is corrupt, the following files are missing.

Did you see the exact same error on the previous device? In other words, which device is this error from?

I don’t know if that was on the old device to be honest. This is from the newer physical device (but the same UUID in Balena).

If it has the same UUID then it’s probably not re-flashed, or it’s cloned from one SD card to another. Above you said that the SD card has been reflashed multiple times. Do you know how this was done and with what image?

I know that the firmware was retrieved and configured with the balena os configure --device [UUID] ... command, which is why it has the same UUID. This was done a couple of times (when trying to work out what was wrong). Then, for the 2nd SD card, I believe a previously downloaded and configured file was written to the new SD card. I hope that makes sense?

The reason for forcing the --device UUID is so that we can know which device should be linked back to which customer on our platform… this: Add config (file/env-var) to BalenaOS image that can be read by container before writing to SD card

Thanks for your help.

Did you use Etcher to flash the image? Did the verification succeed? I’m just asking to make sure that the image was written correctly. If you get a fresh image from the dashboard, does it behave the same way? Or is the issue localised to this particular customer and the image you sent them?
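Etcher’s verification only covers the write to the SD card, so it may also be worth checking that the image file the customer received is byte-identical to the one that was configured. A minimal sketch, assuming both copies are available as files; the function names and paths here are hypothetical. Note that two separately configured images are expected to differ, so this only makes sense for two copies of the same configured file.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large OS images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(path_a, path_b):
    """Return True if both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, same_image("configured.img", "customer-copy.img") returning False would point at a damaged transfer rather than at the SD card or the device.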

It wasn’t me directly but one of our customers. It was certainly Etcher that was used and given we’ve been trying multiple SD cards and re-flashed the same one more than once, I’m sure they’d have said if the validation failed.

Thanks again

Can you try pulling a fresh image on your side, with the same OS and supervisor versions as the device in question, to test whether a device flashed with it behaves correctly?

Also just to clarify something, the issue always happens on this device, meaning that the supervisor isn’t running and your app never downloads and runs, right?