balena app doesn't load pods

Hi!

I have a multicontainer app running on multiple devices. The app works correctly on every device except one.

On that single device the pod has disappeared, and I found this error in the supervisor pod:

(node:1) UnhandledPromiseRejectionWarning: Error: (HTTP code 500) server error - layer does not exist 
Jul 21 10:59:28 23693f8 d93313190b7a[866]:     at /usr/src/app/dist/app.js:596:111352
Jul 21 10:59:28 23693f8 d93313190b7a[866]:     at /usr/src/app/dist/app.js:596:111315
Jul 21 10:59:28 23693f8 d93313190b7a[866]:     at m.buildPayload (/usr/src/app/dist/app.js:596:111325)
Jul 21 10:59:28 23693f8 d93313190b7a[866]:     at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:596:110825)
Jul 21 10:59:28 23693f8 d93313190b7a[866]:     at IncomingMessage.emit (events.js:194:15)
Jul 21 10:59:28 23693f8 d93313190b7a[866]:     at endReadableNT (_stream_readable.js:1125:12)
Jul 21 10:59:28 23693f8 d93313190b7a[866]:     at process._tickCallback (internal/process/next_tick.js:63:19)
Jul 21 10:59:28 23693f8 d93313190b7a[866]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise whic>
Jul 21 10:59:28 23693f8 d93313190b7a[866]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit

I tried to change the app on the device through the balena CLI, but the supervisor doesn’t seem to respond.

I would like to find a way to solve this problem remotely, as this device has already been sent to a client.
Do you have any suggestions?

Thanks

Hi there, when you say your device isn’t working and the pod has disappeared, do you mean that the application containers aren’t there on your device, or that the supervisor container isn’t there? Thanks

The application containers aren’t there; there’s just the supervisor one.

Result of

journalctl --no-pager -a -u resin-supervisor

Started Resin supervisor.
Jul 21 12:02:37 23693f8 resin-supervisor[5365]: Container config has not changed
Jul 21 12:02:38 23693f8 resin-supervisor[5365]: Starting system message bus: dbus.
Jul 21 12:02:38 23693f8 resin-supervisor[5365]:  * Starting Avahi mDNS/DNS-SD Daemon: avahi-daemon
Jul 21 12:02:38 23693f8 resin-supervisor[5365]:    ...done.
Jul 21 12:02:43 23693f8 resin-supervisor[5365]: [info]    Supervisor v10.2.2 starting up...
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: (node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   Starting event tracker
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   Starting api binder
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   Starting logging infrastructure
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [event]   Event: Supervisor start {}
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   Performing database cleanup for container log timestamps
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [info]    Previous engine snapshot was not stored. Skipping cleanup.
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   Handling of local mode switch is completed
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   Connectivity check enabled: true
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   Starting periodic check for IP addresses
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [info]    Reporting initial state, supervisor version and API info
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [debug]   VPN status path exists.
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: [info]    Waiting for connectivity...
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: (node:1) UnhandledPromiseRejectionWarning: Error: (HTTP code 500) server error - layer does not exist
Jul 21 12:02:44 23693f8 resin-supervisor[5365]:     at /usr/src/app/dist/app.js:596:111352
Jul 21 12:02:44 23693f8 resin-supervisor[5365]:     at /usr/src/app/dist/app.js:596:111315
Jul 21 12:02:44 23693f8 resin-supervisor[5365]:     at m.buildPayload (/usr/src/app/dist/app.js:596:111325)
Jul 21 12:02:44 23693f8 resin-supervisor[5365]:     at IncomingMessage.<anonymous> (/usr/src/app/dist/app.js:596:110825)
Jul 21 12:02:44 23693f8 resin-supervisor[5365]:     at IncomingMessage.emit (events.js:194:15)
Jul 21 12:02:44 23693f8 resin-supervisor[5365]:     at endReadableNT (_stream_readable.js:1125:12)
Jul 21 12:02:44 23693f8 resin-supervisor[5365]:     at process._tickCallback (internal/process/next_tick.js:63:19)
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
Jul 21 12:02:44 23693f8 resin-supervisor[5365]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Jul 21 12:02:54 23693f8 resin-supervisor[5365]: [info]    Internet Connectivity: OK
Jul 21 12:03:37 23693f8 resin-supervisor[5365]: [error]   LogBackend: server responded with status code: 504
Jul 21 12:12:43 23693f8 resin-supervisor[5365]: [debug]   Attempting container log timestamp flush...
Jul 21 12:12:43 23693f8 resin-supervisor[5365]: [debug]   Container log timestamp flush complete
Jul 21 12:20:35 23693f8 systemd[1]: resin-supervisor.service: Watchdog timeout (limit 3min)!
Jul 21 12:20:35 23693f8 systemd[1]: resin-supervisor.service: Killing process 5365 (start-resin-sup) with signal SIGABRT.
Jul 21 12:20:35 23693f8 systemd[1]: resin-supervisor.service: Killing process 5366 (exe) with signal SIGABRT.
Jul 21 12:20:35 23693f8 systemd[1]: resin-supervisor.service: Killing process 5446 (balena) with signal SIGABRT.
Jul 21 12:20:35 23693f8 systemd[1]: resin-supervisor.service: Main process exited, code=killed, status=6/ABRT
Jul 21 12:22:05 23693f8 systemd[1]: resin-supervisor.service: State 'stop-final-sigterm' timed out. Killing.
Jul 21 12:22:05 23693f8 systemd[1]: resin-supervisor.service: Killing process 5446 (balena) with signal SIGKILL.
Jul 21 12:22:05 23693f8 systemd[1]: resin-supervisor.service: Failed with result 'watchdog'.
Jul 21 12:22:15 23693f8 systemd[1]: resin-supervisor.service: Service RestartSec=10s expired, scheduling restart.
Jul 21 12:22:15 23693f8 systemd[1]: resin-supervisor.service: Scheduled restart job, restart counter is at 4.
Jul 21 12:22:15 23693f8 systemd[1]: Stopped Resin supervisor.
Jul 21 12:22:15 23693f8 systemd[1]: Starting Resin supervisor...

I am using an rpi3 with balenaOS 2.43.0+rev1 and supervisor version 10.2.2.
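
The journal above ends with the systemd watchdog killing the supervisor and a restart counter of 4, which suggests a restart loop. As a minimal sketch, assuming the journal has been saved as plain text, the following Python counts watchdog timeouts and reads the last restart counter. The function name `summarize_restarts` and the two patterns are illustrative assumptions based only on the lines quoted in this thread.

```python
import re

# Patterns taken from the systemd messages quoted in the journal above.
WATCHDOG_RE = re.compile(r"resin-supervisor\.service: Watchdog timeout")
RESTART_RE = re.compile(r"restart counter is at (\d+)")


def summarize_restarts(journal_text):
    """Return (watchdog_timeouts, last_restart_counter) for a journal dump.

    last_restart_counter is None when no restart line is present.
    """
    timeouts = 0
    last_counter = None
    for line in journal_text.splitlines():
        if WATCHDOG_RE.search(line):
            timeouts += 1
        match = RESTART_RE.search(line)
        if match:
            last_counter = int(match.group(1))
    return timeouts, last_counter


# Two lines copied from the journal output in this thread.
SAMPLE = """\
Jul 21 12:20:35 23693f8 systemd[1]: resin-supervisor.service: Watchdog timeout (limit 3min)!
Jul 21 12:22:15 23693f8 systemd[1]: resin-supervisor.service: Scheduled restart job, restart counter is at 4.
"""

print(summarize_restarts(SAMPLE))  # (1, 4)
```

A counter that keeps growing between two captures of the journal would indicate that the supervisor never stays up long enough to apply the target state.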

Do you get any errors in dmesg? I’ve seen this happen before when there was some filesystem corruption.

This is also quite an old version of the OS. Are all of your devices running this OS and supervisor version?

Yes, all my rpi3s are running this OS/supervisor.

These should be the errors:

[  242.723830] brcmfmac: brcmf_vif_set_mgmt_ie: vndr ie set error : -52
[  242.731556] brcmfmac: brcmf_vif_set_mgmt_ie: vndr ie set error : -52


[   10.743922] bcm2835_mmal_vchiq: Failed to open VCHI service connection (status=-1)
[   10.747794] NET: Registered protocol family 31
[   10.756746] usbcore: registered new interface driver smsc95xx
[   10.770212] Bluetooth: HCI device and connection manager initialized
[   10.804858] Bluetooth: HCI socket layer initialized
[   10.807114] bcm2835_audio soc:audio: card created with 8 channels
[   10.811858] Bluetooth: L2CAP socket layer initialized
[   10.812302] bcm2835_codec: module is from the staging directory, the quality is unknown, you have been warned.
[   10.813813] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   10.813821] cfg80211: failed to load regulatory.db
[   10.821866] bcm2835_mmal_vchiq: Failed to open VCHI service connection (status=-1)
[   10.827487] Bluetooth: SCO socket layer initialized
[   10.945279] brcmfmac: F1 signature read @0x18000000=0x1541a9a6
[   10.951928] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
[   10.964925] usbcore: registered new interface driver brcmfmac
[   10.982320] usbcore: registered new interface driver btusb
[   11.149641] random: crng init done
[   11.154927] random: 7 urandom warning(s) missed due to ratelimiting
[   11.174118] usb 1-1.4: reset high-speed USB device number 5 using dwc_otg
[   11.187433] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
[   11.200247] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[   11.216375] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43430/1 wl0: Oct 23 2017 03:55:53 version 7.45.98.38 (r674442 CY) FWID 01-e58d219f

If it’s filesystem corruption, is there something I can do remotely? Is there a known cause?

I’ll check with the supervisor team for their suggestions. At least there’s nothing in the dmesg logs you just posted that suggests filesystem corruption, though.
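
For anyone who wants to repeat that check on a longer dmesg dump, here is a rough Python sketch that filters kernel lines for common storage-related messages. The pattern list is an assumption for illustration (ext4 error/warning lines, generic I/O errors, MMC controller errors) and is not exhaustive; the function name `find_storage_errors` is hypothetical.

```python
import re

# Assumed indicators of storage trouble; illustrative, not exhaustive.
SUSPECT_PATTERNS = [
    re.compile(r"EXT4-fs (error|warning)", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"mmc\d+: .*(error|timeout)", re.IGNORECASE),
]


def find_storage_errors(dmesg_text):
    """Return the dmesg lines that match any of the suspect patterns."""
    return [
        line
        for line in dmesg_text.splitlines()
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS)
    ]


# Lines copied from the dmesg excerpt posted in this thread.
POSTED = """\
[  242.723830] brcmfmac: brcmf_vif_set_mgmt_ie: vndr ie set error : -52
[   10.743922] bcm2835_mmal_vchiq: Failed to open VCHI service connection (status=-1)
[   10.813813] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
"""

print(find_storage_errors(POSTED))  # []
```

An empty result only means none of these particular patterns appeared in the captured text; it does not rule out corruption.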

Hi,

When the Supervisor logs error messages in the format (HTTP code ...) <ERROR_MESSAGE>, it is relaying an error that originated from balena Engine’s HTTP API. In this case, a 500 with layer does not exist seems to indicate that the Supervisor cannot find a particular image layer that it has a reference to, which leaves the device in an invalid state. Is the device currently looping through the error state that you shared in the logs above? Did this start to happen after an app update?
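
To pull these engine errors out of a long journal, a small Python sketch like the one below can help. It assumes only the (HTTP code ...) <ERROR_MESSAGE> format described above; the regular expression and the function name `parse_engine_error` are illustrative and not part of the Supervisor.

```python
import re

# Matches "(HTTP code NNN) <message>" up to the end of the line.
ENGINE_ERROR_RE = re.compile(r"\(HTTP code (\d+)\)\s*(.*?)\s*$")


def parse_engine_error(line):
    """Return (status_code, message) if the line carries an engine error, else None."""
    match = ENGINE_ERROR_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Line copied from the supervisor journal posted in this thread.
LINE = (
    "Jul 21 12:02:44 23693f8 resin-supervisor[5365]: (node:1) "
    "UnhandledPromiseRejectionWarning: Error: (HTTP code 500) "
    "server error - layer does not exist"
)

print(parse_engine_error(LINE))  # (500, 'server error - layer does not exist')
```

Grouping the parsed results by message would show whether the same engine error repeats on every supervisor start.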

Also, if you could enable support access to the affected device for 24+ hours and send a link to it here, that would help make debugging faster. Thanks!

Regards,
Christina

I preloaded an app on this device. After a few days I noticed that it wasn’t sending any messages, so I SSHed into the device and found that there were no pods.

Then I tried to move the device to another app, hoping it would start working again, but nothing changed.

I am using openBalena, so I don’t think I can enable support access. Is there any other way? If not, what would you suggest?

Thanks again.

Matteo

Well, one way you could do it is to add a balena device to balenaCloud, enable support access on it, then add the public SSH key from that device to your other device.

Apart from that, I think this issue is going to be very hard to debug unless we have SSH access. You could always try upgrading the host OS and supervisor, though. Just make sure you save your data before you do, just in case.

@matteopeluso Did you have any success with Zane’s suggestion above?

Hi! Sorry for the late answer. At the moment I’m not allowed to add it to balenaCloud…

Thanks again :)
Matteo