Device stuck in Updating loop

I have a Pi3 that I originally had linked to a BBGreen application; I’m pretty sure this used to work fine, but I’ve not been back to it for a while.

I wanted to play around with the NetworkManager DBus stuff, so added a few containers to the application. Now it is constantly updating, it gets to about 80%, then restarts the process again (and again…). It looks like the Supervisor is restarting.

19.06.19 13:43:19 (+0100) Supervisor starting
19.06.19 13:43:22 (+0100) Downloading image ‘registry2.balena-cloud.com/v2/c8f68@sha256:6b3003’
19.06.19 13:56:36 (+0100) Supervisor starting
19.06.19 13:56:39 (+0100) Downloading image ‘registry2.balena-cloud.com/v2/c8f68@sha256:6b3003’

I’ve tried creating a Pi3 application and have moved this device to it. I also tried deleting all of the other containers, so I just have one that is just the balenalib/raspberrypi3-python:latest-build image.
The UUID is 2ce36edeadad18f4f71dbd89199954ef if you are able to look into it.

Thanks.

Hello @sei1, I’ll have a look.

1 Like

Hey @sei1

I’ve spent a load of time in your device, trying to work out what’s going on, and nothing seems to make sense. We have balena-engine restarting the supervisor container, even though it reports the healthcheck endpoint is returning 200 OK. We also have a corruption check utility which doesn’t complete it’s run (that’s pretty much unheard of) and several other very strange issues.

Now one thing we did notice is that the kernel is reporting undervoltage warnings, and generally these present themselves in conjunction with really odd issues. Granted I’ve never seen a device behave as strangely as this one due to undervoltage, but in the absence of a better explanation, I’d ask if it’s possible for you to switch the power supply on this device, and see if the problems are still exhibited.

If they are then we definitely need to do more digging, but in it’s current state, I can’t really find out anything else unfortunately.

Hi @CameronDiver,
Thanks so much for taking a look. I’m not next to it right now, but in the morning I’ll find a different power supply for the pi and let you know what I see.
Really appreciate your help.

Hi again @CameronDiver

That seems to have done it - I changed out the Pi3 power source and everything just loaded straight away.

For interest, where would I see the undervoltage warnings?

Thanks for the support.

To check for undervoltage warnings, you can run dmesg | grep undervoltage.

I found it just by looking at all kernel logs with dmesg.

I’m glad to hear your device is now working well.

1 Like

Hi @CameronDiver,

Sorry to dig this up again. I’m getting more problems with this unit.
I reflashed the SD card - it looked like it may have been low on space, so there is a different UUID now: cbd199020ad67a2cf083a22a6d47884c

I’ve also gone out and got the official Pi PSU!
I don’t see any voltage warnings in dmesg, but I am getting this kind of thing in journalctl:

Jul 03 13:53:01 cbd1990 f4e84c55779a[841]: [51B blob data]
Jul 03 13:53:05 cbd1990 f4e84c55779a[841]: [51B blob data]
Jul 03 13:53:09 cbd1990 f4e84c55779a[841]: [51B blob data]
Jul 03 13:53:14 cbd1990 f4e84c55779a[841]: [51B blob data]
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: (node:1) UnhandledPromiseRejectionWarning: TimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at /usr/src/app/dist/app.js:21:42906
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at c (/usr/src/app/dist/app.js:9:74428)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: (node:1) UnhandledPromiseRejectionWarning: TimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at /usr/src/app/dist/app.js:21:42906
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at c (/usr/src/app/dist/app.js:9:74428)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at /usr/src/app/dist/app.js:103:29838
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at c (/usr/src/app/dist/app.js:9:74428)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at A._settlePromiseFromHandler (/usr/src/app/dist/app.js:293:114147)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at A._settlePromise (/usr/src/app/dist/app.js:293:114947)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at A._settlePromise0 (/usr/src/app/dist/app.js:293:115646)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at A._settlePromises (/usr/src/app/dist/app.js:293:116887)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at f (/usr/src/app/dist/app.js:293:118592)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at p (/usr/src/app/dist/app.js:293:118531)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at s._drainQueues (/usr/src/app/dist/app.js:293:120051)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at Immediate.drainQueues [as _onImmediate] (/usr/src/app/dist/app.js:293:118273)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at runCallback (timers.js:705:18)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at tryOnImmediate (timers.js:676:5)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: at processImmediate (timers.js:658:5)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
Jul 03 13:53:17 cbd1990 resin-supervisor[5936]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at /usr/src/app/dist/app.js:103:29838
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at c (/usr/src/app/dist/app.js:9:74428)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at A._settlePromiseFromHandler (/usr/src/app/dist/app.js:293:114147)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at A._settlePromise (/usr/src/app/dist/app.js:293:114947)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at A._settlePromise0 (/usr/src/app/dist/app.js:293:115646)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at A._settlePromises (/usr/src/app/dist/app.js:293:116887)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at f (/usr/src/app/dist/app.js:293:118592)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at p (/usr/src/app/dist/app.js:293:118531)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at s._drainQueues (/usr/src/app/dist/app.js:293:120051)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at Immediate.drainQueues [as _onImmediate] (/usr/src/app/dist/app.js:293:118273)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at runCallback (timers.js:705:18)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at tryOnImmediate (timers.js:676:5)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: at processImmediate (timers.js:658:5)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
Jul 03 13:53:17 cbd1990 928e43c1a723[841]: (node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Jul 03 13:53:18 cbd1990 f4e84c55779a[841]: [51B blob data]

Any thoughts?

Cheers.

Hey @sei1 it looks like you’re experiencing the issue here: https://github.com/balena-io/balena-supervisor/issues/987

If you link me to your device and enabled support access, I can upgrade the supervisor for you to pick up the fix.

Great, thanks.

The UUID is cbd199020ad67a2cf083a22a6d47884c
Support Access should be switched on already.

Hey @sei1 I’ve updated the supervisor on your device, and while monitoring, it seems that balena-engine takes a load of memory and is then killed. It appears to happen while docker-untar is taking a fair bit at the same time. Have you deployed this release sucessfully on an rpi already?

No, I don’t think so. I guess the issue is with the dbus-python container which I recently added to play around with some python (and dbus!). The Docker file is simply:

FROM balenalib/%%BALENA_MACHINE_NAME%%-python:latest-build
CMD [“/bin/sh”]

The unicorn tells me that the built image is 591.10MB.
I guess I’ll try changing it to an alpine base version to try and keep the image size down.

Thanks again for your help on this.

Hey,

Could you possibly try a new SD card? We ran a diagnostic on the device and it is indicating that the SD card is writing slow, so coupled with a large application it could be that the time taken to write it to disk is causing some issues.

If you have a new/different SD card to hand it’s worth a shot :+1:

Sure. I’m just flashing 2.36.0 dev image to a different card.
Etcher is showing about 5MB/s…

I recommend the SanDisk Extreme Pro cards - I see ~70MB/s writes on those.

I’d already deployed the alpine distribution version of the python-build container; only about 100MB smaller than the debian.

With a different card (they are both SanDisk Class 4, purchased at the same time), the Pi has now successfully downloaded and is up and running!
I’ll get some Class 10 cards in, hopefully it’ll make everything a bit smoother.

For my future info, would you mind sharing the diagnostic you ran to test the SD? I looked in the Diagnostic screen on the dashboard, but might have missed it if it was there.

Cheers.

You can see DISK PARTITION WRITES SLOW lines in the beginning of the diagnostics output.
This info is extracted from /proc/diskstats for which you have some docs here https://www.kernel.org/doc/Documentation/iostats.txt .

Thanks @zvin,
The new card is also showing slow there too, I guess I got lucky with the fresh start.

Cheers.