Update and restart loop on RPi1

Hi,

I'm not sure if my SD card is the problem (as in this post: Supervisor restarting before end of image downloading), but I'm seeing the same supervisor-restarting symptom. The device stops, goes offline, and then starts a new update routine. It seems to fail at the same place every time.

This is an old RPi 1 B+.


HOST OS VERSION: balenaOS 2.46.1+rev1

SUPERVISOR VERSION: 10.6.27

This is the output I get. It goes on and on in a loop.

23.03.20 09:54:35 (+0100) Downloading image 'registry2.balena-cloud.com/v2/74e853db86165bf4f1fd1d35cf369c58@sha256:70667ac88e0f17d9a4c2ab9faf4d93c42b545ecc650dcc312372fb1a572f6cd4'

23.03.20 10:03:19 (+0100) Supervisor starting



23.03.20 10:03:39 (+0100) Downloading image 'registry2.balena-cloud.com/v2/74e853db86165bf4f1fd1d35cf369c58@sha256:70667ac88e0f17d9a4c2ab9faf4d93c42b545ecc650dcc312372fb1a572f6cd4'

23.03.20 10:12:24 (+0100) Supervisor starting

Hi,

We can't tell for sure what the problem is here yet. Could you please SSH into the host from the dashboard, capture the supervisor logs, and share them with us here?
You can run journalctl --no-pager -u resin-supervisor -n 200, which will show the last 200 lines.
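For example, something along these lines works from the host OS terminal (a minimal sketch; the file path is only an illustration):

journalctl --no-pager -u resin-supervisor -n 200                              # print the last 200 supervisor lines
journalctl --no-pager -u resin-supervisor -n 200 > /mnt/data/supervisor.log   # or save them to the writable data partition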

I tried to run that over SSH. Is the attached log what you are looking for?

pimusic_sparkling-sound-23.03.20_11_56_38_(+0100).log (31.7 KB)

Hi,

Those aren't the logs we need; there is a terminal on the balenaCloud dashboard, and you can SSH into the host from there.
Can you please SSH into the host, run the command I sent above, and send us the result?
Also, could you try the df -h command as well, since the same symptom can appear when a device is running out of space?
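Something like the following would be enough (a minimal sketch; the mount points are the usual balenaOS ones and may differ slightly on your image):

df -h                                   # usage of every mounted filesystem
df -h /mnt/data /mnt/sysroot/active     # the partitions holding app images and the host OS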

root@f42126f:~# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 105M 0 105M 0% /dev
/dev/disk/by-partuuid/6f8fb15d-02 300M 224M 57M 80% /mnt/sysroot/active
/dev/disk/by-label/resin-state 19M 218K 17M 2% /mnt/state
none 300M 224M 57M 80% /
tmpfs 113M 12K 113M 1% /dev/shm
tmpfs 113M 8.7M 105M 8% /run
tmpfs 113M 0 113M 0% /sys/fs/cgroup
tmpfs 113M 0 113M 0% /tmp
/dev/mmcblk0p1 40M 8.1M 32M 21% /mnt/boot
tmpfs 113M 24K 113M 1% /var/volatile
/dev/mmcblk0p6 14G 1.7G 12G 13% /mnt/data

Maybe my SD card is too small? It is an older 16 GB card, rated at only 30 MB/s.

I got an output to the terminal this time. See attached below.
journalctl.log (upload://mGsQD26aNd0qdDMPNUVaPeBsDS2.log) (10.5 KB)

Thanks for the info, these are indeed the correct logs. Space does not seem to be an issue (unless the release you are downloading is really big), and the supervisor logs just show it attempting to download your images over and over again. Could you log into the host OS again, run journalctl --no-pager -u balena -n 200 followed by journalctl --no-pager -u resin-supervisor -n 200, and send us both logs? The supervisor logs will probably be the same, but being able to cross-reference them with the ones from the balena service might help us shed some light on what is going on.
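If it helps with the cross-referencing, journalctl can also interleave both units by timestamp (a sketch, assuming the host OS terminal; the line count is arbitrary):

journalctl --no-pager -u balena -u resin-supervisor -n 400   # both services, merged chronologically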

Here are the logs you asked for. There are a few blank lines in between. Thanks.

pimusic_sparkling-sound-1-2 copy.log (35.2 KB)

That is interesting: it looks like the balenad service is crashing. I'll forward the logs to the engine team, as they should be able to provide some good pointers here. In the meantime, could you re-run journalctl --no-pager -u balena -n 600 and send us the output of that as well? (Just grabbing the last 600 lines to see if the service keeps crashing on the same error or if other things are going on.)

pimusic_sparkling-sound_600.log (11.0 KB)

There seems to be something strange with the last log you sent: it should be the last 600 lines, but it shows fewer and jumps from Jan 12 (the beginning of the logs) to today. I'm not sure exactly what caused this, but don't worry if you are unable to grab the whole 600 lines, as this is only to confirm the engine was indeed crashing on the same error; the other logs you sent above should provide enough context. I'll let you know as soon as the engine team responds.

True. I tried it again and this time I got some more.

pimusic_sparkling-sound_600v2.log (56.3 KB)

Thanks. Unfortunately, it turns out the initial part of the balena logs is missing in all the logs you sent. Could you try playing around with journalctl --no-pager -u balena -n 600 and the -n parameter, which controls the number of lines in the output? What we need is the full log around the balena crash. Looking at the last file you sent us, the end of the "interesting part" would be:

Mar 23 15:10:51 f42126f balenad[1715]: trap    0x0
Mar 23 15:10:51 f42126f balenad[1715]: error   0x0
Mar 23 15:10:51 f42126f balenad[1715]: oldmask 0x0
Mar 23 15:10:51 f42126f balenad[1715]: r0      0x1cff388
Mar 23 15:10:51 f42126f balenad[1715]: r1      0x0
Mar 23 15:10:51 f42126f balenad[1715]: r2      0x0
Mar 23 15:10:51 f42126f balenad[1715]: r3      0x0
Mar 23 15:10:51 f42126f balenad[1715]: r4      0x0
Mar 23 15:10:51 f42126f balenad[1715]: r5      0x0
Mar 23 15:10:51 f42126f balenad[1715]: r6      0x134e0000
Mar 23 15:10:51 f42126f balenad[1715]: r7      0xf0
Mar 23 15:10:51 f42126f balenad[1715]: r8      0x1cff2b8
Mar 23 15:10:51 f42126f balenad[1715]: r9      0x137e4eec
Mar 23 15:10:51 f42126f balenad[1715]: r10     0x1cfee98
Mar 23 15:10:51 f42126f balenad[1715]: fp      0x0
Mar 23 15:10:51 f42126f balenad[1715]: ip      0x13d85cbf
Mar 23 15:10:51 f42126f balenad[1715]: sp      0xbea3ba70
Mar 23 15:10:51 f42126f balenad[1715]: lr      0x3e6a0
Mar 23 15:10:51 f42126f balenad[1715]: pc      0x6f570
Mar 23 15:10:51 f42126f balenad[1715]: cpsr    0x20000010
Mar 23 15:10:51 f42126f balenad[1715]: fault   0x0
Mar 23 15:10:51 f42126f balenad[1715]: time="2020-03-23T15:10:51.156210512Z" level=error msg="error waiting for container: unexpected EOF"
Mar 23 15:10:51 f42126f balenad[1715]: balena: error during connect: Post http://%2Fvar%2Frun%2Fbalena-engine.sock/v1.39/containers/4b68a1587c4afc10f6c85f9e5812ec7008866e037e3688555133a9a10aed6521/start: EOF.
Mar 23 15:10:51 f42126f systemd[1]: balena.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

The beginning of that crash is unfortunately somewhere before the start of the logs you sent us. Can you try increasing the -n parameter until you are able to collect the full log of the crash? Thanks.
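If a fixed line count keeps missing the start of the crash, time-based filtering may be easier than growing -n (a sketch; the timestamp is only an example taken from the excerpt above):

journalctl --no-pager -u balena -n 2000                        # simply grab many more lines
journalctl --no-pager -u balena --since "2020-03-23 15:00"     # or everything from just before the crash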

journalctl2.log (56.2 KB)

Hope this is useful. I can see some problems near the end of the output.

Thanks again.

Hi @emelleby, I'm looking for something like this in the logs:

<LOG>
<LOG>
panic: <oh no!>

goroutine 1 [running]:
main.main()
	/tmp/sandbox274190890/prog.go:9 +0xa0

That would help me to tell what exactly tripped up balena-engine here.

If you look through the logs manually, there should be a longer section containing the engine stack trace. You can tell by the repeated occurrence of the goroutine <number> [<status>]: part, followed by a cascade of unraveled function calls similar to what you see above.
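If scrolling through the whole output is tedious, grep can point you at the right section (a sketch, assuming the host OS terminal):

journalctl --no-pager -u balena | grep -n -E 'panic:|goroutine [0-9]+ \['   # line numbers of any panic/goroutine markers
journalctl --no-pager -u balena | grep -A 40 'panic:'                       # the panic line plus the lines that follow it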

Can’t say that I find any of that in the log or the terminal output when I run the journalctl command.

Hi, unfortunately without more context around those logs I can't tell what might have happened, and we will have to wait and see if it happens again.
What’s the state of your device now?

The device restarts multiple times and gets stuck in an update while downloading.

This goes on for about 12 hours until the device goes offline permanently. It then needs to be physically rebooted.

Maybe it's a lost cause and I should move on. I don't think I can get any more logs out of it.

Hi, I'm not sure we can do much here without any logs.

Now the device has been online for 4 days. That’s good news, I think.
I have run journalctl --no-pager -u balena -n 600 again; the log from the console is attached.

I am not sure which other log you might be referring to. Let me know if you need anything else and I'll try to provide it.

journalctl3.log (58.9 KB)

Hi there,

Having looked at the latest logs, it now appears that the Supervisor can't start up properly. If you're able to share the UUID of the device with us, we might be able to access it and take a further look at what's going on.
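For reference, the UUID is the long identifier in the device's dashboard URL; as an assumption about your image, it should also be readable from config.json on the boot partition of the host:

grep -o '"uuid"[^,]*' /mnt/boot/config.json   # prints the uuid field, assuming config.json lives under /mnt/boot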

Best regards,

Heds