I’m not sure if my SD card is the problem (as in this post: Supervisor restarting before end of image downloading), but I’m seeing the same supervisor restarting symptom. It stops, goes offline, and then starts a new update routine, seemingly at the same place every time.
This is an old RPI1 B+
HOST OS VERSION: balenaOS 2.46.1+rev1
SUPERVISOR VERSION: 10.6.27
This is the output I get. It goes on and on in loops.
We can’t tell for sure what the problem is here yet. Can you please ssh into the host from the dashboard, capture the logs from the supervisor, and share them with us here?
You can run this command to get the logs: journalctl --no-pager -u resin-supervisor -n 200, which will show the last 200 lines.
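For example (a minimal sketch, assuming you have the Host OS terminal open from the dashboard; the /tmp path is just an illustration):

journalctl --no-pager -u resin-supervisor -n 200                        # print the last 200 supervisor log lines
journalctl --no-pager -u resin-supervisor -n 200 > /tmp/supervisor.log  # or save them to a file you can copy out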
It’s not those logs; there is a terminal on the balenaCloud dashboard and you can ssh into the host from there.
Can you please ssh into the host, run the command I sent above and send us the result?
Also, can you try the df -h command as well, since the same symptom happens when a device is running out of space.
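Something along these lines should show the usage (a sketch; /mnt/data is where balenaOS usually keeps downloaded images, but treat the exact mount point as an assumption for your OS version):

df -h            # usage of every mounted filesystem
df -h /mnt/data  # the data partition where images are typically stored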
Thanks for the info; these are indeed the correct logs. Space does not seem to be an issue (unless the release you are downloading is really big), and the supervisor logs just show it attempting to download your images over and over again. Could you log into the hostOS again and run journalctl --no-pager -u balena -n 200 followed by journalctl --no-pager -u resin-supervisor -n 200, and send us both logs? The supervisor logs will probably be the same, but being able to cross-reference them with the ones from the balena service might help us shed some light on what is going on.
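If it helps, something like this should capture both logs in one go (a rough sketch; the output file names are just examples):

journalctl --no-pager -u balena -n 200 > /tmp/balena.log                # engine logs
journalctl --no-pager -u resin-supervisor -n 200 > /tmp/supervisor.log  # supervisor logs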
That is interesting; it looks like the balenad service is crashing. I’ll forward the logs to the engine team, as they should be able to provide some good pointers here. In the meantime, could you re-run journalctl --no-pager -u balena -n 600 and send us the output of that as well? (Just grabbing the last 600 lines to see if the service keeps crashing on the same error or if other things are going on.)
There seems to be something strange with the last logs you sent: it should be the last 600 lines, but it shows fewer and jumps from Jan 12 (the beginning of the logs) to today. I’m not sure exactly what caused this, but don’t worry if you are unable to grab the whole 600 lines, as this is only to confirm the engine was indeed crashing on the same error; the other logs you sent above should provide enough context. I’ll let you know as soon as the engine team responds.
Thanks. Unfortunately, it turns out the initial part of the balena logs is missing in all the logs you sent. Could you try playing around with journalctl --no-pager -u balena -n 600 and the -n parameter, which controls the number of lines in the output? What we need is the full log around the balena crash. Looking at the last file you sent us, the end of the “interesting part” would be:
Mar 23 15:10:51 f42126f balenad[1715]: trap 0x0
Mar 23 15:10:51 f42126f balenad[1715]: error 0x0
Mar 23 15:10:51 f42126f balenad[1715]: oldmask 0x0
Mar 23 15:10:51 f42126f balenad[1715]: r0 0x1cff388
Mar 23 15:10:51 f42126f balenad[1715]: r1 0x0
Mar 23 15:10:51 f42126f balenad[1715]: r2 0x0
Mar 23 15:10:51 f42126f balenad[1715]: r3 0x0
Mar 23 15:10:51 f42126f balenad[1715]: r4 0x0
Mar 23 15:10:51 f42126f balenad[1715]: r5 0x0
Mar 23 15:10:51 f42126f balenad[1715]: r6 0x134e0000
Mar 23 15:10:51 f42126f balenad[1715]: r7 0xf0
Mar 23 15:10:51 f42126f balenad[1715]: r8 0x1cff2b8
Mar 23 15:10:51 f42126f balenad[1715]: r9 0x137e4eec
Mar 23 15:10:51 f42126f balenad[1715]: r10 0x1cfee98
Mar 23 15:10:51 f42126f balenad[1715]: fp 0x0
Mar 23 15:10:51 f42126f balenad[1715]: ip 0x13d85cbf
Mar 23 15:10:51 f42126f balenad[1715]: sp 0xbea3ba70
Mar 23 15:10:51 f42126f balenad[1715]: lr 0x3e6a0
Mar 23 15:10:51 f42126f balenad[1715]: pc 0x6f570
Mar 23 15:10:51 f42126f balenad[1715]: cpsr 0x20000010
Mar 23 15:10:51 f42126f balenad[1715]: fault 0x0
Mar 23 15:10:51 f42126f balenad[1715]: time="2020-03-23T15:10:51.156210512Z" level=error msg="error waiting for container: unexpected EOF"
Mar 23 15:10:51 f42126f balenad[1715]: balena: error during connect: Post http://%2Fvar%2Frun%2Fbalena-engine.sock/v1.39/containers/4b68a1587c4afc10f6c85f9e5812ec7008866e037e3688555133a9a10aed6521/start: EOF.
Mar 23 15:10:51 f42126f systemd[1]: balena.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
The start of the crash is unfortunately somewhere before the beginning of the logs you sent us. Can you try increasing the -n parameter until you are able to collect the full log of the crash? Thanks.
That would help me to tell what exactly tripped up balena-engine here.
If you look through the logs manually there should be a longer section that contains the engine stack trace. You can tell by the repeated occurrence of the goroutine <number> [<status>]: part, followed by a cascade of unraveled function calls similar to what you see above.
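One way to hunt for it, sketched on the assumption that the usual journalctl and grep options are available on the host (the -n value and file name are just examples):

journalctl --no-pager -u balena -n 2000 > /tmp/balena-crash.log  # raise -n until the start of the trace is included
grep -n "goroutine" /tmp/balena-crash.log                        # locate the stack trace sections in the saved file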
Hi, unfortunately without more context around those logs I can’t tell what might have happened and we will have to wait and see if it happens again.
What’s the state of your device now?
Now the device has been online for 4 days. That’s good news, I think.
I have run ‘journalctl --no-pager -u balena -n 600’ again; the log from the console is attached.
I am not sure if there is another log you are referring to. Let me know if there is another log you need and I’ll try to provide it.
Having looked at the latest logs, it now appears that the Supervisor can’t start up properly. If you’re able to share the device’s UUID with us, we might be able to access it and take a further look at what’s going on.