Raspberry Pis keep rebooting

Same situation.

We will try this on one of our boards and see if we can reproduce.

I swapped over the board just now.

I can also try swapping over the SD card if swapping the board has had no effect…

No effect - catting /var/cache/ldconfig/aux-cache crashes the system. I’ll swap out the SD card next.
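If it helps narrow things down, a raw read of the card that bypasses the filesystem should tell a bad card apart from filesystem-level corruption. A sketch, assuming the SD card shows up as /dev/mmcblk0 (the usual device node on a Pi):

# Reading the file through the filesystem triggers the crash:
cat /var/cache/ldconfig/aux-cache > /dev/null

# Narrowing step: read raw blocks from the card instead. If this
# also crashes, suspect the card or hardware rather than the fs.
dd if=/dev/mmcblk0 of=/dev/null bs=4M count=256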

Just a quick one; have you re-downloaded the image from the dashboard, or is this the same image file each time you flash? Would be good to know, as it could be a bad image file.

It’s the same image file - I’ll re-fetch it and compare MD5. Swapped out the SD card in the meantime - it’s now 1f231ff.
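Something like this for the comparison (filenames here are placeholders, not the actual image names):

# Matching sums mean the flashing source is fine;
# a mismatch points at a corrupted download.
md5sum balena-original.img balena-redownloaded.img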

I can still crash the system by reading this file. I'll try flashing the newly downloaded image.

I reproduced this issue on our boards too. We will let you know how the investigation goes.


Hi guys - how's this looking?

I’m giving balenaOS 2.41.0+rev3 a try… let’s see if there were any fixes in that release…

After I deployed the new OS, everything stopped working. It gets stuck starting my X11 server, and I've also discovered that my docker build is busted since python no longer includes distutils.util as part of the base install… so far I've been unable to find out which package contains it.

[solved] An extra apt-get update was needed before apt-get install python3-distutils could find the package.
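For anyone hitting the same thing, a minimal Dockerfile sketch of the fix (the base image is whatever you were already building from):

# apt-get update must run in the same layer as the install, otherwise
# a stale package index is used and python3-distutils isn't found.
RUN apt-get update && apt-get install -y python3-distutils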

Well 2.41.0r3 is even worse: my container stays up for a few minutes and then crashes and gets stuck in an infinite crash/retry/crash/retry loop.

Hi,
This issue was fixed by this PR: volatile-binds: Avoid overlayfs mounts by agherzan · Pull Request #1620 · balena-os/meta-balena · GitHub

So it shouldn't really happen in v2.41.

There could be something else going on here as well.
Can you please grant support access for a week and share the full device URL?

Thanks
ZubairLK

I moved to 2.43.0r1 recently - ff0a69965cbf0e78924b1ef3fc5500a8 still seems to want to restart its container periodically. In fact none of my systems are really stable:

Uptimes are 17 hours, 16 hours, 2 days, 3 days and 3 days. None of the restarts was due to deliberate action on my part.

Thanks!
Al.

Well 2.41.0r3 is even worse: my container stays up for a few minutes and then crashes and gets stuck in an infinite crash/retry/crash/retry loop.

Can you please share the logs if you have any?
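If it happens again, the container's last output can be grabbed on the device itself. On a balenaOS host the Docker-compatible engine is invoked as balena; the container name below is a placeholder:

balena ps -a                        # find the crashing container
balena logs --tail 200 <container>  # last output before the crash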

Uptimes are 17 hours, 16 hours, 2 days, 3 days and 3 days. None of the restarts was due to deliberate action on my part.

Is this from the dashboard? That figure reflects VPN/internet connectivity, not the device itself. Actual device uptime can be seen by logging into the device and running the uptime command.
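For example (illustrative output only):

$ uptime
 10:02:04 up 1 day,  3:17,  0 users,  load average: 0.08, 0.12, 0.10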

I'm afraid it's quite hard to debug sporadic crashes/reboots without looking at logs or stack traces.

The device you linked ff0a69965cbf0e78924b1ef3fc5500a8 seems to be running OK at the moment. I see some warnings in the logs which I need to investigate further here: Investigate systemd slice warnings · Issue #1691 · balena-os/meta-balena · GitHub

Unfortunately I don’t have any logs from 2.41.0r3’s behaviour.

Regarding the uptimes, yes, this is from the dashboard - although there is another clue that all is not well: the wallboard has one page that I haven't been able to automate, so I have to VNC into it and enter a username/password before it displays. And of course every time the Pi or the container restarts, I have to log in again. This happens at least once a day, I'm afraid.

Hi, I checked the device you provided UUID for and I see a couple of issues there.

The device is shown online for 3 hours, but the uptime is more than a day. I found this corresponding log entry:

Oct 01 10:02:04 ff0a699 openvpn[1991]: Tue Oct  1 10:02:04 2019 Connection reset, restarting [-1]

So the VPN connection was reset somewhere on the way to our servers and this is why the device is shown as online for less time than it had been up.
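If you want to see how often that happens, the resets can be counted in the host journal. Assuming the VPN runs as the openvpn unit, as the log prefix suggests:

# Count connection resets recorded in the journal
journalctl -u openvpn | grep -c 'Connection reset'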

Earlier in the kernel logs I see:

[    1.941086] fsck.fat 4.1 (2017-01-24)
               0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
                Automatically removing dirty bit.
               Performing changes.
               /dev/disk/by-label/resin-boot: 157 files, 16367/80628 clusters

This leads me to think that the device was not rebooted cleanly, e.g. with a reboot command or similar.

This is all the information we could retrieve. I noticed that you enabled persistent logging, which is helpful. Once you notice a problem with those devices please ping us again so that we may continue investigating the logs.

Please let me know if you have any questions.

Thanks,
Zahari

I'm afraid it's quite hard to debug without an active stack trace or a fast, repeatable test case.

The logs are polluted with

Sep 30 14:20:42 balena systemd[1]: Removed slice libcontainer_937_systemd_test_default.slice.
Sep 30 14:20:42 balena systemd[1]: Created slice libcontainer_943_systemd_test_default.slice.

for which I have a PR that should land in the next version: https://github.com/balena-os/meta-balena/pull/1692

I’d recommend editing /mnt/boot/cmdline.txt and adding systemd.log_level=notice to reduce that noise.
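cmdline.txt has to stay a single line, so append the option rather than adding a new line. A sketch, run on the host OS (a reboot is needed for it to take effect):

# Append systemd.log_level=notice to the existing kernel command line
sed -i '1 s/$/ systemd.log_level=notice/' /mnt/boot/cmdline.txt
cat /mnt/boot/cmdline.txt   # verify it is still a single line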

Then perhaps with persistentLogging, we might be able to see a stack-trace or logs before the crash.
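With persistent logging enabled, the journal from the boot before a crash survives the reboot, so after the next incident something like this should show the tail end of the previous boot:

journalctl --list-boots   # enumerate the boots the journal has recorded
journalctl -b -1 -e       # jump to the end of the previous boot's log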

Also, please do check if your application can withstand intermittent connectivity as mentioned in the previous message by majorz.

OK, added that and will reboot. Let's see if we can get any further…