Raspberry Pis keep rebooting

mikesimos · August 14, 2019, 10:42am

Cheers! A poor power supply can also be a contributing factor to SD card corruption. Best regards!

ajs1k · August 14, 2019, 11:06am

Ok, a2cbe54 and 2847085 have the official Raspberry Pi 5V/2.5A PSUs.

2847085 will get a replacement SD card (how did you detect the fault?)
00e0833 has a 2A PSU so I’ll order a new one for it - the under-voltage messages are visible.

I don’t know what I can do for a2cbe54 - any ideas? I didn’t see any under-voltage message on that one.
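
For anyone checking for under-voltage on their own boards: besides looking for the kernel messages (e.g. `dmesg | grep -i voltage`), the Raspberry Pi firmware keeps a bitmask that `vcgencmd get_throttled` reports as `throttled=0x...`. Below is a minimal sketch that decodes that value. The function name `decode_throttled` is made up for illustration, and whether `vcgencmd` is available on a given host OS or container is an assumption you would need to verify.

```shell
# Decode the hex bitmask printed by `vcgencmd get_throttled`.
# decode_throttled is a hypothetical helper name, not part of any tool.
decode_throttled() {
    # $1: hex value, e.g. 0x50005 (defaults to 0 if omitted)
    local value=$(( ${1:-0} ))
    local bit label
    # Each line below is "<bit number> <meaning>"; print the meaning if the bit is set.
    while read -r bit label; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "$label"
        fi
    done <<'EOF'
0 under-voltage detected now
1 ARM frequency capped now
2 currently throttled
3 soft temperature limit active
16 under-voltage has occurred since boot
17 ARM frequency capping has occurred since boot
18 throttling has occurred since boot
19 soft temperature limit has occurred since boot
EOF
}
```

On a device it could be used as `decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"`. No output means none of the flags are set, which would match a board where no under-voltage message shows up in the logs.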

ajs1k · August 14, 2019, 11:27am

a2cbe54 has this to say for itself - is there anywhere else we could look to find out why it just spontaneously rebooted?

Aug 14 08:43:00 a2cbe54 df7643779caa[26320]: Supervisor API: GET /v1/healthy 200 - 167.548 ms
Aug 14 08:43:00 a2cbe54 resin-supervisor[26773]: Supervisor API: GET /v1/healthy 200 - 167.548 ms
Aug 14 08:44:48 a2cbe54 resin-supervisor[26773]: Attempting container log timestamp flush...
Aug 14 08:44:48 a2cbe54 resin-supervisor[26773]: Container log timestamp flush complete
Aug 14 08:44:48 a2cbe54 df7643779caa[26320]: Attempting container log timestamp flush...
Aug 14 08:44:48 a2cbe54 df7643779caa[26320]: Container log timestamp flush complete
-- Reboot --
Aug 14 08:47:32 a2cbe54 systemd-journald[464]: Time spent on flushing to /var is 35.661ms for 479 entries.
Aug 14 08:47:32 a2cbe54 systemd-journald[464]: System journal (/var/log/journal/3745cc0eee3b49568d92f8099509336d) is 8.0M, max 8.0M, 0B free.
Aug 14 08:47:33 a2cbe54 resin-persistent-logs[654]: resin-persistent-logs: Persistent logging activated.

imrehg · August 14, 2019, 11:28am

Just checking the device now, and I’ll get back to you in a bit.

ajs1k · August 14, 2019, 11:29am

It must not like being watched:

Aug 14 11:22:03 a2cbe54 resin-supervisor[1316]: Supervisor API: GET /v1/healthy 200 - 8.783 ms
Aug 14 11:26:01 a2cbe54 systemd[1]: Starting OpenSSH Per-Connection Daemon (52.4.252.97:54680)...
Aug 14 11:26:01 a2cbe54 systemd[1]: Started OpenSSH Per-Connection Daemon (52.4.252.97:54680).
-- Reboot --
Aug 14 11:27:18 a2cbe54 chronyd[652]: 2019-08-14T10:53:35Z Selected source 81.94.123.16
Aug 14 11:27:18 a2cbe54 chronyd[652]: 2019-08-14T10:53:35Z System clock wrong by 2023.170439 seconds, adjustment started

imrehg · August 14, 2019, 12:27pm

Hey, I’ve been looking at the device since the last message, and nothing stands out. I looked over the logs from the last two reboots as well, and the reboots seem to be sudden (there is no sign of what might be happening, as there generally would be with a software reboot or a watchdog reboot). That suggests a power-related issue, e.g. power dropping out, or something similar. You mention that that device (a2cbe54) has an official power supply? It would be worth swapping the power supply, just in case, or checking whether there’s any problem with the wall socket or wherever the PSU itself is plugged in.

It just rebooted now, and checking the logs again doesn’t show anything that would have caused this reboot software-wise.
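
For reference, this kind of check can be sketched as a small filter that keeps only the last few journal lines before each boot boundary. This is an illustration, not the exact procedure used here: the function name `lines_before_reboot` is made up, and it assumes the boundary is printed as a `-- Reboot --` line as in the excerpts in this thread (it also accepts the en-dash rendering; other systemd versions may print a different marker).

```shell
# Print the last N lines before every boot boundary in journal text read from stdin.
# lines_before_reboot is a hypothetical helper name.
lines_before_reboot() {
    # $1: number of lines of context to keep (default 3)
    awk -v n="${1:-3}" '
        # Boot boundary: flush the tail of the buffered lines, then the marker itself.
        $0 == "-- Reboot --" || $0 == "– Reboot –" {
            start = (count > n) ? count - n + 1 : 1
            for (i = start; i <= count; i++) print buf[i]
            print $0
            count = 0
            next
        }
        # Ordinary log line: buffer it until the next boundary.
        { buf[++count] = $0 }
    '
}
```

With persistent logging enabled, something like `journalctl --no-pager | lines_before_reboot 5` would show what each boot logged last. A clean software reboot normally leaves shutdown messages there; ordinary activity cut off mid-stream, as in the excerpts above, points towards power or hardware. `journalctl --list-boots` and `journalctl -b -1 -n 20` are another way to get at the previous boot directly.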

We check for storage corruption by comparing the rootfs with a shipped fingerprint file. Just FYI, this is one script we use for that check; it ignores files that are known to change sometimes, and only reports back if something unexpected has changed:

fingerprint_check() {
    # Verify the rootfs against the shipped fingerprint, ignoring files that are
    # known to change and the summary lines that md5sum itself prints.
    local status="OK"
    local IGNORE_LIST="md5sum|/etc/hostname|/etc/machine-id|/etc/resin-supervisor/supervisor.conf|/etc/systemd/timesyncd.conf|/home/root/.rnd"
    local checksum
    # --quiet prints only failing files, so any output left after filtering is a mismatch
    checksum=$(md5sum -c --quiet /resinos.fingerprint 2>&1 | grep -v -E "$IGNORE_LIST")
    if [ -n "$checksum" ] ; then
        status="ERROR"
    fi
    echo "${status}"
}

fingerprint_check
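
If you want to see which files differ rather than just OK/ERROR, a variant along the following lines might help. It is a sketch built on the same `md5sum -c --quiet` call. The name `fingerprint_report` and its two parameters are made up for illustration, and the default ignore pattern only filters md5sum's own summary lines, not the full ignore list from the script above.

```shell
# List files whose checksum no longer matches the fingerprint.
# fingerprint_report and its parameters are hypothetical, for illustration only.
fingerprint_report() {
    # $1: fingerprint file (defaults to the path used in the script above)
    # $2: extended regex of output lines to ignore (default: md5sum summary lines)
    local fingerprint="${1:-/resinos.fingerprint}"
    local ignore="${2:-^md5sum:}"
    local mismatches
    mismatches=$(md5sum -c --quiet "$fingerprint" 2>&1 | grep -v -E "$ignore")
    if [ -n "$mismatches" ]; then
        # One "<path>: FAILED" line per file that differs
        printf '%s\n' "$mismatches"
        return 1
    fi
    return 0
}
```

Usage would be something like `fingerprint_report /resinos.fingerprint '^md5sum:|/etc/hostname|/etc/machine-id'`, with a non-zero exit status when anything unexpected has changed.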

In the case of 284708565de0b10a07721283235af17b, that doesn’t quite work. It does seem like a very slow card, and the failure to even run that fingerprint check (i.e. the device rebooting due to the I/O on the card) is something we usually see on bad/weak SD cards.

I tried to look up the card as far as I could check remotely, with manufacturer ID 0x000003, OEMID 0x5344, name SL16G, on https://elinux.org/RPi_SD_cards (those values we get from find /sys -name oemid | xargs -r dirname | xargs -I % sh -c 'echo Card: %; cat %/{manfid,oemid,name,hwrev,fwrev,date}'), and it seems to be a card that many manufacturers relabel.
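
The same lookup can be written out as a small function that labels each value, which is a bit easier to read than the one-liner. This is only a sketch of the same idea: the function name `sd_card_ids` and its optional base-directory parameter are made up for illustration, while the attribute names are the ones from the command above.

```shell
# Print the identification attributes of every SD/MMC card found under sysfs.
# sd_card_ids and its parameter are hypothetical, for illustration only.
sd_card_ids() {
    # $1: directory to search (defaults to /sys)
    local base="${1:-/sys}"
    local f dir attr
    find "$base" -name oemid 2>/dev/null | while read -r f; do
        dir=$(dirname "$f")
        echo "Card: $dir"
        for attr in manfid oemid name hwrev fwrev date; do
            # Skip attributes that are missing or unreadable on this card
            if [ -r "$dir/$attr" ]; then
                printf '  %s: %s\n' "$attr" "$(cat "$dir/$attr")"
            fi
        done
    done
}
```

Running `sd_card_ids` with no arguments searches `/sys`. The manfid, oemid and name values are the ones to compare against the table on the elinux.org page.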

We usually recommend SanDisk Extreme Pro cards, as they haven’t let us down and we’ve found them very reliable. If you are getting a replacement, I would strongly suggest considering those.

ajs1k · August 14, 2019, 12:35pm

Ok, many thanks for the extensive analysis. I’ll just go ahead and treat that system to a new card as well.

imrehg · August 14, 2019, 12:51pm

Cheers, please keep us posted, we’d love to hear how things work out!

ajs1k · August 15, 2019, 9:05am

a2cbe54 got a new SD card (now 2eac0f7) and 00e0833 got a 2.5A PSU this morning. 2847085 will get a new card later today. Let’s see how things go…

ajs1k · August 15, 2019, 9:23am

In other news, the system didn’t reboot, but around midnight UTC the container on e95b5b2 seems to have been restarted. Any idea what happened there?

spanceac · August 15, 2019, 9:31am

Hi,

I can see that around the hour you mention there was a supervisor and balena daemon crash.

This restarted your container.

We will investigate what could have caused that and let you know.

spanceac · August 15, 2019, 9:43am

I rebooted your board 3 times now, just by running this command: md5sum --quiet -c /resinos.fingerprint.

This shouldn’t happen and something is clearly wrong with your device.

spanceac · August 15, 2019, 10:18am

I think the reboot problem could be related to the temperature of your RPi.

By constantly monitoring the temperature of the CPU, I see that when running the md5sum command the CPU temperature increases to ~65°C and then the board reboots.

Could you try adding a heatsink to the CPU?
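
For anyone who wants to watch the temperature themselves: on a Raspberry Pi the kernel usually exposes the CPU temperature in millidegrees Celsius under `/sys/class/thermal/thermal_zone0/temp`. Here is a minimal sketch of reading it; the function name `read_cpu_temp_c` is made up, the thermal zone path is an assumption that may differ on other boards, and this is not necessarily how the monitoring above was done.

```shell
# Print the CPU temperature in degrees Celsius with one decimal place.
# read_cpu_temp_c is a hypothetical helper name.
read_cpu_temp_c() {
    # $1: sysfs file holding the temperature in millidegrees Celsius
    local zone="${1:-/sys/class/thermal/thermal_zone0/temp}"
    local milli
    milli=$(cat "$zone") || return 1
    # Integer arithmetic only: whole degrees, then the first decimal digit
    printf '%d.%d\n' $(( milli / 1000 )) $(( (milli % 1000) / 100 ))
}
```

Running `while sleep 1; do echo "$(date +%T) $(read_cpu_temp_c)"; done` in one terminal while the md5sum check runs in another gives a simple timeline of the temperature up to the moment of the reboot.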

spanceac · August 15, 2019, 10:25am

I did an additional test where I stressed the CPU: the temperature got to 72°C but the board didn’t reboot.

So probably it’s not the temperature.

ajs1k · August 15, 2019, 10:27am

The Pi boards seem like a great idea, but they do have their flaws. I’ll get a new one.

In my experience, the SD cards really just don’t last very long. And I have some at home that keep falling off the WiFi for no good reason that I can tell - had to write a network watchdog that has them reboot if they no longer see the local gateway, and even then occasionally even the reboot doesn’t work and they need power cycling.
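
A watchdog of that kind can be quite small. The following is a rough sketch of the idea, not the script actually in use: the function names are made up, and the ping count and timeout are arbitrary assumptions. It takes the gateway from `ip route show default` and pings it a few times.

```shell
# Extract the default gateway address from `ip route` output read on stdin.
# default_gateway and gateway_reachable are hypothetical helper names.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Succeeds if the default gateway answers at least one of three pings.
gateway_reachable() {
    local gw
    gw=$(ip route show default | default_gateway)
    # No default route at all counts as unreachable
    [ -n "$gw" ] && ping -c 3 -W 2 "$gw" >/dev/null 2>&1
}
```

Run periodically (from cron or a systemd timer, say), `gateway_reachable || reboot` would give the behaviour described. In practice you would probably want to require several consecutive failures before rebooting, so that a brief access point hiccup doesn't restart the device.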

ajs1k · August 15, 2019, 10:36am

No useful information I’m afraid - I went to go and watch what was happening; the screen simply goes black and then it reboots.

spanceac · August 15, 2019, 10:40am

So I traced the issue to this file: /var/cache/ldconfig/aux-cache.

Just reading this file will trigger a board reboot.
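
The thread doesn't say how the file was isolated. One way to narrow down a reboot like this, sketched here purely as an assumption, is to read the files listed in the fingerprint one at a time and record each path before touching it, so that after the reboot the last recorded path is the suspect. The function name `read_files_one_by_one` and its parameters are made up for illustration.

```shell
# Read every file listed in an md5sum-style fingerprint, logging each path first.
# read_files_one_by_one and its parameters are hypothetical, for illustration only.
read_files_one_by_one() {
    # $1: fingerprint file with "<md5>  <path>" lines
    # $2: log file, ideally somewhere that survives the reboot
    local list="$1" log="$2" path _hash
    while read -r _hash path; do
        [ -n "$path" ] || continue
        echo "$path" >> "$log"
        # Flush to disk so the entry is not lost if reading the file kills the board
        sync
        cat "$path" > /dev/null 2>&1 || true
    done < "$list"
}
```

After the device comes back up, `tail -n 1` on the log file names the file that was being read when it went down. Whether the log itself survives obviously depends on the storage still accepting writes.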

ajs1k · August 15, 2019, 10:41am

How curious. I will try reflashing the SD card.

ajs1k · August 15, 2019, 10:58am

Ok, reflashed - it’s now 2d94bb2.

ajs1k · August 15, 2019, 11:02am

No, it’s still not healthy. It crashed again about 80% of the way through the process of fetching the docker image. I’ll replace the board and have done with it.
