Raspberry Pis keep rebooting

Cheers! A poor power supply can also be a contributing factor to SD card corruption. Best regards!

Ok, a2cbe54 and 2847085 have the official Raspberry Pi 5V/2.5A PSUs.

2847085 will get a replacement SD card (how did you detect the fault?)
00e0833 has a 2A PSU so I’ll order a new one for it - the under-voltage messages are visible.

I don’t know what I can do for a2cbe54 - any ideas? I didn’t see any under-voltage message on that one.

a2cbe54 has this to say for itself - is there anywhere else we could look to find out why it just spontaneously rebooted?

Aug 14 08:43:00 a2cbe54 df7643779caa[26320]: Supervisor API: GET /v1/healthy 200 - 167.548 ms
Aug 14 08:43:00 a2cbe54 resin-supervisor[26773]: Supervisor API: GET /v1/healthy 200 - 167.548 ms
Aug 14 08:44:48 a2cbe54 resin-supervisor[26773]: Attempting container log timestamp flush…
Aug 14 08:44:48 a2cbe54 resin-supervisor[26773]: Container log timestamp flush complete
Aug 14 08:44:48 a2cbe54 df7643779caa[26320]: Attempting container log timestamp flush…
Aug 14 08:44:48 a2cbe54 df7643779caa[26320]: Container log timestamp flush complete
-- Reboot --
Aug 14 08:47:32 a2cbe54 systemd-journald[464]: Time spent on flushing to /var is 35.661ms for 479 entries.
Aug 14 08:47:32 a2cbe54 systemd-journald[464]: System journal (/var/log/journal/3745cc0eee3b49568d92f8099509336d) is 8.0M, max 8.0M, 0B free.
Aug 14 08:47:33 a2cbe54 resin-persistent-logs[654]: resin-persistent-logs: Persistent logging activated.

Just checking the device now, and I’ll get back to you in a bit.

It must not like being watched:

Aug 14 11:22:03 a2cbe54 resin-supervisor[1316]: Supervisor API: GET /v1/healthy 200 - 8.783 ms
Aug 14 11:26:01 a2cbe54 systemd[1]: Starting OpenSSH Per-Connection Daemon (52.4.252.97:54680)…
Aug 14 11:26:01 a2cbe54 systemd[1]: Started OpenSSH Per-Connection Daemon (52.4.252.97:54680).
-- Reboot --
Aug 14 11:27:18 a2cbe54 chronyd[652]: 2019-08-14T10:53:35Z Selected source 81.94.123.16
Aug 14 11:27:18 a2cbe54 chronyd[652]: 2019-08-14T10:53:35Z System clock wrong by 2023.170439 seconds, adjustment started

Hey, I’ve been looking at the device since the last message, and nothing stands out. I looked over the logs around the last two reboots as well, and the reboots seem to be sudden: there is no sign of what might be happening, as there generally would be with a software or watchdog reboot. That suggests a power-related issue, e.g. power dropping out or something similar. You mention that this device (a2cbe54) has an official power supply? It would be worth trying to swap the power supply, just in case, or checking whether there’s any problem with the wall socket or wherever the PSU itself is plugged in.
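One way to double-check for power trouble that never makes it into the journal is to read the firmware's throttling flags. The snippet below is only a sketch: it assumes `vcgencmd` is available on the device, and the helper names `check_bit` and `decode_throttled` are made up for illustration.

```shell
#!/bin/sh
# Decode the bitmask printed by `vcgencmd get_throttled` (e.g. "throttled=0x50005").
# check_bit and decode_throttled are illustrative helper names, not part of any tool.

check_bit() {
    # $1 = value (decimal), $2 = mask, $3 = message to print if the bit is set
    if [ $(( $1 & $2 )) -ne 0 ]; then
        echo "$3"
    fi
}

decode_throttled() {
    value=$(( $1 ))   # accepts hex input such as 0x50005
    check_bit "$value" 0x1     "under-voltage detected (now)"
    check_bit "$value" 0x2     "ARM frequency capped (now)"
    check_bit "$value" 0x4     "currently throttled"
    check_bit "$value" 0x8     "soft temperature limit active (now)"
    check_bit "$value" 0x10000 "under-voltage has occurred since boot"
    check_bit "$value" 0x20000 "ARM frequency capping has occurred since boot"
    check_bit "$value" 0x40000 "throttling has occurred since boot"
    check_bit "$value" 0x80000 "soft temperature limit has occurred since boot"
}

# Only query the firmware when the tool is actually present.
if command -v vcgencmd >/dev/null 2>&1; then
    raw=$(vcgencmd get_throttled)          # prints e.g. throttled=0x0
    decode_throttled "${raw#throttled=}"
fi
```

The "since boot" bits are cleared by a reboot, so on a device that resets abruptly this is most useful when checked shortly before the next expected failure. No output means no flags are set.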

It just rebooted again, and checking the logs still doesn’t show anything software-wise that would have caused this reboot :confused:

We check for storage corruption by comparing the rootfs with a fingerprint file that ships with the OS. Just FYI, this is one script we use for that check; it ignores files that are known to change sometimes, and just reports back if something unexpected has changed:

fingerprint_check() {
    # Check the fingerprint, ignoring files that are known to change
    # and the summary lines that md5sum itself prints
    local status="OK"
    local checksum
    local IGNORE_LIST="md5sum|/etc/hostname|/etc/machine-id|/etc/resin-supervisor/supervisor.conf|/etc/systemd/timesyncd.conf|/home/root/.rnd"
    checksum=$(md5sum -c --quiet /resinos.fingerprint 2>&1 | grep -v -E "$IGNORE_LIST")
    if [ -n "$checksum" ] ; then
        status="ERROR"
    fi
    echo "${status}"
}

fingerprint_check

In the case of 284708565de0b10a07721283235af17b, that doesn’t quite work. It does seem like a very slow card, and the failure to even run that fingerprint check (i.e. the device rebooting due to the I/O on the card) is something we usually see on bad/weak SD cards.

I tried to look up the card as far as I could check remotely (manufacturer ID 0x000003, OEM ID 0x5344, name SL16G) on https://elinux.org/RPi_SD_cards, and it seems to be a card that many manufacturers relabel. We get those values from find /sys -name oemid | xargs -r dirname | xargs -I % sh -c 'echo Card: %; cat %/{manfid,oemid,name,hwrev,fwrev,date}'.
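For anyone who wants labelled output rather than bare values, the same sysfs lookup can be written as a small loop. This is just a sketch along the lines of the one-liner above; the function name `card_info` and its optional root-directory argument are assumptions added for illustration.

```shell
#!/bin/sh
# Print the identification attributes of every SD/MMC card found under sysfs.
# card_info is an illustrative name; $1 lets you point it at another root
# directory (handy for trying it out), defaulting to /sys.

card_info() {
    sys_root="${1:-/sys}"
    find "$sys_root" -name oemid 2>/dev/null | while IFS= read -r f; do
        dir=$(dirname "$f")
        echo "Card: $dir"
        for attr in manfid oemid name hwrev fwrev date; do
            # Not every attribute exists for every card, so skip missing ones.
            if [ -r "$dir/$attr" ]; then
                printf '  %-6s %s\n' "$attr" "$(cat "$dir/$attr")"
            fi
        done
    done
}
```

Calling `card_info` with no arguments scans /sys, and the manfid/oemid/name values it prints can then be compared against the table on the elinux.org page.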

We usually recommend SanDisk Extreme Pro cards, as they haven’t let us down and we have found them very reliable. If you are getting a replacement, we would strongly suggest considering them.

Ok, many thanks for the extensive analysis. I’ll just go ahead and treat that system to a new card as well.

Cheers, please keep us posted, we’d love to hear how things work out!

a2cbe54 got a new SD card (now 2eac0f7) and 00e0833 got a 2.5A PSU this morning. 2847085 will get a new card later today. Let’s see how things go…

In other news, the system didn’t reboot but around midnight UTC the container on e95b5b2 seemed like it was restarted. Any idea what happened there?

Hi,

I can see that around the hour you mention there was a supervisor and balena daemon crash.

This restarted your container.

We will investigate what could have caused that and let you know.

I rebooted your board 3 times now, just by running this command: md5sum --quiet -c /resinos.fingerprint.

This shouldn’t happen and something is clearly wrong with your device.

I think the reboot problem could be related to the temperature of your RPi.

By constantly monitoring the CPU temperature, I see that when running the md5sum command it rises to ~65 °C and then the board reboots.

Could you try adding a heatsink to the CPU?

I did an additional test where I stressed the CPU: the temperature got to 72 °C but the board didn’t reboot.

So it’s probably not the temperature.
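If you want to repeat that temperature check yourself, something like the following could work. It is a rough sketch, not the exact method used above: it assumes the CPU sensor is exposed at /sys/class/thermal/thermal_zone0/temp in millidegrees Celsius (the usual location on a Pi), and `format_temp` and `log_cpu_temp` are hypothetical helper names.

```shell
#!/bin/sh
# Log the CPU temperature at a fixed interval, syncing after every sample so the
# last reading before a sudden reboot has a chance of reaching the disk.
# format_temp and log_cpu_temp are illustrative names.

format_temp() {
    # $1 = millidegrees Celsius, e.g. 65123 -> 65.1
    echo "$(( $1 / 1000 )).$(( ($1 % 1000) / 100 ))"
}

log_cpu_temp() {
    # $1 = sensor file, $2 = log file, $3 = number of samples, $4 = seconds between samples
    temp_file=$1
    log_file=$2
    count=$3
    interval=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date '+%H:%M:%S') $(format_temp "$(cat "$temp_file")") C" >> "$log_file"
        sync
        i=$(( i + 1 ))
        sleep "$interval"
    done
}

# Example (assumed paths): one sample per second for five minutes.
# log_cpu_temp /sys/class/thermal/thermal_zone0/temp /tmp/cpu-temp.log 300 1
```

The log file should live somewhere that survives a reboot if you want to read it afterwards; /tmp in the example is only a placeholder.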

The Pi boards seem like a great idea, but they do have their flaws. I’ll get a new one.

In my experience, the SD cards really just don’t last very long. And I have some at home that keep falling off the WiFi for no good reason that I can tell - had to write a network watchdog that has them reboot if they no longer see the local gateway, and even then occasionally even the reboot doesn’t work and they need power cycling.
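For reference, a network watchdog of the kind described above might look roughly like this. It is a sketch under assumptions, not the script actually in use: the names `default_gateway`, `probe_gateway` and `watchdog_check`, the state file location and the failure threshold are all made up for illustration.

```shell
#!/bin/sh
# Reboot the machine after several consecutive failures to reach the local gateway.
# All function names, paths and thresholds here are illustrative assumptions.

default_gateway() {
    # "default via 192.168.1.1 dev wlan0 ..." -> 192.168.1.1
    ip route show default | awk '/^default/ {print $3; exit}'
}

probe_gateway() {
    # Succeeds if the gateway answers at least one of three pings.
    ping -c 3 -W 2 "$(default_gateway)" >/dev/null 2>&1
}

watchdog_check() {
    # $1 = state file, $2 = failures allowed before acting,
    # $3 = probe command, $4 = action command
    state_file=$1
    max_fails=$2
    probe=$3
    action=$4
    fails=0
    if [ -f "$state_file" ]; then
        fails=$(cat "$state_file")
    fi
    if $probe; then
        fails=0
    else
        fails=$(( fails + 1 ))
    fi
    echo "$fails" > "$state_file"
    if [ "$fails" -ge "$max_fails" ]; then
        echo 0 > "$state_file"
        $action
    fi
}

# Example, run once a minute from cron or a systemd timer:
# watchdog_check /tmp/net-watchdog.count 5 probe_gateway reboot
```

Counting consecutive failures rather than acting on the first one avoids rebooting over a single dropped ping, and keeping the counter in /tmp means it starts from zero again after each reboot.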

No useful information I’m afraid - I went to go and watch what was happening; the screen simply goes black and then it reboots.

So I traced the issue to this file /var/cache/ldconfig/aux-cache.

Just reading this file will trigger a board reboot.

How curious. I will try reflashing the SD card.

Ok, reflashed - it’s now 2d94bb2.

No, it’s still not healthy. It crashed again about 80% of the way through the process of fetching the docker image. I’ll replace the board and have done with it.