Stuck in "Running supervisor update"

Hi,

I attempted to update the OS on my Raspberry Pis yesterday. Two of them updated happily, but the third seems to have got stuck on the “Running supervisor update” stage of the process, as it hasn’t progressed even ~20 hours later. It’s device https://dashboard.resin.io/devices/63369c9d6d143b2567b2118b58e3f294/summary and I’ve enabled support access.

Many thanks

Phil

Hi @pjb304,

We’re not able to access the host OS of the device, but if you’re able to physically restart the device, it should come back up with the new OS installed and then update the Supervisor.

Could you let us know if that doesn’t happen, and we’ll take another look.

Best regards, Heds

Hi @hedss,

Thanks for the response. I’ve just tried power cycling the device and nothing has changed - it’s still stuck at “Running supervisor update”.

I’ve re-enabled support access for you.
Thanks
Phil

Hi, please just power cycle it once. The supervisor has to recover after the reboot so that it can pull the update and check in with the resin backend. The status will then be updated; in the meantime it will still show the last status message, as you mentioned.

Are you manually rebooting it at the moment? If so, there’s no need to do that any further: the device should be loading the new OS and just needs to sort itself out. We are taking a look.

If you are not rebooting it, just let us know. We are checking why the device is in an apparent reboot cycle. Thanks!

Hi,

I haven’t touched the power for it since just before I replied 25 minutes ago, so it’s doing it of its own accord.

Thanks

Hi @pjb304, we see the device is continuously rebooting. Any idea why that might happen? As far as we can tell, the device is starting the correct, new OS, but after less than 1 minute it seems to hard reboot. So far we don’t see any obvious reason for that to happen, so any context would be helpful in troubleshooting this.

I see, okay, we’ll keep checking.

The device in question is powered using PoE and is sat on the roof of a building. As far as I’m aware the power supply to it is stable. As far as I know it was fine until I applied the update. If there’s nothing obvious causing it then it sounds like I need to go up (thankfully not too much of a headache to do) and see if I can observe anything that’s causing the reboots. If I don’t see anything obvious then I guess I’ll swap out the SD card and see if that improves matters.

Thanks for the help.

Hi, it all sounds okay, thanks for the extra context!

We managed to log in, and once we stopped the application container the device stopped rebooting. We checked the device’s status, and it seems to have recovered fine. During testing we had to remove the application image from the device, which is why it is now downloading again. (To be clear, in a normal update the application does not need to be redownloaded.)

As far as I can tell from the log, it stopped midway through the supervisor update, so it looks like the device hung in some way. That’s not something we’ve seen before. The reboot cycle seems to be connected to the application somehow, as the cycle stopped when we stopped the application.

In general we’d not expect such issues to happen in the future.

One more note: looking at the kernel logs just now, while making sure the device is running fine, I’m starting to see some lines like this:

Aug 10 15:03:30 63369c9 kernel: Under-voltage detected! (0x00050005)
Aug 10 15:03:36 63369c9 kernel: Voltage normalised (0x00000000)

So it is likely a power-related issue, at least in part.
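If it helps with digging through longer logs, here is a minimal Python sketch for picking out lines like these and decoding the hex value. It assumes the value in brackets is the Raspberry Pi firmware's throttled-state bitmask, with the same bit layout documented for `vcgencmd get_throttled`; that mapping is an assumption on my part, not something confirmed for this device. The names `THROTTLE_FLAGS`, `decode_flags` and `parse_line` are made up for illustration.

```python
import re

# Bit meanings as documented for `vcgencmd get_throttled`.
# Assumption: the kernel message reports the same bitmask.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

# Matches the two kernel messages quoted above and captures the hex value.
LINE_RE = re.compile(
    r"kernel: (Under-voltage detected!|Voltage normalised) \((0x[0-9a-fA-F]+)\)"
)


def decode_flags(value):
    """Return the names of all flags set in the bitmask, lowest bit first."""
    return [
        name
        for bit, name in sorted(THROTTLE_FLAGS.items())
        if value & (1 << bit)
    ]


def parse_line(line):
    """Return (message, decoded flags) for a matching log line, else None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.group(1), decode_flags(int(match.group(2), 16))
```

Under that assumption, `decode_flags(0x00050005)` reports under-voltage and throttling both right now and at some point since boot, while `0x00000000` decodes to no flags at all, which fits the "Voltage normalised" message.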

Both comments are very interesting. The device is a LoRaWAN gateway, so the application interacts with hardware on top. If for some reason the supervisor update stops the hardware being mapped through, then the application might crash out.

Thanks for pointing out the undervoltages. It looks like I’ll need to dig into this in more detail.

Many thanks for your help with this

One last (for this time) update, @pjb304, the device is up and running now, and seems to be working properly.

Good points about the hardware interaction. We are big fans of LoRa, so the roof makes perfect sense :slight_smile:

Please don’t hesitate to let us know if you have any further issues with this device, or questions in general!