Devices show green, but no ability to connect

Folks, this is such a non-specific issue that I feel a bit lazy even asking, but my “demo day” is fast approaching and I’m getting anxious :) I’m running a small “garden” of Odroid C1+ devices, most connected over Wi-Fi and one over a wired connection. All of them start out fine, but then they seem to get into a state where the devices are green on the dashboard, yet the log window does not update, I can’t reset them, etc. Trying to connect a terminal results in:

Connecting to 6a2a9a570b4070924d1e96426c82b417552d59c8c5241ac6d0b4cb4458677f…
Spawning shell…
SSH session disconnected
SSH reconnecting…
Spawning shell…
SSH reconnecting…
Spawning shell…
SSH reconnecting…

The connection never completes, and cycling the power on the devices has no effect. At first I blamed Wi-Fi connectivity for this, but the wired device has just got into the same state as well. Despite the “green”, the devices seem “lost”.

Is there a troubleshooting tree I should be following to recover the devices?

Thanks.

Hi @bbargen, is it okay if we connect to the device, and look into what’s going on? Your hints are pointing towards supervisor issues (not being able to connect over the console, no logs, not listening to commands).

Sure, thanks for taking a look… Cheers, -bb

Ah, and in the “surely I must be doing something stupid” department, here’s another example, running a different app…

Connecting to b554a43aedfa9332c25a427b6a3e02667f732a2c7ebef4fa78861a7a6530a7…
Spawning shell…
SSH session disconnected
SSH reconnecting…
Spawning shell…
SSH reconnecting…
Spawning shell…
SSH reconnecting…
Spawning shell…

Hey @bbargen, can you please send us the link to your devices so we can take a closer look at the issue? We might need to SSH into your device.

Hi @bbargen,
We’ve looked at the device in question and found filesystem corruption that was preventing the application from running.
We’ve reprovisioned the device, and it’s now running your application.
Please let us know if we can be of any other assistance!

Hey @bbargen, we’ve fixed up the second device as well. It was the same issue as the first one; sorry that it slipped through the cracks at first.

Thanks, folks, for resolving this so quickly! Is there something I can do to work around that issue myself, or something I’m doing that might be aggravating things? I ask because at some point my devices seem to end up in a state that looks similar (for example, right now, b900da04b8e95b888bc8c4e8099a9c92e3aba108b70a49848bece061f0f166 seems to be stuck “updating”: it tried for a while, then it went offline, and when I rebooted it started the update again). Not sure if that has the same root cause… Thanks, -bb

Hey @bbargen, is it okay if we try to restart the device? It does look like a cluster of btrfs issues, and we’re thinking about how to mitigate them for you. A larger SD card might help, but that’s only a hunch based on our experience with btrfs so far. For this last device, I think a new card would definitely help.

As for what can be done in practice to limit these sorts of issues, it’s not easy to give general advice at this point, mostly because fewer people run these devices than Raspberry Pis, so there is less experience with them in production. I’ll keep you posted if the team comes up with more suggestions.

Hi resin.io team, I’m experiencing the same problems (device appears online, no logs, can’t reboot from the console, terminal repeatedly tries and fails to spawn a shell, etc.) but on an Intel NUC device. Is there an update on the troubleshooting method for this issue? @imrehg @hedss

Hi @trish, I’ll send you a PM to debug further.