Device offline for over an hour

Hi,

One of our remote devices went offline this morning. We last received a message at 8:04am. Is there any way we can diagnose what may have caused this? I had to turn off device logs because they used around 1.75GB of data in 4 days. We have another device connected to the same network that is still online.

These devices are on the other side of the country to me so there is no way to get to them to reset them etc.

Can anyone help?

Hi @GregorR1,

Are you still experiencing this issue? Have your devices come back online?

If they are still offline, my next question is: do you have another device that is online on the same network as the offline device?

We could use such an online device as a proxy to connect to the offline device.

Finally, if you could pass us the UUID (seen in the device dashboard URL), we could check the logs on our end to see if there is anything useful. If the UUID is sensitive for some reason (for example, because you enabled the public device URL), feel free to send it to us via private message here.

@gelbal Hi, our device is still offline. We do have another device that is online on the same network.

The UUID of the offline device is 945f7f8a2e99218efddae054005ccbcb. I have enabled support access to our application.

@GregorR1 hello again. I used the online device under the same application to connect to the offline device but I’m not able to connect. I checked our VPN logs coming from the offline device and I only see the last disconnect event. So there is not much we could see on our end.

At this point, we need physical access as the device looks completely offline. Is there a chance to retrieve the device somehow?

Alternatively, a power cycle might bring it back, but then we might lose the logs if persistent logging is not enabled.

The device is currently on a rooftop across the country - if we need to manually retrieve it then it could be a few days minimum. When we go to retrieve it, we will swap the SD card and bring the current one back to our office, where we can bring it online and try to see what caused this.

Sounds like a good plan. Let us know how it goes and if we could help further once you have the logs. Cheers…

Hi @gelbal, the device appears to have brought itself back online last night. Would there be any way to tell what caused it to go offline in the first place?

Thanks,
Gregor

Hi, did it only just come back online, having been offline since October 22nd?
If you enable persistent logging, we might be able to check the logs (after enabling support access and sharing the device dashboard link) if this ever happens again.
Be aware though that if this doesn’t happen, the logs will just accumulate and you’re better off keeping it disabled.

Yes, it just came back online after about 2 weeks. I’ll enable support access and share the dashboard link. If you’d be able to take a look that would be brilliant.

Sorry, one more thing - I didn’t mention that persistent logging is useful in case the device reboots (as logs are then cleared). If it didn’t reboot, we can still take a look with just support access. Don’t enable persistent logging for now.

Hi, I’ve granted support access. The UUID is 945f7f8a2e99218efddae054005ccbcb

Hi, I’ve had a look at this device and our logs and I can’t see anything that stands out (the device logs specifically are empty). If you enable persistent logging we may be able to further diagnose if this happens again but for now we cannot tell much.

Alright, no worries. Does persistent logging just log any events to the device’s local storage? We had an issue with the logs using a very large amount of data.

Persistent logs store the last 8MB of journal to the device’s state partition (although this is due to change in a future release to 32MB on the data partition). BalenaOS should never fill up your filesystem with logs; if it has done so, this is a bug which we will need to look into.

Ahh right I understand. The issue I had was to do with the dashboard logs, not the persistent logs. Sorry for the misunderstanding.

Ah I see, you can set RESIN_SUPERVISOR_LOG_CONTROL=false to disable log delivery if bandwidth is a concern.

There are actually a few options available to reduce bandwidth consumption, more information is available in our docs here.
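For a rough sense of scale, here is the average rate implied by the 1.75GB in 4 days mentioned earlier in the thread. This is only back-of-the-envelope arithmetic, assuming decimal units (1GB = 10^9 bytes) and traffic spread evenly over time; the function and variable names are purely illustrative.

```python
# Back-of-the-envelope estimate of the log traffic reported earlier in the thread.
# Assumptions: decimal units (1 GB = 1e9 bytes) and traffic spread evenly over time.
TOTAL_BYTES = 1.75e9  # roughly 1.75 GB of data
DAYS = 4              # used over 4 days


def average_rates(total_bytes, days):
    """Return (megabytes per day, bytes per second) for the given total."""
    mb_per_day = total_bytes / days / 1e6
    bytes_per_second = total_bytes / (days * 24 * 60 * 60)
    return mb_per_day, bytes_per_second


mb_per_day, bytes_per_second = average_rates(TOTAL_BYTES, DAYS)
print(f"{mb_per_day:.1f} MB/day, {bytes_per_second:.0f} bytes/s on average")
```

That works out to roughly 437.5MB per day, or about 5kB per second sustained, which is a lot for a metered connection if it is all dashboard log traffic.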

Thanks for the URL, I think that’s the one I read before. I had gotten help at the time and managed to fix it - I just got confused between the dashboard logs and the persistent logs you were talking about. I’ll turn on persistent logging just now, as it would be very helpful to be able to work out what caused any particular device to go offline.

Hi All,

The device appears to have gone offline again last night. I have persistent logging enabled this time so hopefully we will be able to tell what has gone wrong! I’ve enabled support access to my application if anyone would be able to take a look and see if they can access it through the other device on the network.

The UUID of the offline device is 945f7f8a2e99218efddae054005ccbcb

Thanks,
Gregor

Hi Gregor, I tried to ssh into the offline device from the one that is online on the same network without success. This might mean the device is truly disconnected from the network or offline, so we can’t do much for now. In case the device comes back up, we will be able to investigate the logs at that point.
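If you want to repeat a similar reachability check yourself from something on the same network, a minimal sketch is below. It assumes you know the offline device's local IP address (the one in the comment is a made-up placeholder) and relies on balenaOS exposing its host SSH server on port 22222. It only tests whether a TCP connection can be opened; it does not log in, and it is not the exact check we ran on our side.

```python
import socket

BALENA_SSH_PORT = 22222  # port used by the balenaOS host SSH server


def is_port_open(host, port=BALENA_SSH_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (192.168.1.50 is a hypothetical LAN address, not taken from this thread):
#     print(is_port_open("192.168.1.50"))
```

If this returns False from a machine that can reach other hosts on that network, the device is most likely powered down or has dropped off the network entirely, which matches what we are seeing.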