Fin goes offline after a few hours of use

I am looking for a better way to diagnose why my Fin keeps going offline in the middle of the night.

I know it isn’t connected to the wifi network because I checked the router and tried to ssh indirectly.

I have it here in my office so I can restart it and connect to it directly, but I need to figure out why this happens with my current version of the code and not with the last version.

Some things I added:
I set eth0 to unmanaged in NetworkManager to stop it from continuously scanning for an IP. This is only an issue because of my custom power delivery here.

command used:
'nmcli device set ' + adapter + ' managed no'
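For anyone scripting the same step, here is a minimal Python sketch of the command above built as an argument list instead of a concatenated string, so the interface name is never passed through a shell. The function names `build_unmanage_command` and `unmanage` are hypothetical and not taken from the original application. It sticks to arguments that work on the Python 3.6 image used below.

```python
import subprocess
from typing import List


def build_unmanage_command(adapter: str) -> List[str]:
    """Return the nmcli argv that marks `adapter` as unmanaged."""
    # Reject empty names or names with whitespace before they reach nmcli.
    if not adapter or any(ch.isspace() for ch in adapter):
        raise ValueError("suspicious interface name: {!r}".format(adapter))
    return ["nmcli", "device", "set", adapter, "managed", "no"]


def unmanage(adapter: str) -> subprocess.CompletedProcess:
    """Run nmcli and return the result so the caller can log any failure."""
    return subprocess.run(
        build_unmanage_command(adapter),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (Python 3.6 compatible)
    )
```

Logging `returncode` and `stderr` from the returned object at container start would at least show whether NetworkManager accepted the change.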

The relevant section of the Dockerfile:

FROM balenalib/raspberrypi3-debian-python:3.6.8-buster-build
RUN apt-get update && apt-get install -y \
    network-manager=1.14.* \
    tcpdump=4.9.* \
    openssh-client=1:7.* \
    iw=5.0.* \
    net-tools=1.60* \
    wireless-tools=30* \
    sshpass=1.06* \
    procps \
    && rm -rf /var/lib/apt/lists/* \
    && systemctl mask NetworkManager.service \
    && apt-get clean

ENV DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host/run/dbus/system_bus_socket

The device is normally connected primarily via wifi, so I don’t see why this would be an issue. I also did this at the start of my container, which then ran for several hours before there was a problem.

My application itself is fairly simple. It uses tcpdump to monitor 3 network interfaces that I rename with a udev rule, and un-monitor in the same way.

The only new thing I added was more multithreading.

I am looking at what I am doing with NetworkManager because I know that is the most likely root cause of any issues, but is there anything else I can look at?

Also is there any way to connect to it via a serial connection, maybe over USB? I think I saw that mentioned somewhere, but I can’t find anything on it.

Hi @taclog , yup I think the best way to diagnose this is to get it hooked up to serial. I always use serial on my fin when messing around with GSM modems. If you use a .dev build of the OS, serial is exposed on the pin headers exactly as it is on an RPI. I use the PiUart for connecting it up but any serial cable will do. You can then just log in using username: root and there is no password.

From there you can check NetworkManager logs and also probably look at dmesg to see if the wifi driver is failing or something like that.
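As a rough illustration of that dmesg check, here is a small Python sketch that keeps only the kernel log lines mentioning wifi-related terms. The keyword list is an assumption and should be adjusted to whatever driver name actually shows up on the device; the function names are hypothetical.

```python
import subprocess
from typing import Iterable, List

# Assumed keywords; replace with the real driver name once it is known.
DEFAULT_KEYWORDS = ("wlan", "wifi", "firmware")


def filter_log_lines(log_text: str,
                     keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the lines of `log_text` that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        line for line in log_text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


def read_dmesg() -> str:
    """Return the kernel ring buffer as text (may need root on the device)."""
    result = subprocess.run(
        ["dmesg"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

On the device, `filter_log_lines(read_dmesg())` would give a shortlist to read through once you are logged in over serial.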

Alright,

I don’t want to have to reproduce this issue, so I am going to leave it as is until mine comes.

Thanks for confirming that and pointing me in the right direction. The fin is my first exposure to this kind of computing, so I wasn’t sure this was an option.

I will update this post with what I find for the benefit of others.

-Thomas

Great @taclog , let us know if we can help as you work through it :slight_smile:

Updates:

Unfortunately, after I unplugged a WiFi adapter the application restarted and it came back online.

I still have no idea what happened, but I have some further ideas. I was logging to a /tmp/ directory, which was set up in my docker-compose.yml as tmpfs: /tmp

This seems like it would slowly fill the memory until there is none left. I don’t think it had been running long enough to do this, however.

It seems this may forever remain a mystery until it undoubtedly happens in production. So I am adding more robust logging so that in the event it happens and we recover the device, I should have more to go on.
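One possible shape for that logging, sketched in Python under the assumption that the log file lives on persistent storage and not on the tmpfs mount: a size-capped rotating file handler, so the log survives a restart and cannot grow without bound. The logger name, the size limits, and the `make_logger` helper are all hypothetical; the thread name is included in each record because the newest change was extra multithreading.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_logger(log_path, name="fin-monitor", max_bytes=1000000, backups=3):
    """Return a logger that writes to a size-capped, rotating file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Only attach a handler once, even if make_logger is called repeatedly.
    if not logger.handlers:
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backups
        )
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(threadName)s %(levelname)s %(message)s"
        ))
        logger.addHandler(handler)
    return logger
```

With these defaults the logs take at most roughly four megabytes on disk (the active file plus three backups).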

Thanks for your suggestion @shaunmulligan I will have that on hand for next time if I manage to reproduce this bug.

-Thomas

Hey @taclog, let us know what you find or if you need help chasing this down with the logs. What you said about tmpfs makes sense; you could try periodically checking the available space, just to rule it out as the cause of the crashes.
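A minimal sketch of such a check in Python, using the standard library's `shutil.disk_usage`. The 10% threshold and the helper names are assumptions chosen for illustration, not values from the thread.

```python
import shutil


def free_fraction(path="/tmp"):
    """Return the fraction of the filesystem at `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def is_low_on_space(path="/tmp", min_free_fraction=0.1):
    """True when less than `min_free_fraction` of the filesystem is free."""
    return free_fraction(path) < min_free_fraction
```

Calling `is_low_on_space()` from an existing loop or a timer thread every few minutes, and logging the result, should be enough to see whether the tmpfs mount fills up before the device drops offline.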