RPI 3 - Wifi connection unstable

Hello,

Currently, I am trying to run a simple Node.js script on multiple Raspberry Pis at different locations, and I am having issues with wifi connectivity. The devices connect properly after they are powered on and stay online for a while: for some Pis this is 5 hours, for others 2 days. After that, they disconnect from the wifi network. They then no longer show as connected in my dashboard, which makes remote debugging difficult.

At first, I thought the issue was that it did not reconnect after the wifi went down (e.g. router restart or something). But when I unplugged my network, turned it back on and waited a while it came back online.

Then I figured it might be the wifi-connect module. The issue occurred more often for people who set up through hotspot mode. However, after handing out new devices that had the wifi configured in the image, the issue was still there.

I’m not doing anything special. The issue even occurs when I run the simple express example.

Does anybody have any suggestions? Suggestions on how to debug this would also be helpful.

Here is what you can do:

  1. Enter a terminal session with the host OS of the device that experiences the issue

  2. Create the following shell script file in the data partition (e.g. /mnt/data/resin-data/pull-logs.sh):

#!/bin/bash
# Usage: ./pull-logs.sh <output-file>

# Enable timestamps and the most verbose debug level in wpa_supplicant.
dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 org.freedesktop.DBus.Properties.Set string:fi.w1.wpa_supplicant1 string:DebugTimestamp variant:boolean:true

dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 org.freedesktop.DBus.Properties.Set string:fi.w1.wpa_supplicant1 string:DebugLevel variant:string:"msgdump"

# Raise the NetworkManager log level to debug.
dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.SetLogging string:"debug" string:""

# Once a minute, append the last minute of journal entries to the output file.
# tail -n +2 skips the first line of the journalctl output.
while true
do
  journalctl --since "$(date --date='-1 minute' +"%F %T")" | tail -n +2 >> "$1"

  sleep 1m
done
  3. Change it to executable mode: chmod +x pull-logs.sh

  4. Run it with nohup, so that when you exit the terminal session it keeps running: nohup ./pull-logs.sh logs.txt &

Please note that if the device is restarted the script has to be run again.

After you reproduce the issue, get the device back online. You can then send me the logs in a private message so that I can take a look at them.

Thanks for the quick reply. I’m running that now. As I mentioned, it is unpredictable how long it takes for the issue to occur. I’ll report back when it does.

Thanks for sending the logs over. I will attach them here for reference now, as they do not contain any sensitive information: logs2.log (213.1 KB)

The problem happens when the DHCP lease expires and the IP address has to be renewed.
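
To look for this pattern in a log collected with pull-logs.sh, something like the following may help. It is only a sketch: dhcp_lines is a hypothetical helper name, and the keywords are a guess at what is useful, not an exhaustive list.

```shell
#!/bin/bash
# Hypothetical helper: print the DHCP-related lines from a collected log file.
# The keyword pattern is an assumption; widen or narrow it as needed.
dhcp_lines() {
  grep -iE 'dhcp|lease' "$1"
}
```

For example, dhcp_lines logs.txt | tail -n 50 shows the most recent matches, which can then be compared with the time the device dropped off the network.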

Can you please give me more information about the specifics of your hardware setup? Which RPi 3 model are you using? I also noticed that you use Bluetooth, etc. And which Resin OS version is this on?

The Resin OS version is: Resin OS 2.15.1+rev1
The Raspberry Pi model is: Raspberry Pi 3 Model B+

Thanks for providing this information! I will start trying to reproduce this on Monday when I get back from the weekend. I will keep you posted.

Hi, I have not been able to reproduce this issue locally yet. I have a couple more ideas that I would like to try and will let you know soon how that goes.

I think I have exhausted all local options for reproducing this, and I will need your assistance to go further. I think what will work best is a minimal application that reproduces the issue, e.g. you could start with an empty application and then add your connectivity-related code/configuration until the issue shows up. You may post this on GitHub or post it here and I can start investigating on my side. How does that sound to you?

Thanks for all the time you have put into this so far. I’ll just make a simple app that does a POST request every few hours. I’ll take out the Bluetooth parts to see if the issue persists.

I would like a proper solution, and I’m happy to help if this also makes resin better. Just curious, do you have ideas for a more temporary (hack) fix? I can check if there is no connection and restart wifi? Then I can try options in parallel.

I’ll do this and report back. I’m also happy to share my code, but it requires a connected device.

Hi, yes, I think taking out the Bluetooth parts is a good idea. My suspicion is a wireless driver/firmware bug, and since the wireless chip handles both WiFi and Bluetooth, taking Bluetooth out of the equation is the first thing we need to do.

As temporary fixes there are a few options I can think of:

  1. Reloading the WiFi driver with modprobe -r and modprobe.
  2. Restarting the NetworkManager service. This can be done through the D-Bus API systemd exposes:
DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host/run/dbus/system_bus_socket \
  dbus-send --system --print-reply --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.RestartUnit string:"network-manager.service" string:"fail"
  3. Rebooting the device through the supervisor API: https://docs.resin.io/reference/supervisor/supervisor-api/#post-v1-reboot
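
As a rough illustration of how the reboot option could be wired into a "check the connection and react" loop inside the application container, here is a minimal sketch. The function names, PING_TARGET, CHECK_INTERVAL and MAX_FAILURES are all hypothetical, and it assumes the supervisor exposes RESIN_SUPERVISOR_ADDRESS and RESIN_SUPERVISOR_API_KEY in the container environment and that ping and curl are installed there.

```shell
#!/bin/bash
# Sketch of a connectivity watchdog. All names and defaults below are
# assumptions made for illustration, not part of Resin OS.
PING_TARGET="${PING_TARGET:-8.8.8.8}"      # host used as the connectivity probe
CHECK_INTERVAL="${CHECK_INTERVAL:-60}"     # seconds between checks
MAX_FAILURES="${MAX_FAILURES:-5}"          # consecutive failures before reacting

# Succeeds if a single ping to PING_TARGET is answered within 5 seconds.
is_online() {
  ping -c 1 -W 5 "$PING_TARGET" > /dev/null 2>&1
}

# Ask the local supervisor to reboot the device (POST /v1/reboot).
reboot_via_supervisor() {
  curl -X POST --header "Content-Type:application/json" \
    "$RESIN_SUPERVISOR_ADDRESS/v1/reboot?apikey=$RESIN_SUPERVISOR_API_KEY"
}

# next_failure_count <current-count> <online: 1 or 0>
# Prints 0 when online, otherwise the current count plus one.
next_failure_count() {
  if [ "$2" -eq 1 ]; then
    echo 0
  else
    echo $(( $1 + 1 ))
  fi
}

# Check connectivity forever; reboot after MAX_FAILURES failed checks in a row.
watchdog_loop() {
  local failures=0 online
  while true; do
    if is_online; then online=1; else online=0; fi
    failures=$(next_failure_count "$failures" "$online")
    if [ "$failures" -ge "$MAX_FAILURES" ]; then
      reboot_via_supervisor
      failures=0
    fi
    sleep "$CHECK_INTERVAL"
  done
}
```

Calling watchdog_loop & from the container start script would run it in the background. Requiring several consecutive failures avoids rebooting on a single dropped ping. The reboot call could be swapped for one of the other two options, although those need access to the host's kernel modules or D-Bus socket.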

I pinged you in a PM for looking at your code on a connected device.

I looked into this with @sanderb today and everything points to a driver/firmware issue (as seen by the driver crash logs above as well). There is a possibility this is fixed in the new driver/firmware release by Cypress:

* 43455
   * --- 7.45.173 ---
   * WPA3-personal support
   * Firmware crash fix
   * Low Tx duty cycle support
   * SoftAP association fix
   * WIFI-BT coex throughput improvement
   * MFP bug fix
   * Roam time enhancement
   * --- 7.45.165 ---

Our currently released version in production (Resin OS 2.15.1+rev2) uses 7.45.154.
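
To check which firmware version a given device is actually running, the kernel log can be inspected. The following is a sketch that assumes the brcmfmac driver logs a line containing "Firmware version" followed later by "version X.Y.Z" when it loads; firmware_version is a hypothetical helper name and the exact log format may differ between driver releases.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin and print the last
# firmware version reported by brcmfmac. Assumes a log line that contains
# "Firmware version" and, later on the same line, "version <digits.and.dots>".
firmware_version() {
  grep -i 'brcmfmac' \
    | grep -i 'firmware version' \
    | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' \
    | tail -n 1
}
```

On the device this could be used as dmesg | firmware_version, and the result compared with the versions listed in the changelog above.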

Tracking issue is https://github.com/resin-os/resin-raspberrypi/issues/252