Latest ResinOS prevents Wifi Access on RPi

I recently attempted an update to ResinOS. I had been running my Raspberry Pi 3 B on ResinOS 2.12.7 and using Resin.io to manage it. On ResinOS 2.12.7, the RPi’s wifi was working well, and Resin Wifi Connect also worked well when I tested it on the same ResinOS version.

Due to some unrelated issues, I chose to flash a fresh ResinOS image for this RPi, and used the ResinOS version 2.13.6, enabling Wifi access with the correct network and password. On flashing that ResinOS version to my SD card, inserting as usual into my RPi, and restarting the RPi, the RPi did not appear on my Resin.io dashboard.

When I used a wired internet connection, the RPi did connect successfully and showed up on the Resin.io dashboard. I let it download and install my containers, and then attempted to switch back to Wifi. It did not connect.

Since I have Resin Wifi Connect installed and running in one of my containers, I used it to reassign the network and password. The RPi still did not successfully connect. I then attempted using Resin Wifi Connect to connect to a different network. That also did not work.

I flashed a ResinOS 2.12.7 image to my SD card with the appropriate Wifi network and password, and the RPi immediately connected via Wifi and showed up on the Resin Dashboard. That confirms that this is not a hardware issue. (The tunneling socket hang-up errors did resume, however).

I next attempted to update to ResinOS 2.13.6 from 2.12.7 from the Actions tab of the Device, as outlined in https://docs.resin.io/reference/resinOS/updates/self-service/. I received the message that it had successfully updated, closed the ‘update’ window, and saw that the device was offline.

Something about the new ResinOS update (2.13.6) seems to be interfering with the RPi’s Wifi access. Any ideas what is causing this? (Temporarily, I can just stick with 2.12.7, but I would like to be able to update in the future.) Is there something I can do on my end to fix it? Or, ResinOS team, can you fix it on your end?

(Also, are there ChangeLogs available for the 2.12.7 -> 2.13.6 update?)

Hi,

You can find the changelog for changes from 2.12.7 to 2.13.6 here: https://github.com/resin-os/meta-resin/blob/master/CHANGELOG.md

WiFi is part of our test suite whenever we release a new resinOS version. I’ve reached out to our resinOS team and we should get back with an answer about this.

Thanks for the report, please let us know if you have any updates or questions in the meantime.

@KLForsythe I tried to reproduce this today, but WiFi connection was reestablished properly after the host OS updates.

Maybe you can help me with this: after reproducing the issue on your side, please plug in Ethernet so that connectivity is established. Then enable support access on the device and send me a dashboard link in a private message, and I will investigate further by looking at the host OS logs. Will that work for you?

Thanks @majorz. I really appreciate your help with this. I have set up a device specifically for your troubleshooting, and will be sending you the link via private message shortly.

Awesome, thanks a lot!

Hi @KLForsythe, let’s continue the discussion here on the original thread now.

I found part of the reason. When the device is rebooted, the wireless connection is not yet established by the time the WiFi Connect container starts. The device therefore enters AP mode with WiFi Connect running, so it cannot connect to the proper network.

A workaround of sorts is to add a sleep at the beginning of the wifi-connect start script, e.g. sleep 30. This delays the execution of wifi-connect by 30 seconds, which should give the device enough time to connect to the right network.

Another thing you can do is specify the --activity-timeout command line argument when running wifi-connect. For example, wifi-connect --activity-timeout 180 will make wifi-connect exit automatically after running for 3 minutes if a user has not entered credentials. After it exits, the wireless device will connect automatically to the proper network.
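As a rough illustration of the two workarounds above, here is a minimal sketch of a start script helper that waits for a connection for up to 30 seconds instead of always sleeping for the full 30. The helper name wait_for is hypothetical, and the commented usage assumes that iwgetid (from wireless-tools) is installed in the container and returns success once the device is associated with a network. Treat it as a starting point, not as the official wifi-connect start script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: wait_for TIMEOUT_SECONDS CHECK_COMMAND [ARGS...]
# Re-runs CHECK_COMMAND about once per second until it succeeds
# (returns 0) or until TIMEOUT_SECONDS have elapsed (returns 1).
wait_for() {
    local timeout="$1"
    shift
    local waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Intended use at the top of the container start script. It is left
# commented out so the helper can be tried without a WiFi device.
# Assumption: iwgetid is available and exits 0 when associated.
#
#   wait_for 30 iwgetid -r
#   ./wifi-connect --activity-timeout 180
```

With this in place wifi-connect still starts as before, only later when the connection is slow to come up, and the --activity-timeout value makes it exit after 3 minutes if nobody enters credentials.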

Tomorrow I am going to investigate further why the device did not connect in the time frame before wifi-connect was started. Usually the number of seconds before the wifi-connect container starts is enough for the device to connect to the proper network.

Please let me know if you have any questions.

Hi @majorz

Thanks so much! Very interesting. I will add the sleep command to the start of the script and will let you know how it affects the situation.

I am, however, puzzled about a few things:

  • The failure to connect to Wifi occurs even with a fresh ResinOS image, when WifiConnect isn’t even on the RPi yet.
  • We are already using --activity-timeout when we start WifiConnect. (And during some of my troubleshooting
    attempts prior to posting in the forum, I did wait longer than the 10 minutes we have the timeout set to.)
    Our command for starting wifi-connect: ./wifi-connect --portal-listening-port 45454 --activity-timeout 600

Then this explains why wifi-connect started when it should not have. I will investigate the log entries carefully tomorrow. They are quite verbose and it takes a bit of time, but we should be able to find out why the wireless connection was not established. Thanks for the feedback!

Hi @KLForsythe, the debug logs from the previous run did not provide enough information to track down the issue. I will need your assistance again: I would like to ask you to create a new application containing only the Dockerfile and start script below, so that I can diagnose the issue in a more isolated manner. These are the steps:

  1. Create a new application, but do not push any code until you reach step 5 below.

  2. Download the OS image, but do not specify WiFi credentials on the Download page.

  3. Burn the image and start the device.

  4. Open a Host OS terminal session and connect the device to your WiFi network: nmcli d wifi connect SSID password PASSWORD, where you replace SSID and PASSWORD accordingly.

  5. Commit the Dockerfile and start script at the bottom of the post and git push them to your device.

  6. After the application container is downloaded, reboot the device with Ethernet cable unplugged.

  7. The device will reboot and start the application container. The application will run for a couple of minutes and sleep infinitely at the end.

  8. After the reboot from step 6, wait three or more minutes and then plug in the Ethernet cable (assuming the device has not shown up in the Dashboard as online, i.e. the issue is reproduced).

  9. Enable support access on the device and send me a dashboard link in PM for further investigation.
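For step 4, a small wrapper like the following sketch may help when the SSID or password contains spaces or shell special characters, since it passes both as quoted arguments. The function name connect_wifi is hypothetical; the nmcli invocation is the long form of the nmcli d wifi connect command from step 4.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the command from step 4.
# Usage: connect_wifi SSID PASSWORD
# Quoting "$ssid" and "$password" keeps values with spaces intact.
connect_wifi() {
    local ssid="$1"
    local password="$2"
    nmcli device wifi connect "$ssid" password "$password"
}

# Example call, commented out because it needs a real WiFi device:
#
#   connect_wifi 'My Network' 'my secret passphrase'
```

The exit status is that of nmcli, so a failed or timed-out attempt can be detected by the caller.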

Please let me know if you have any questions and thanks one more time :slight_smile:


I. Dockerfile:

FROM resin/raspberrypi3-debian:buster

RUN apt-get update \
    && apt-get install -y dbus \
    && systemctl mask dbus.service \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

ENV DBUS_SYSTEM_BUS_ADDRESS unix:path=/host/run/dbus/system_bus_socket

WORKDIR /usr/src/app

COPY start.sh .

CMD ["bash", "start.sh"]

II. start.sh

#!/usr/bin/env bash

echo 'Sleep 60'

sleep 60

echo 'Enable debug logging'

dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 \
    /fi/w1/wpa_supplicant1 org.freedesktop.DBus.Properties.Set \
    string:fi.w1.wpa_supplicant1 string:DebugTimestamp variant:boolean:true

dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 \
    /fi/w1/wpa_supplicant1 org.freedesktop.DBus.Properties.Set \
    string:fi.w1.wpa_supplicant1 string:DebugLevel variant:string:"msgdump"

dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager \
    /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.SetLogging \
    string:"debug" string:""

sleep 10

echo 'Disconnect WiFi device'

# Look up the D-Bus object path of the wlan0 device
DEVICE_PATH=$(dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager \
    /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.GetDeviceByIpIface \
    string:"wlan0" | grep "object path" | cut -d '"' -f2)

dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager \
    "$DEVICE_PATH" org.freedesktop.NetworkManager.Device.Disconnect

sleep 10

echo 'Activate WiFi device'

# Passing "/" as the connection lets NetworkManager pick one for the device
dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager \
    /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.ActivateConnection \
    objpath:"/" objpath:"$DEVICE_PATH" objpath:"/"

sleep 60

echo 'Disable debug logging'

dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 \
    /fi/w1/wpa_supplicant1 org.freedesktop.DBus.Properties.Set \
    string:fi.w1.wpa_supplicant1 string:DebugLevel variant:string:"info"

dbus-send --system --print-reply --dest=org.freedesktop.NetworkManager \
    /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.SetLogging \
    string:"info" string:""

echo 'Infinity sleep'

sleep infinity
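To make the DEVICE_PATH lookup in start.sh easier to follow, here is a small sketch of just the parsing step. The helper names extract_object_path and sample_reply are hypothetical, and the sample reply is written by hand in the usual dbus-send --print-reply layout, not captured from a device; the device index 2 and the header values are made up.

```shell
#!/usr/bin/env bash

# Hypothetical helper wrapping the parsing step from start.sh:
# keep the line that contains "object path" and print the text
# between the first pair of double quotes.
extract_object_path() {
    grep "object path" | cut -d '"' -f2
}

# Hand-written sample reply (illustrative only, not real device output).
sample_reply() {
    printf '%s\n' \
        'method return time=0.0 sender=:1.2 -> destination=:1.9 serial=10 reply_serial=2' \
        '   object path "/org/freedesktop/NetworkManager/Devices/2"'
}

# Prints: /org/freedesktop/NetworkManager/Devices/2
sample_reply | extract_object_path
```

If the lookup fails, the pipeline prints nothing and DEVICE_PATH ends up empty, which is worth checking before calling Disconnect.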

@majorz
Thanks again for your help with this.

I followed the steps you outlined above, with some interesting results.

When setting the wifi credentials with nmcli d wifi connect SSID password PASSWORD, I encountered a timeout several times. I then attempted to connect to a different Wifi network, and it succeeded.

root@8098331:~# nmcli d wifi connect SSID1 password PASSWORD
Error: Timeout 90 sec expired.
root@8098331:~# nmcli d wifi connect SSID2 password PASSWORD
Device 'wlan0' successfully activated with 'xxxxx'.

After seeing wlan0 successfully activated, I continued with the steps you outlined, and the issue did NOT replicate; on rebooting without the Ethernet cable attached, the device did connect to the wifi network (SSID2).

However, connecting to the original network (SSID1) did work on the previous ResinOS (version 2.12.7). So this seems to be related to the combination of network and ResinOS version. (As a side note, I am working with a Raspberry Pi 3 B+, so both 5 GHz and 2.4 GHz networks should be compatible.)

I have enabled support access, and will PM you a link.

Thanks again!

Hi @KLForsythe, I investigated the debug logs from yesterday for the failing (OS v2.13.6) and non-failing (OS v2.12.7) connection attempts. In 2.13 we switched the NetworkManager’s DHCP plugin from dhclient to the internal systemd one and the libsystemd DHCP client does not work for this particular network. We will be looking further into how to resolve this and we will get back to you with more information soon. Thanks a lot for the assistance!
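For anyone who wants to check which DHCP client NetworkManager is configured to use, here is a minimal sketch. It assumes the setting is the dhcp= key in the [main] section of a NetworkManager.conf file. The default path below and the helper name nm_dhcp_plugin are assumptions, and resinOS may keep its configuration elsewhere or split it across several files. When the key is absent, NetworkManager uses its built-in default, which the sketch reports as "unset".

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the value of "dhcp=" from the [main]
# section of a NetworkManager config file, or "unset" if it is absent.
# Usage: nm_dhcp_plugin [PATH_TO_CONF]
nm_dhcp_plugin() {
    # Assumed default location; adjust for the system at hand.
    local conf="${1:-/etc/NetworkManager/NetworkManager.conf}"
    awk -F= '
        # Track whether the current section is [main]
        /^\[/ { in_main = ($0 == "[main]") }
        # Remember the last dhcp= value seen inside [main]
        in_main && $1 ~ /^[ \t]*dhcp[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            value = $2
        }
        END { if (value != "") print value; else print "unset" }
    ' "$conf"
}

# Example call, commented out because the file may not exist:
#
#   nm_dhcp_plugin /etc/NetworkManager/NetworkManager.conf
```

A result of internal would correspond to the built-in client described above, and dhclient to the previous behaviour.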

Hi @majorz, Thanks so much for your help, and for the update. I really appreciate it, and look forward to hearing the results of your analysis. If you need any more information or assistance from me, just let me know.

Hi @KLForsythe, I am now going to run a few diagnostic tests on the v2.12.7 device that you gave me support access to yesterday. I will let you know how that goes in a bit.

Hi @KLForsythe, there were no obvious issues in the DHCP diagnostics I ran on the device. The only thing I noticed was that the response from the DHCP server was a bit slow, taking a few seconds. We will ping you in case we need more assistance from you.

Update: this is a firmware issue with the Cisco RV325 router. Updating the router firmware fixes the problem.