Unreliable wifi connection

Hello all,

I have a small fleet (18 RPi Zero W) connected to a wifi network (actually there are 4 different login/PSK pairs).

Some RPis lose connection and cannot reconnect, even though an RPi nearby is connected just fine. Usually, if I add a hotspot w/ some other wifi that is currently not showing, it does connect.

What I wanted to do is:
- Check the status of the wifi (connected or not);
- Somehow reset the wifi if it is not connected (so it starts looking for connections afresh, similar to a complete reboot)

I thought about using ifconfig wlan0 to check connections and ifconfig wlan0 down && ifconfig wlan0 up to reset it.

But I think it would only work on the host, not in containers (even w/ privileged=true).

We use NetworkManager on balenaOS. You can have a look at how to enable dbus communication with the host here: https://www.balena.io/docs/learn/develop/runtime/#dbus-communication-with-host-os
Then you should be able to use nmcli in the container and do the introspection / network manipulation using that.
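As a rough sketch of the "check the status" half, the snippet below calls nmcli from Python and parses its terse output. It assumes nmcli is installed in the container and that DBUS_SYSTEM_BUS_ADDRESS already points at the host socket; the function names and the wlan0 default are illustrative, not anything balena provides.

```python
import subprocess


def parse_device_states(terse_output):
    """Parse `nmcli -t -f DEVICE,TYPE,STATE device` output into {device: state} for wifi devices."""
    states = {}
    for line in terse_output.splitlines():
        parts = line.split(":")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        device, dev_type, state = parts
        if dev_type == "wifi":
            states[device] = state
    return states


def wifi_connected(device="wlan0"):
    """Return True if nmcli reports `device` as connected (hypothetical helper)."""
    result = subprocess.run(
        ["nmcli", "-t", "-f", "DEVICE,TYPE,STATE", "device"],
        capture_output=True, text=True, check=True,
    )
    return parse_device_states(result.stdout).get(device) == "connected"
```

Keeping the parsing separate from the subprocess call makes it possible to test the logic on a machine without nmcli.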

Thanks, I have enabled dbus and I’m playing around w/ python NetworkManager (seems a better option than interacting with the terminal directly).

I’ve found how to check the state of connections but have not found how to reset them. Could you share some ideas?

Do you mean something like nmcli dev wifi rescan ?

Seems not. As I understood, NetworkManager automatically scans already.

But I’m in a situation where sometimes the device is disconnected even when a perfectly good wifi connection is available and the device has the credentials to connect.

When this happens, a reboot always resolves the issue, but I wanted to “reboot” only the connection… (The reason this problem occurs is not clear yet)

Hi,
What balenaOS version are you using on those RPis? Are you using any external WiFi dongle?
You should be able to restart the NetworkManager as a whole with the following dbus command:

DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host/run/dbus/system_bus_socket dbus-send --system --print-reply --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.RestartUnit string:'NetworkManager.service' string:'fail'
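If you would rather trigger this from a Python service than from a shell, a minimal sketch could wrap the same dbus-send call. It assumes dbus-send is installed in the container; the function names are made up for illustration.

```python
import os
import subprocess

# Host D-Bus socket as used in the command above.
HOST_DBUS = "unix:path=/host/run/dbus/system_bus_socket"


def restart_unit_command(unit="NetworkManager.service", mode="fail"):
    """Build the dbus-send argument list for systemd's RestartUnit method."""
    return [
        "dbus-send", "--system", "--print-reply",
        "--dest=org.freedesktop.systemd1",
        "/org/freedesktop/systemd1",
        "org.freedesktop.systemd1.Manager.RestartUnit",
        f"string:{unit}", f"string:{mode}",
    ]


def restart_network_manager():
    """Run the command against the host bus; True if dbus-send exits with status 0."""
    env = dict(os.environ, DBUS_SYSTEM_BUS_ADDRESS=HOST_DBUS)
    result = subprocess.run(restart_unit_command(), env=env,
                            capture_output=True, text=True)
    return result.returncode == 0
```

A zero exit status only means systemd accepted the restart job, not that the wifi came back, so a connectivity check afterwards is still needed.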

Give it a try and let us know whether it works for you.


I will not be able to tell whether it fixed the problem, but the command ran. Output below:

method return time=1574103572.058136 sender=:1.1 -> destination=:1.80357 serial=2016701 reply_serial=2 object path "/org/freedesktop/systemd1/job/257564"

HOST OS VERSION
balenaOS 2.44.0+rev1

SUPERVISOR VERSION
10.3.7

The command I use to do what you want to do is:

nmcli c up CONNECTION_NAME

To find the connection name, run
nmcli c show

In my application, I spin off a Python thread devoted entirely to this: it waits a set interval, checks for internet with the function below and, on failure, runs the command above (os.system blocks until the command finishes), then checks again to confirm it worked and logs the result for statistics.

My interval is every 20 seconds and my devices will “soft-reset” their network manager 20 to several hundred times a day. I suspect this has something to do with the adapter I am using.

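import socket

def internet(host="8.8.8.8", port=53, timeout=3):
    """
    Return True if a TCP connection to host:port succeeds within timeout seconds.
    source:
    https://stackoverflow.com/a/33117579/11116438
    Host: 8.8.8.8 (google-public-dns-a.google.com)
    OpenPort: 53/tcp
    Service: domain (DNS/TCP)
    """
    try:
        # create_connection applies the timeout to this socket only, and the
        # with-block closes it so repeated checks do not leak file descriptors.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as ex:
        print(ex)
        return False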
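The check/recover/re-check loop described above could be sketched roughly as follows. This is not my exact production code: the function names are illustrative, and the check and recovery steps are passed in as callables so the loop can be tried out without nmcli.

```python
import logging
import subprocess
import threading


def bring_up(connection_name):
    """Run `nmcli c up NAME`; True when nmcli exits with status 0."""
    return subprocess.run(["nmcli", "c", "up", connection_name]).returncode == 0


def check_and_recover(check, recover, log=logging.getLogger("wifi-watchdog")):
    """One watchdog pass. Returns "ok", "recovered", or "failed"."""
    if check():
        return "ok"
    log.warning("connectivity check failed, attempting recovery")
    recover()
    outcome = "recovered" if check() else "failed"
    log.warning("recovery outcome: %s", outcome)
    return outcome


def start_watchdog(check, recover, interval=20, stop_event=None):
    """Run check_and_recover every `interval` seconds in a daemon thread."""
    stop_event = stop_event or threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop_event is set.
        while not stop_event.wait(interval):
            check_and_recover(check, recover)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop_event
```

With the internet() function above, it would be started as start_watchdog(internet, lambda: bring_up("CONNECTION_NAME")), and stopped by calling set() on the returned event.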

Running nmcli c up seems to remind NetworkManager that it can and should connect to a connection it has chosen to stop connecting to. Why it stops in the first place remains unknown to me.

Restarting the entire network manager service was never an option for me because it seems to forget that unmanaged devices exist after such a restart.

To access nmcli in a container, you have to follow the instructions in balena's networking documentation.

Good luck, and if you ever find out more about what's going on, let me know.

-Thomas

Thank you very much @tacLog .

Since I have several possible connections, I don’t want to manually set one of them (and check whether it is working or not, etc.).

What problems do you see w/ “it seems to forget that unmanaged devices exist after such a restart”?

I’m using logic similar to yours to detect whether it is connected, but thought about using nmcli radio wifi off && sleep 5 && nmcli radio wifi on to reset the connections…
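For reference, the same off/sleep/on sequence could be driven from Python along these lines. This is only a sketch: whether toggling the radio actually clears the stuck state is untested here, and the run and sleep parameters are hypothetical hooks added so the sequence can be exercised without nmcli.

```python
import subprocess
import time


def reset_wifi_radio(pause=5, run=subprocess.run, sleep=time.sleep):
    """Turn the wifi radio off, wait, then turn it back on.

    Mirrors `off && sleep && on`: if switching off fails, nothing else runs.
    Returns True only if both nmcli calls exit with status 0.
    """
    off = run(["nmcli", "radio", "wifi", "off"])
    if off.returncode != 0:
        return False
    sleep(pause)
    on = run(["nmcli", "radio", "wifi", "on"])
    return on.returncode == 0
```

On a device that is only reachable over wifi, a failed "on" step would leave it offline, so it is probably worth retrying that step or falling back to a reboot.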

Hey @deoqc

It makes sense you don’t want to mess with connections if you don’t know which one is active. You could, as a last resort, try to bring each one up in turn and test between each. In addition, nmcli reports success or failure, which you could parse.

As for the problems with restarting it: if you un-manage a device (nmcli dev set wlan9 managed no), NetworkManager no longer touches that device. If you then restart NetworkManager, that device disappears from nmcli d s and can’t be managed again. This is only relevant if you want to do other things with your adapters, like monitor mode.
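A minimal sketch of the "try each in turn" idea: list the saved wifi profiles, bring each one up, and stop at the first one that passes a connectivity check. The helper names are made up, and the type strings accepted in the parser are an assumption about how nmcli labels wifi profiles in terse mode; only escaped colons in names are unescaped.

```python
import subprocess


def parse_wifi_connections(terse_output):
    """Extract wifi profile names from `nmcli -t -f TYPE,NAME connection show` output."""
    names = []
    for line in terse_output.splitlines():
        conn_type, sep, name = line.partition(":")
        # Assumption: wifi profiles are reported as 802-11-wireless (or wifi).
        if sep and conn_type in ("802-11-wireless", "wifi"):
            names.append(name.replace("\\:", ":"))  # terse mode escapes ':' in values
    return names


def nmcli_up(name):
    """Run `nmcli c up NAME`; nmcli's exit status tells us whether activation succeeded."""
    return subprocess.run(["nmcli", "c", "up", name]).returncode == 0


def first_working_connection(names, bring_up, check):
    """Return the first profile that activates and passes `check`, else None."""
    for name in names:
        if bring_up(name) and check():
            return name
    return None
```

Using the exit status avoids parsing nmcli's human-readable messages, which may differ between versions.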

nmcli radio wifi off sounds like it would work to me, but I have never tried it.

My favorite manual for nmcli doesn’t say much about it.

It says that nmcli connection up ifname "$DEVICE" is a valid command and would avoid you having to choose which connection you want to use. You just have to choose what adapter to use, which in your case is probably just wlan0
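In Python that could look something like the sketch below. The function names and the 30 second wait are arbitrary choices for illustration; --wait is nmcli's global timeout option.

```python
import subprocess


def up_by_interface_command(ifname="wlan0", wait_seconds=None):
    """Build `nmcli [--wait N] connection up ifname IFNAME`."""
    cmd = ["nmcli"]
    if wait_seconds is not None:
        cmd += ["--wait", str(wait_seconds)]
    cmd += ["connection", "up", "ifname", ifname]
    return cmd


def up_by_interface(ifname="wlan0", wait_seconds=30):
    """Let NetworkManager pick a connection for the interface; True on exit status 0."""
    return subprocess.run(up_by_interface_command(ifname, wait_seconds)).returncode == 0
```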

Also, if you have just one adapter, you could always un-manage it and manage it again. That should force NetworkManager to reset it.
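A rough sketch of that un-manage/manage cycle, assuming wlan0 and a short pause between the two steps (both are guesses, as are the helper names, and I have not verified that this clears the stuck state):

```python
import subprocess
import time


def managed_command(ifname, managed):
    """Build `nmcli device set IFNAME managed yes|no`."""
    return ["nmcli", "device", "set", ifname, "managed", "yes" if managed else "no"]


def cycle_managed(ifname="wlan0", pause=2, run=subprocess.run, sleep=time.sleep):
    """Un-manage the device, wait briefly, then manage it again."""
    ok_off = run(managed_command(ifname, False)).returncode == 0
    sleep(pause)
    # Always try to re-manage, so a failed first step does not leave the device unmanaged.
    ok_on = run(managed_command(ifname, True)).returncode == 0
    return ok_off and ok_on
```

Given the problem described above with unmanaged devices disappearing, this should not be combined with a NetworkManager restart while the device is in the unmanaged state.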

Let me know what ends up working for you. I would love to learn more in case we end up deploying on RPi Zero Ws in the future. (They were second on our short list.)

-Thomas