WiFi-Connect Fails to Broadcast AP; D-BUS Failure

We’ve been working on integrating WiFi-Connect and it’s mostly been going well. However, we’re seeing an intermittent but persistent error where the wifi container sometimes fails to broadcast its AP and produces these logs:
root@balena:~# balena logs wifi_connect_5_1
Starting WiFi Connect
Error: D-Bus failure: Get org.freedesktop.NetworkManager.Device::Interface property failed on /org/freedesktop/NetworkManager/Devices/18: No such interface 'org.freedesktop.DBus.Properties' on object at path /org/freedesktop/NetworkManager/Devices/18
  caused by: "No such interface \'org.freedesktop.DBus.Properties\' on object at path /org/freedesktop/NetworkManager/Devices/18"

This error is difficult to reproduce, but once a device is caught in it, we need to swap out the wifi module or reflash the image to get wifi to work. No amount of container restarts/power cycles fixes it.

Here is the wifi service compose section:
wifi_connect:
  build: ./wifi/
  network_mode: "host"
  labels:
    io.balena.features.dbus: '1'
  cap_add:
    - NET_ADMIN
  environment:
    DBUS_SYSTEM_BUS_ADDRESS: "unix:path=/host/run/dbus/system_bus_socket"
    PORTAL_SSID: "TAGe-WiFi"

The host is an Odroid C2 (custom BalenaOS 2.38/aarch64), and the wifi chip is Ralink RT5370, a recommendation from the wifi-connect README.

Please excuse the code formatting, I tried :frowning:

Hi, thanks for reaching out for support.
Please give us a little more context on the problem:

  • What exact hardware are you using (device, wifi adapter)?
  • What version of balenaOS are you using?
  • What do you mean by swapping out the wifi adapter?

One thing you could try to narrow down the source of the problem is the following:
Reflashing the device should not be necessary to remove the problem. If the error is tied to state persisted in the container, stopping the container (and possibly the supervisor) on the device and deleting the container should do the job. This can also be achieved by pushing a new, slightly changed release. Testing this would tell us whether the error lies in the container rather than in balena or the hardware.
Regards, Thomas

Host: Odroid C2 (aarch64)
Wifi adapter: Ralink RT5370
BalenaOS: 2.38 (custom rev for unsupported device)
Swapping: Changing one RT5370 for a new one

That was all in the original, but please let me know if that’s not what you were asking for. I will try your suggestion the next time we run into this, thanks.

What you are most probably experiencing is a driver issue with the dongle. The situation with WiFi drivers on Linux is not ideal. Can you please check dmesg for any WiFi-related errors at the time the problem occurs? You can run it from a host OS terminal.

Hi all, co-worker of @Airum here.
I’ve done a few tests on the wifi, and the results have been strangely inconsistent as stated above. These are the logs of the wifi service from my most recent test.

Starting WiFi Connect
Deleting already created by WiFi Connect access point connection profile: "TAGe-WiFi"
Error: D-Bus failure: Get org.freedesktop.NetworkManager.Device::Interface property failed on /org/freedesktop/NetworkManager/Devices/20: No such interface 'org.freedesktop.DBus.Properties' on object at path /org/freedesktop/NetworkManager/Devices/20
  caused by: "No such interface \'org.freedesktop.DBus.Properties\' on object at path /org/freedesktop/NetworkManager/Devices/20"

I should mention that we are using the most recent version of the wifi-connect service. The only modification made was to hardcode the Dockerfile to pull the correct image/executable for our architecture (aarch64). For some reason the environment variables the original Dockerfile used were pulling the wrong versions, even though looking at the documentation it should have pulled the correct ones (but that’s a separate issue).
Looking at dmesg, I see these lines as being the most relevant:

[    5.397386] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[    5.427759] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 5370 detected
[    5.435915] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
...
[    6.521652] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[    6.531580] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
...
[   14.918885] ieee80211 phy0: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x1700 with error -71
...
[   16.365390] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[   16.395764] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 5370 detected
[   16.397554] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
...
[   16.423152] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[   16.432748] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36

There were also several lines that said IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready, and one that said IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready, despite the fact that the wifi wasn’t configured.
So it looks as if the chipset was detected and the firmware loaded, but then it ran into an error getting the vendor code from the dongle, so it started the process over again.
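The pattern described here (phy0 probed, a vendor request failing, then phy1 probed) can be pulled out of a long dmesg dump mechanically. Below is a minimal sketch, not something that ships with wifi-connect: rt2x00_summary is a hypothetical helper name, and it assumes the kernel lines have the same "ieee80211 phyN: rt2x00..." shape as the excerpt above. For reference, errno 71 on Linux is EPROTO (protocol error), which in a USB context generally points at a low-level transfer problem between the host and the dongle.

```shell
# Hypothetical helper: read dmesg-style text on stdin, print every rt2x00
# error line, then a one-line summary of how many distinct phys were seen.
rt2x00_summary() {
    awk '
        # Remember each distinct phyN that the ieee80211 layer reported.
        /ieee80211 phy[0-9]+:/ {
            match($0, /phy[0-9]+/)
            phys[substr($0, RSTART, RLENGTH)] = 1
        }
        # rt2x00 logs failures as "<function>: Error - ...".
        /rt2x00.*Error/ {
            errors++
            print "error: " $0
        }
        END {
            n = 0
            for (p in phys) n++
            printf "phys=%d errors=%d\n", n, errors + 0
        }
    '
}

# Usage on the host OS (commented out so the snippet has no side effects):
#   dmesg | rt2x00_summary
```

More than one phy within a single boot would suggest the adapter was re-enumerated, which matches the reading that the driver started over after the failed vendor request.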
After checking all this, I noticed that the wifi’s AP was broadcasting, so I connected to it and gave it wifi credentials, and it connected up right away. This is also strange, as lately it has taken a few tries to get it to A) see the right network and B) connect. Checking the logs, it looks like it worked right away, and the old logs about the D-Bus error were gone.

Starting WiFi Connect
WiFi device: wlan0
Access points: [<REDACTED>]
Starting access point...
Access point 'TAGe-WiFi' created
Starting HTTP server on 192.168.42.1:80
request error = Header
User connected to the captive portal
request error = Header
request error = Header
Stopping access point 'TAGe-WiFi'...
Access point 'TAGe-WiFi' stopped
Access points: [<REDACTED>]
Connecting to access point <REDACTED>...
Internet connectivity established

It looks as if the container restarted itself (I did not set the restart parameter in the compose file) and had no issues.
I’ll be honest, I have no clue what’s going on here.

I also just had a case where I got that same D-Bus error. But this time dmesg only showed finding the chipset and getting the firmware, and not an error about getting the vendor code. In this case, the AP was not broadcast, and I had to reset the service. Then it started working, but it took several tries before it connected to the wifi. I’ve found that if it can’t connect on the first try (even though I know I put in the right password), then when I connect to the AP again, the network I tried last time is missing from the list of available networks.
I do see there is an option in the wifi-connect start.sh script to add a small sleep to wait for the wifi to connect; I’m wondering if that would help with this particular problem (probably not the D-Bus error).

This definitely looks like a WiFi dongle driver/firmware issue. My suggestion would be to buy a few other options and find out which chipset would work best on the Odroid C2. It is not easy to find a nicely working one. We have plans to do more extensive tests on different dongle models, but for now we do not have any concrete recommendations.

I have done several tests with enabling the sleep in the start script, and thus far I haven’t seen the D-Bus error anymore. I’m still having the issue where the network I want to connect to sometimes doesn’t show up in the list of networks, but that could just be due to having a glut of wireless networks in the office. At the very least, the service is starting up and running correctly, even if it takes a couple tries to connect.
I’m wondering if the wifi service was sometimes starting up before the D-Bus service was, like a race condition. I’m not sure of the ordering there.
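One way to probe the race-condition idea without a fixed sleep is to wait until a wireless interface is actually visible before launching WiFi Connect. This is only a sketch under assumptions: wait_for_wifi is a hypothetical helper, not part of the wifi-connect start.sh, and it assumes the driver exposes a phy80211 entry under /sys/class/net/<iface>, as mac80211-based drivers normally do. It only checks the kernel's view of the interface; it does not confirm that NetworkManager has registered the device on D-Bus.

```shell
# Hypothetical helper: poll a sysfs-style directory until one interface
# has a phy80211 entry, print its name, and give up after a timeout.
#   $1 = directory to scan (default /sys/class/net)
#   $2 = timeout in seconds (default 30)
wait_for_wifi() {
    net_dir="${1:-/sys/class/net}"
    timeout="${2:-30}"
    elapsed=0
    while :; do
        for iface in "$net_dir"/*; do
            if [ -e "$iface/phy80211" ]; then
                basename "$iface"
                return 0
            fi
        done
        # Nothing found on this pass: stop if the time budget is used up.
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Usage (commented out so the snippet has no side effects):
#   wait_for_wifi /sys/class/net 30 || echo "no wifi interface found"
```

Compared with a fixed sleep, polling returns as soon as the interface appears and fails loudly when it never does, which makes a flaky dongle easier to tell apart from a slow boot.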

That sleep was left in for cases where the device connects to a network too slowly, so the behaviour you describe does not match the pattern it was meant to address. The fact that the chip sometimes does not find the network supports my assumption that the driver is not very stable. The problem with WiFi drivers on Linux is that most vendors give the platform much lower priority than Windows, and in my experience the majority of WiFi dongles have all sorts of issues.

I will order a dongle like yours today. It will take about a week to reach me, as it will be shipped from one of our offices.

As for your question about startup ordering, you can check the timestamps of the events with journalctl and dmesg -T. To show only the NetworkManager entries, use journalctl -u NetworkManager.
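To compare the ordering directly, the two logs can be interleaved by timestamp. A minimal sketch, assuming the journalctl on the device supports -o short-iso and that kernel messages are available in the journal through journalctl -k; merge_by_time is a hypothetical helper name and the /tmp paths are placeholders.

```shell
# Hypothetical helper: interleave log files whose lines start with an
# ISO 8601 timestamp, such as those written by journalctl -o short-iso.
merge_by_time() {
    # Timestamps with the same UTC offset sort correctly as plain text;
    # LC_ALL=C forces byte order and -s keeps equal timestamps in input order.
    LC_ALL=C sort -s -k1,1 "$@"
}

# Usage on the device (commented out so the snippet has no side effects):
#   journalctl -b -o short-iso -u NetworkManager > /tmp/nm.log
#   journalctl -b -o short-iso -k > /tmp/kernel.log
#   merge_by_time /tmp/nm.log /tmp/kernel.log
```

Reading the merged output around the rt2x00 lines should show whether NetworkManager registered the wifi device before or after the driver finished loading its firmware.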