Balena WiFi Connect issue

I am using WiFi Connect, defined as follows in my docker-compose.yml:

wifi-connect:
  image: balenablocks/wifi-connect:amd64
  restart: always
  network_mode: host
  privileged: true
  labels:
    io.balena.features.dbus: "1"
    io.balena.features.firmware: "1"

The only environment variable I set is PORTAL_SSID, just so that I can recognise the Access Point name.

The Balena WiFi Connect block checks for internet connectivity every 120 seconds (the default value), and this is exactly what happens: every 2 minutes the device checks for internet connectivity, until the moment the problem occurs!

At power-on, the Access Point gets created and I can connect by entering the SSID and password on the captive portal.

Everything seems to work up to this point.

→ However, after a completely random amount of time (anywhere from a couple of minutes to half a day), THE INTERNET CONNECTION FAILS! WiFi Connect no longer tries to connect. It stalls for some unknown reason! The device goes offline and is lost.

My questions:

  1. Why does WiFi Connect fail after a random amount of time?
  2. What exactly causes the failure?
  3. Even in the unimaginable case where the internet is unavailable - why do the 120-second repetitions also stop?
  4. What can I do?

Thank you for any support on this.

Can you please share any error messages you get when the device disconnects? What kind of network are you using? Is it stable? How are you powering your device? What sort of device are you using?

Hello @skuenstler, could you please answer the questions from my colleague?

It would be helpful to read some of the logs from your device while it is online and as it goes offline.

Thanks

At the time of failure there are no error logs whatsoever. The normal log comes in (i.e. the one that appears every 120 seconds anyway), and then it stalls.

Here is the last log entry, printed right at the moment it fails:

wifi-connect: Your device is already connected to the internet.
wifi-connect: Skipping setting up Wifi-Connect Access Point. Will check again in 120 seconds

The system goes back to Access Point mode without ever trying to connect to the internet again (with the SSID/password given earlier in the captive portal). It is as if the system loses the credentials.

There are no more logs after it has stalled, but the Access Point is ON.

I know that the internet connection plays a role. The setup has now been running for 72 hours without any issues. It was only 4 days ago that WiFi Connect would stall again and again (going back to Access Point mode and never trying to connect with the given credentials again). It happened many times that day. I think the internet was slow or had some other issue.

What I do not understand: even if there is a problem with the internet (let’s assume the connection is down for a couple of seconds or even minutes), shouldn’t WiFi Connect find the internet again once it is back? I mean, it has the credentials stored, doesn’t it? Why would it lose the SSID and password?

What seems to happen when the internet is down for a short time is that the system goes back to Access Point mode and never tries to connect again. The only way to recover is for the user to enter the SSID/password again (by connecting to the Access Point and entering the credentials manually in the captive portal popup).

Isn’t there a way to make WiFi Connect try the given credentials again and again, even after the internet has been down for a couple of seconds?

What is the exact logic WiFi Connect uses to decide to go into Access Point mode? And why does it not try to connect with the correct credentials again once it is in Access Point mode?
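The behaviour being asked for here (retry the saved credentials a few times before falling back to the Access Point) could be sketched roughly as below. This is a hypothetical illustration only, not the actual WiFi Connect logic: the names `CHECK_INTERVAL`, `MAX_RETRIES`, `next_action` and `watch_connectivity` are made up for the example, and it assumes `nmcli` is available and that the saved profile name is passed in.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real WiFi Connect implementation.
# CHECK_INTERVAL and MAX_RETRIES are illustrative names and defaults.
CHECK_INTERVAL="${CHECK_INTERVAL:-120}"
MAX_RETRIES="${MAX_RETRIES:-5}"

# Decide what to do next.
#   $1: connectivity state as printed by `nmcli networking connectivity check`
#       (none, portal, limited, full or unknown)
#   $2: number of consecutive failed checks so far
next_action() {
    if [ "$1" = "full" ]; then
        echo "sleep"      # online: just wait for the next check
    elif [ "$2" -lt "$MAX_RETRIES" ]; then
        echo "retry"      # offline: try the saved profile again
    else
        echo "portal"     # still offline after MAX_RETRIES attempts
    fi
}

# Check connectivity every CHECK_INTERVAL seconds.
#   $1: name of the saved NetworkManager profile (assumed to exist)
watch_connectivity() {
    profile="$1"
    failures=0
    while true; do
        state="$(nmcli networking connectivity check 2>/dev/null || echo unknown)"
        case "$(next_action "$state" "$failures")" in
            sleep)  failures=0 ;;
            retry)  failures=$((failures + 1))
                    nmcli connection up "$profile" || true ;;
            portal) echo "still offline after $failures attempts" >&2
                    return 1 ;;   # the caller would start the Access Point here
        esac
        sleep "$CHECK_INTERVAL"
    done
}
```

Calling `watch_connectivity "<CONNECTION>"` would keep retrying the saved profile and only return once `MAX_RETRIES` consecutive attempts have failed, which is the point where falling back to the captive portal would make sense.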

@skuenstler,

We’ve heard similar reports from other users but have not yet been able to replicate this locally, so we really appreciate you writing in and taking the time to work through the issue with us. Could you share what hardware you’re using, and also take a look through the following?

  1. When you run the Diagnostics in the Diagnostics tab (of a device that has fallen off WiFi), do you see any details there?
  2. Are there any details when querying journalctl -u NetworkManager from the hostOS?
  3. In past scenarios, we have sometimes seen that manually resetting the connection with NetworkManager works temporarily, so I’d wonder if trying it in your case works: nmcli c up <CONNECTION>
  4. If it does, then you might find this useful: GitHub - balena-io-playground/keep-wifi-up: Workarounds issues that prevent NetworkManager from reconnecting to a WiFi network. It is something we’ve created to help reinstate the connection in instances where that does in fact work.

However, we mostly would like to understand why this is happening with NetworkManager, so any details you can provide us from the Diagnostics tab or journalctl would be very appreciated.
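As a rough sketch of how steps 2 and 3 could be run from a hostOS terminal, the helpers below wrap the relevant commands. The function names are made up for this example, and the unit name `NetworkManager` and the 200-line default are assumptions; adjust them to your device.

```shell
#!/bin/sh
# Illustrative helpers; the function names are hypothetical.

# Print the most recent NetworkManager journal entries (default: last 200 lines).
# Assumes the systemd unit is called NetworkManager.
nm_logs() {
    journalctl -u NetworkManager --no-pager -n "${1:-200}"
}

# Read `NAME:TYPE` lines (nmcli terse output) on stdin and print the names
# of the Wi-Fi profiles. Colons inside a name stay backslash-escaped.
filter_wifi_profiles() {
    sed -n 's/:802-11-wireless$//p'
}

# List the Wi-Fi profiles NetworkManager has saved.
saved_wifi_profiles() {
    nmcli -t -f NAME,TYPE connection show | filter_wifi_profiles
}

# Manually bring a saved profile back up (long form of `nmcli c up <CONNECTION>`).
reconnect() {
    nmcli connection up "$1"
}
```

A typical session would be `nm_logs > nm.log` to capture the journal around the time of the failure, then `saved_wifi_profiles` to confirm the credentials are still stored, then `reconnect "<CONNECTION>"` with one of the listed names.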

If it helps, you should be able to see exactly how WiFi Connect works in our GitHub repo here: GitHub - balena-os/wifi-connect: Easy WiFi setup for Linux devices from your mobile phone or laptop. It is an open-source project and meant to be maintained jointly by balena and the community, so if you have any improvements, please do submit them as an Issue / Pull Request in the longer term.

Is this our issue as well?

Initial setup works great. The hotspot and the captive portal both work great: the portal sends the configuration and a client-mode connection is established while the hotspot is shut down.

Test sequence:

  1. Kill the WiFi
  2. Hotspot starts as it should
  3. Bring WiFi back
  4. Device will not reconnect to WiFi. Hotspot stays active
  5. A manual restart will fix the issue and reconnect to WiFi

Interesting: if you try to connect to the hotspot (prior to rebooting) the captive portal will list any available WiFi networks except for the original one it was connected to.

Does this match what others are seeing? Any fix yet?
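One way to narrow this down might be to check, from the hostOS while the hotspot is still active, whether NetworkManager itself can see the original network in a fresh scan. The sketch below is an assumption-laden example: the function names are made up, the 5-second wait is arbitrary, and a device with a single radio in hotspot mode may not be able to scan at all.

```shell
#!/bin/sh
# Illustrative helpers; the function names are hypothetical.

# Read one SSID per line on stdin; succeed if "$1" matches a whole line exactly.
ssid_in_scan() {
    grep -Fxq -- "$1"
}

# Ask NetworkManager for a rescan, wait briefly, then check whether the SSID
# shows up. In terse output, colons inside an SSID are backslash-escaped.
ssid_visible() {
    nmcli device wifi rescan 2>/dev/null || true
    sleep 5   # arbitrary pause to let the scan finish
    nmcli -t -f SSID device wifi list | ssid_in_scan "$1"
}
```

For example, `ssid_visible "MyHomeNetwork" && echo visible || echo "not visible"` (with a placeholder SSID) would show whether the missing network is absent only from the captive portal list or from NetworkManager's scan results as well.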

Hi @brownster,

Can you follow the instructions I shared above for the last user and provide the results to us? Looking at your description, I’m not sure whether it’s the same problem, but having those details will help us to diagnose it here as well.

Thanks!