RPi3 device goes offline until reboot

Hello

We are experiencing WiFi issues with some devices deployed at one of our customers.
When the devices are booted, everything is working fine for a couple of hours to a day or two, but then the device disconnects and won’t reconnect automatically.
Power cycling the device always makes the device reconnect, but only for the aforementioned time.

The situation is similar to what is described in:
this post.
but upgrading to the newest OS hasn’t fixed the problem, and if possible I would rather not manually restart the networking services entirely as proposed.

The same devices work fine when they are brought to our test environment with a different WiFi network, so the problem seems to be provoked by the client’s WiFi.
Also, the exact same hardware setup and Balena application has been running for almost a year at other clients’ sites without problems.

Is there any obvious fix for this?
Also, can we get NetworkManager logs from the devices that survive a power cycle and might point us in the right direction?

The devices are encased and are not readily accessible for us, so attaching external screens etc. for debugging is not really an option.

Setup info:
Hardware: Raspberry Pi 3 Model B
OS-version: balenaOS 2.29.2+rev2

NetworkManager config file:

[connection]
id=<id>
type=wifi
autoconnect-priority=1
interface-name=wlan0

[wifi]
hidden=true
mode=infrastructure
ssid=<WiFI SSID>

[wifi-security]
auth-alg=open
key-mgmt=wpa-psk
psk=<password>

[ipv4]
method=auto

[ipv6]
addr-gen-mode=stable-privacy
method=auto

Hope you are able to help!

Best regards
Johan

Hi Johan,

Can you try removing the hidden=true line and check again whether you will get better results?

Also if you have another WiFi dongle attached to the device, the interface names could be swapped.

For easier WiFi configuration you may also check https://github.com/balena-io/wifi-connect

If removing the hidden field does not solve the issue and there is no dongle attached, then probably the easiest way to diagnose this is to plug an Ethernet cable into the device without rebooting it. If that is not possible, you may look for persistentLogging in our documentation, which will allow you to look into the device logs after the power cycle.

Thanks,
Zahari

Hi Zahari

Thanks for the fast response!
Yes, I will try removing the hidden field first and see if it fixes the problem.
Ethernet is not an option at the site, so persistentLogging is probably the next step after that.

Also, there is no dongle attached. We are using the on-board WiFi interface.

Can I somehow change the NetworkManager config-files through the dashboard without having to add new config-files to the SD-card directly?
Can I add a new config file, or do I need to update the existing through the DBUS interface?

Best regards
Johan

Hi,
You can enable persistent logging from the device configuration page in the dashboard.
This will store up to 8MB of logs in /var/log/journal, which you can access from the host os once connectivity to the device is restored.

Could you give us some extra details about your setup and the wifi dongle that you are using?
Could you also give us support access so that we also can check the device logs?

Since, as you mentioned, this sounds like an issue with the client’s WiFi setup/hardware, as a mid-term, semi-automated solution you could poll connectivity at an interval (e.g. by doing wget http://api.balena-staging.com/ping) and restart the networking services when the check fails, as suggested in the forum thread you linked. If you are using a multicontainer application you could also have a separate container for this purpose, keeping this isolated from the rest of the application.
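A minimal sketch of such a poll, in Python, assuming it runs in its own container. The URL is the one mentioned above; the names `is_online`, `Watchdog` and `run`, the failure threshold and the interval are all hypothetical. The recovery action is passed in as a callback because how to restart the networking services is described in the linked thread, not here.

```python
import time
import urllib.request
from typing import Callable

PING_URL = "http://api.balena-staging.com/ping"  # URL from the post above


def is_online(url: str = PING_URL, timeout: float = 10.0) -> bool:
    """Return True if an HTTP GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False


class Watchdog:
    """Fire `on_offline` after `max_failures` consecutive failed checks."""

    def __init__(self, check: Callable[[], bool],
                 on_offline: Callable[[], None], max_failures: int = 3):
        self.check = check
        self.on_offline = on_offline
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> bool:
        """Run one check; return True if the recovery action fired."""
        if self.check():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting again after recovery
            self.on_offline()
            return True
        return False


def run(watchdog: Watchdog, interval_s: float = 60.0) -> None:
    """Poll forever; the interval is an assumption, tune it per site."""
    while True:
        watchdog.tick()
        time.sleep(interval_s)
```

Requiring several consecutive failures before acting is a design choice meant to avoid restarting the network on a single dropped request, which speaks to the concern about meddling with the network at runtime.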

Kind regards,
Thodoris

Hi Thodoris

Great, persistent logging is now enabled.
However, my question was more about how to change the NetworkManager config files on an already deployed device, where I don’t have access to the SD card itself.

As I wrote above, I am not using a Wi-Fi dongle, but the built-in Wi-Fi on the RPi 3 model B.
And yes, I have granted support access for the two devices on the site now for the next 6 hours and sent you the device links in a PM.

And yes that might be a solution and we are running a multicontainer app, so it could indeed be an isolated service.
But if possible, I would rather not meddle too much with the network at runtime, in order to minimize the risk of a device disconnecting in the field due to improper handling.

Best regards,
Johan

Hi again

Do you maybe have an answer to my questions above? :slight_smile:
Also, when you are ready to check it out, I can grant support access again.

Best regards
Johan

Hi Johan. To answer your question about editing the existing NM config files without touching the SD card: yes, it can be done. You just need to edit the files in both /mnt/boot/system-connections and /etc/NetworkManager/system-connections/. The reason you have to edit in both places is that on boot all the connection files are copied from /mnt/boot/system-connections into /etc/NetworkManager/system-connections/, so any changes made only in the latter location would be overwritten at boot. Once you have made those changes, you can reboot or restart NetworkManager and it will pick them up.
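As an illustration of editing both copies consistently, here is a rough Python sketch that drops one setting (for example `hidden` in `[wifi]`) from a connection file. The two directories are the ones named above; the helper names and the connection file name are hypothetical, and the sketch assumes a Python interpreter is available wherever those paths are visible. Making a backup of the file first would be prudent.

```python
from pathlib import Path

# Both directories are named in the post above.
CONNECTION_DIRS = (
    Path("/mnt/boot/system-connections"),
    Path("/etc/NetworkManager/system-connections"),
)


def drop_key(keyfile_text: str, section: str, key: str) -> str:
    """Return the keyfile text without the `key=...` line in `[section]`."""
    kept = []
    current = None
    for line in keyfile_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]  # entering a new section
        elif current == section and stripped.split("=", 1)[0].strip() == key:
            continue  # skip the unwanted setting
        kept.append(line)
    return "".join(kept)


def apply_to_both(filename: str, section: str, key: str) -> None:
    """Rewrite `filename` in both directories (filename is a placeholder)."""
    for directory in CONNECTION_DIRS:
        path = directory / filename
        path.write_text(drop_key(path.read_text(), section, key))
```

For the config posted earlier, `apply_to_both("<connection file>", "wifi", "hidden")` would remove the `hidden=true` line from both copies; a reboot or NetworkManager restart is still needed afterwards, as described above.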

If the device is online now and has persistentLogging enabled, I can have a look

Thanks for the answers and great!
I’ll PM you the device link.

- Johan

Thanks! I’ve got it, will take a look.

Hi Johan, so having a peek at the logs on that device, I see:

May 09 12:05:25 28f7bea kernel: rpi_firmware_get_throttled: 12 callbacks suppressed
May 09 12:05:25 28f7bea kernel: Voltage normalised (0x00000000)
May 09 12:05:27 28f7bea kernel: rpi_firmware_get_throttled: 12 callbacks suppressed
May 09 12:05:27 28f7bea kernel: Under-voltage detected! (0x00050005)
May 09 12:05:54 28f7bea kernel: Voltage normalised (0x00000000)
May 09 12:05:56 28f7bea kernel: Under-voltage detected! (0x00050005)
May 09 12:06:10 28f7bea kernel: Voltage normalised (0x00000000)
May 09 12:06:12 28f7bea kernel: Under-voltage detected! (0x00050005)

I’m doing some searching around on under-voltage and the RPi firmware (I’m not sure whether this firmware is related to the WiFi chip firmware).
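For reading the hex value in those messages, here is a small Python sketch. It assumes the number in parentheses uses the same bit layout that the Raspberry Pi documentation gives for `vcgencmd get_throttled`; the function name is hypothetical.

```python
from typing import List

# Bit positions as documented for `vcgencmd get_throttled`.
THROTTLED_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(value: int) -> List[str]:
    """List the flag names whose bits are set in `value`."""
    return [name for bit, name in THROTTLED_FLAGS.items() if value & (1 << bit)]


print(decode_throttled(0x00050005))
# ['under-voltage detected', 'currently throttled',
#  'under-voltage has occurred', 'throttling has occurred']
```

Under that assumption, 0x00050005 means the board was both under-voltage and throttled at the moment of the message, and that both conditions had also occurred earlier since boot.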

When you tested this device in the lab, did you use the same power supply and have the same USB devices connected to it?

Hi again

Yes, it’s the exact same setup that was working without any issues in the office but is dropping out on the site. On other production sites, the same setup works fine as well.

The under-voltage problem might be present at the other sites as well, but we haven’t seen the WiFi issues on the other sites.
How do I access the logs myself to check that?

Best regards
Johan

Hi Johan, to access the logs, you can just open a webterminal from the Dashboard to the hostOS and then when you are in the shell run:

journalctl -xe

and also to check kernel logs

dmesg

These should give you most of the error logs.
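To compare sites, one option is to save the kernel log from each device (for example the output of `journalctl -k --no-pager` or `dmesg`) and count the voltage messages. The Python sketch below does that count; it assumes the script runs on a workstation against the saved text, and the function name is hypothetical.

```python
import re
from collections import Counter

# Message texts as they appear in the log excerpt earlier in this thread.
EVENT_RE = re.compile(r"(Under-voltage detected!|Voltage normalised)")


def count_voltage_events(log_text: str) -> Counter:
    """Count under-voltage and recovery messages in saved kernel log text."""
    return Counter(match.group(1) for match in EVENT_RE.finditer(log_text))
```

Calling `count_voltage_events(open("site_a.log").read())` (file name is a placeholder) for a device at each site would show whether the under-voltage events are specific to the failing site or present everywhere.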