Cellular Connections lost after working for several hours on balenaFIN with two network connections

Hi!

For my project, I am using a balenaFIN v1.1.0 devkit with the latest balenaOS updates. A Quectel UC20 3G modem with a Twilio SIM card is connected to the miniPCIe port.

The goal is to make the balenaFIN a cellular gateway for a local wired network: on one side, a cellular connection with internet access; on the other, a wired connection to a local network without internet access.

Using the configuration described at https://www.balena.io/docs/reference/OS/network/2.x/, with the eth0 interface set to a static IP and the ipv4 never-default option enabled, I would expect the cellular connection to stay online all the time.
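A minimal sketch of such a setup, written as a shell script that generates two NetworkManager keyfiles. The connection names, the 192.168.10.1/24 address, the output directory and the APN are placeholders chosen for illustration, not values taken from the device described here. The generated files would still need to be placed where the balenaOS network docs expect them (the system-connections folder).

```shell
#!/bin/sh
# Sketch: write two NetworkManager keyfiles into a local directory.
# All names, addresses and the APN below are illustrative assumptions.
set -eu

OUT_DIR="${OUT_DIR:-./system-connections}"
mkdir -p "$OUT_DIR"

# Wired LAN: static address, never used as the default route.
cat > "$OUT_DIR/eth0-lan" <<'EOF'
[connection]
id=eth0-lan
type=ethernet
interface-name=eth0
autoconnect=true

[ipv4]
method=manual
address1=192.168.10.1/24
never-default=true
EOF

# Cellular uplink: the only connection meant to provide the default route.
cat > "$OUT_DIR/cellular" <<'EOF'
[connection]
id=cellular
type=gsm
autoconnect=true

[gsm]
apn=wireless.twilio.com

[ipv4]
method=auto
EOF

# NetworkManager ignores keyfiles that are readable by other users.
chmod 600 "$OUT_DIR/eth0-lan" "$OUT_DIR/cellular"
```

With never-default=true on eth0, NetworkManager should not install a default route through the wired interface, so only the cellular connection can provide one.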

However, the connection is eventually lost, sometimes only after a couple of days online (the device shows as offline in balenaCloud, and I cannot open a terminal session). Rebooting the device sometimes brings it back up, and sometimes does not.

I’ve seen a topic where it was said that a container shutting down could also bring down the interface due to interference with udev.
I suspect the wired connection might get a higher routing priority when the cellular connection fails temporarily, but that should not happen with the never-default option, right?

Did you encounter a similar issue in your projects? My dream would be to have a NetworkManager configuration for a non-stop cellular connection for remote access + a LAN connection via eth0.

Cheers,

Jules.

Hi and welcome to the forums!
So this doesn’t seem to be a problem with never-default, but it could be an unknown issue with the modem itself.
When this happens:

  • Connect the device to a wifi or ethernet network that has internet connectivity, or connect to the device with a serial cable from a laptop
  • Investigate the state of the modem (for example with nmcli and mmcli, and by looking at the journal and kernel logs)
  • In addition, your application could detect when the modem is stuck and reboot the device, as you did before
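
The checks in the list above can be sketched as one small shell function that snapshots the relevant state into a directory, so a healthy and a stuck state can be compared later. The function name, the file names and the default modem index 0 are assumptions; `mmcli -L` lists the index that is actually in use.

```shell
#!/bin/sh
# Sketch: snapshot modem and network state into a directory.
# Modem index 0 is an assumption; `mmcli -L` shows the real one.

collect_modem_state() {
    out_dir="$1"
    modem="${2:-0}"
    mkdir -p "$out_dir"

    # Every command may fail (missing tool, no modem) without stopping the rest.
    nmcli device status            > "$out_dir/nmcli-devices.txt" 2>&1 || true
    nmcli connection show --active > "$out_dir/nmcli-active.txt"  2>&1 || true
    mmcli -L                       > "$out_dir/mmcli-list.txt"    2>&1 || true
    mmcli -m "$modem"              > "$out_dir/mmcli-modem.txt"   2>&1 || true
    ip route show                  > "$out_dir/routes.txt"        2>&1 || true
    journalctl -u NetworkManager -u ModemManager --no-pager -n 200 \
                                   > "$out_dir/journal.txt"       2>&1 || true
    dmesg | tail -n 200            > "$out_dir/dmesg.txt"         2>&1 || true
}
```

It could be called as `collect_modem_state "/tmp/modem-state-$(date +%s)"` once while the connection works and again when it is stuck.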

Thanks for the tips. I’m in the process of reproducing the bug, and this error in the journal caught my eye:

NetworkManager[720]: [1563886186.3811] modem-broadband[cdc-wdm0]: failed to connect modem: QMI protocol error (79): ‘PolicyMismatch’

I couldn’t work out what causes this error. Do you know what it could be?

Perhaps this thread is of interest: https://lists.freedesktop.org/archives/libqmi-devel/2017-March/002239.html

It implies that it’s an ipv4 vs ipv6 problem. I would check the network setup for this device, to ensure these configurations match.

I tried setting the ipv6 method to disabled, as we only use ipv4. I will see what happens.
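
A rough sketch of that change using nmcli, assuming the cellular connection is named `cellular` (a hypothetical name; `nmcli connection show` lists the real ones). It uses the value `ignore`, which older NetworkManager releases accept; as far as I know, `disabled` is only available from NetworkManager 1.20 onwards.

```shell
#!/bin/sh
# Sketch: turn off IPv6 on one NetworkManager connection and re-activate it.
# The connection name passed in is whatever `nmcli connection show` reports.

disable_ipv6() {
    connection="$1"
    # Stop requesting IPv6 for this connection, then bring it up again
    # so the new setting takes effect.
    nmcli connection modify "$connection" ipv6.method ignore &&
        nmcli connection up "$connection"
}
```

Usage would be `disable_ipv6 cellular`. Whether a change made this way persists across reboots on balenaOS depends on how the connection file was provisioned, so editing the keyfile itself may be the safer route.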

A new disconnect happened, with the following trace in dmesg:

    [Tue Jul 30 20:45:27 2019] ------------[ cut here ]------------
    [Tue Jul 30 20:45:27 2019] WARNING: CPU: 0 PID: 0 at /yocto/resin-board/build/tmp/work-shared/fincm3/kernel-source/net/sched/sch_generic.c:320 dev_watchdog+0x298/0x29c
    [Tue Jul 30 20:45:27 2019] NETDEV WATCHDOG: wwan0 (qmi_wwan): transmit queue 0 timed out
    ...
    [Tue Jul 30 20:45:27 2019] ---[ end trace 92fc5b58ea3145cd ]---

This looks like a driver issue to me. Any idea how to prevent or solve this?

Hi,

This seems to be quite a frequent issue, but I couldn’t find a fix mentioned anywhere: https://dev.archive.openwrt.org/ticket/13738.html

Please see our list of tested modems: https://www.balena.io/docs/reference/OS/network/2.x/#known-tested-modems

We’ve ordered Quectel EC25s; hopefully they do not have the same issue.
As a hotfix, we created a cron job that checks the connection status and, if the connection is down, resets the modem with ‘mmcli -v -r’.
After the reset, the connection comes back up correctly.
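
For anyone wanting to try the same workaround, here is a sketch of what such a watchdog might look like. It is not the actual script used here. The ping target, the script name `modem-watchdog.sh` and the modem index lookup are assumptions added for illustration; `wwan0` is the interface name from the dmesg trace. The index is looked up on every run because ModemManager can assign a new index to the modem after a reset.

```shell
#!/bin/sh
# Sketch of a cron-driven connectivity watchdog.
# IFACE and PING_TARGET are assumptions; adjust them for the device.

IFACE="${IFACE:-wwan0}"
PING_TARGET="${PING_TARGET:-8.8.8.8}"

# Succeeds if at least one of three pings sent over the cellular interface is answered.
link_is_up() {
    ping -I "$IFACE" -c 3 -W 5 "$PING_TARGET" > /dev/null 2>&1
}

# Prints the index of the first modem listed by ModemManager, or nothing if none.
first_modem_index() {
    mmcli -L 2>/dev/null | sed -n 's|.*/Modem/\([0-9][0-9]*\).*|\1|p' | head -n 1
}

# Resets the modem only when the link check fails.
check_and_reset() {
    if link_is_up; then
        echo "link ok"
        return 0
    fi
    modem="$(first_modem_index)"
    if [ -z "$modem" ]; then
        echo "link down, no modem found"
        return 1
    fi
    echo "link down, resetting modem $modem"
    mmcli -m "$modem" -r
}

# Only act when called as "modem-watchdog.sh --run", for example from cron.
if [ "${1:-}" = "--run" ]; then
    check_and_reset
fi
```

A crontab entry such as `*/5 * * * * /usr/local/bin/modem-watchdog.sh --run` would run the check every five minutes; the path and the interval are only examples.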

Thank you for the support forum!

Thanks for the heads up!