Need help debugging GSM

Good morning,

I have a CM3-based device with an LTE modem (Huawei 909). It goes offline frequently (about 1-2 days after boot) and does not recover (the LED blinks 4 times in a row). I would like to debug this and set up persistent logging in advance.

However, even though my device has been up for about 10 hours, the log only covers the last hour:

root@9d52738:~# journalctl -u NetworkManager
-- Logs begin at Fri 2019-11-29 07:52:34 UTC, end at Fri 2019-11-29 08:57:54 UTC. --
-- No entries --

Unfortunately, this way I have no chance to see what happened when the device went offline.

Thanks for any help.
Bruno

Hi @bvetter,

Which balenaOS version are you using?
Can you use dmesg to check for any messages related to your modem, just to rule out hardware issues?
The logs are limited to 8 MB. If your containers write a lot of logs, they can use up that space quickly.
You could also try to retrieve the NetworkManager logs using a console cable and monitor them for failures.

Hope this helps
Regards
Rahul

Hi @rahul-thakoor,

I am using balenaOS 2.43.0+rev1. I will wait until the next failure to check dmesg. I do not have a console cable available, but I will try to get one.

Still, I wonder why the logs are so short. When I capture all logs right now like this
journalctl > /tmp/journal.log
the file is only 360 KB.
My device has been up for 14 hours, but journalctl only shows 1 hour of logs for NetworkManager. See this:

root@9d52738:~# date
Fri Nov 29 12:56:59 UTC 2019
root@9d52738:~# uptime
 12:57:17  up  14:35,  1 user,  load average: 0.39, 0.26, 0.21
root@9d52738:~# journalctl -u NetworkManager
-- Logs begin at Fri 2019-11-29 11:48:24 UTC, end at Fri 2019-11-29 12:57:25 UTC. --
-- No entries --

Any idea why I cannot see NetworkManager logs older than 1 hour?

Hi Bruno,

It is normal for NetworkManager not to log anything for extended periods of time. It simply means that no system event triggered it during that period.

What you describe is quite possibly a firmware or kernel/driver issue with the modem. It is not uncommon for modems to get into a bad state and not recover on their own. NetworkManager will likely not show anything; the ModemManager logs give you a slightly better chance. Also, by default neither runs in debug mode, so you may need to raise the log level for diagnostic purposes.
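Raising the log level can be done at runtime with the standard CLI options mmcli --set-logging and nmcli general logging. A minimal Python sketch, assuming mmcli and nmcli are installed in a container that can reach the host's system D-Bus (on balena this typically means the io.balena.features.dbus label); the name enable_debug_logging is hypothetical:

```python
import subprocess
from typing import List

# Runtime log-level changes for ModemManager and NetworkManager.
# These are not persistent: each daemon falls back to its configured
# level when it restarts (e.g. after a reboot).
DEBUG_LOGGING_COMMANDS = [
    ["mmcli", "--set-logging=DEBUG"],
    ["nmcli", "general", "logging", "level", "DEBUG", "domains", "ALL"],
]


def enable_debug_logging(dry_run: bool = False) -> List[str]:
    """Run the commands above (or only list them when dry_run=True).

    Returns the commands as strings, in the order they were (or would be) run.
    """
    executed = []
    for cmd in DEBUG_LOGGING_COMMANDS:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True: fail loudly if a daemon is unreachable over D-Bus.
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    for line in enable_debug_logging(dry_run=True):
        print(line)
```

Because the change is lost on restart, it has to be re-applied after every reboot; the extra output then shows up in the journal, e.g. via journalctl -u ModemManager.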

ModemManager can reset a modem, but it does not do so automatically. The command sequence for doing so is usually:

nmcli c down <CONNECTION_ID>
mmcli -m 0 --disable
mmcli -m 0 --set-power-state-off
mmcli -m 0 --reset

The modem may have an index other than 0, though. The actual index can be found with mmcli (mmcli -L lists the available modems) or through the ModemManager API.
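Looking up the index and running the sequence above can be scripted. A minimal Python sketch, assuming mmcli and nmcli are reachable from where it runs and that the first modem in the mmcli -L listing is the one to reset; the function names are hypothetical and connection_id stands for whatever your NetworkManager connection is called (the <CONNECTION_ID> above):

```python
import re
import subprocess
from typing import List, Optional

# ModemManager exposes modems as D-Bus objects named .../Modem/<index>;
# the number at the end is what "mmcli -m <index>" expects.
MODEM_PATH_RE = re.compile(r"/org/freedesktop/ModemManager1/Modem/(\d+)")


def parse_modem_index(mmcli_list_output: str) -> Optional[int]:
    """Return the index of the first modem in `mmcli -L` output, or None."""
    match = MODEM_PATH_RE.search(mmcli_list_output)
    return int(match.group(1)) if match else None


def reset_commands(connection_id: str, modem_index: int) -> List[List[str]]:
    """Build the reset sequence from this thread for one connection and modem."""
    m = str(modem_index)
    return [
        ["nmcli", "c", "down", connection_id],
        ["mmcli", "-m", m, "--disable"],
        ["mmcli", "-m", m, "--set-power-state-off"],
        ["mmcli", "-m", m, "--reset"],
    ]


def reset_modem(connection_id: str) -> None:
    """Look up the current modem index, then run the reset sequence."""
    listing = subprocess.run(
        ["mmcli", "-L"], capture_output=True, text=True, check=True
    ).stdout
    index = parse_modem_index(listing)
    if index is None:
        raise RuntimeError("no modem found in `mmcli -L` output")
    for cmd in reset_commands(connection_id, index):
        # check=False: e.g. "nmcli c down" fails if the connection is already
        # down, and we still want to attempt the remaining steps.
        subprocess.run(cmd, check=False)
```

Usage would be something like reset_modem("my-lte"), where "my-lte" is a placeholder. Looking the index up on every call, instead of hard-coding 0, matters because the modem can reappear under a new index after a reset.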

Sorry, I pressed the send button too early. I will send the rest in a second message now.
Thanks,
Zahari

Continuing from the last message:

What I would suggest is that you prepare a test setup for diagnostics: the CM3 device connected both to the modem and to Ethernet/WiFi, running only a tiny piece of application code instead of your usual application. Use Ethernet/WiFi as the primary connection and periodically make an HTTPS request through the modem interface. Once the requests start to fail, you can connect to the device and start looking into it.

I would also suggest looking at the AT command guide for your particular modem. You can use mmcli to send such commands to the modem (mmcli --command, which only works when ModemManager runs in debug mode). If you cannot determine what is happening, you can always ping us and we will help you diagnose it.
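The periodic HTTPS request through the modem interface can be made with curl's --interface option, which restricts the request to a given network interface. A minimal Python sketch, assuming curl is installed and the container uses host networking; the interface name wwan0 and the URL are hypothetical placeholders (check ip link for the modem's real interface name and use an endpoint you control):

```python
import subprocess
from typing import List

# Hypothetical defaults; adjust to your device.
DEFAULT_INTERFACE = "wwan0"
DEFAULT_URL = "https://example.com"


def probe_command(interface: str, url: str, timeout_s: int = 20) -> List[str]:
    """Build a curl command that makes one HTTPS request through `interface`."""
    return [
        "curl",
        "--interface", interface,  # send the request out of this interface
        "--silent", "--show-error",
        "--fail",                  # non-zero exit status on HTTP errors (>= 400)
        "--output", "/dev/null",   # discard the response body
        "--max-time", str(timeout_s),
        url,
    ]


def modem_is_online(interface: str = DEFAULT_INTERFACE,
                    url: str = DEFAULT_URL,
                    timeout_s: int = 20) -> bool:
    """Return True if the HTTPS request through the modem interface succeeds."""
    try:
        result = subprocess.run(
            probe_command(interface, url, timeout_s),
            check=False,
            timeout=timeout_s + 5,  # safety net in case curl itself hangs
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Logging a timestamp each time modem_is_online() flips to False gives you a precise moment to correlate with dmesg and the ModemManager journal.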

A simpler option is probably to just reset the modem with the commands above whenever connectivity is lost. This is the common approach.

You may also want to explore other modem options. Huawei modems are not the most stable ones and could possibly contain backdoors.

Thanks,
Zahari

I will try your suggestion. Do you recommend a particular tool for monitoring and restoring connectivity (resetting the modem) in production? I used to use monit, but perhaps you have a recommendation for balena's Docker-based environment.

Thank you for your help.

I don't really have a recommendation. I usually write a simple loop in a language like Python or Rust. It does not need to be anything special: for example, a loop with a one-minute sleep that, after three consecutive iterations without connectivity, takes some action to alert me, e.g. sends me an email that I see on my phone immediately. For the request itself you can use a libcurl language binding (since curl has an option for specifying a particular interface), or another easy-to-use library such as Python requests (which, I think, supports binding to an IP address rather than to an interface).
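The loop described above can be sketched in a few lines of Python. This is a minimal sketch, not a production tool: check and on_outage are hypothetical callbacks you supply (for instance, a curl-based connectivity probe and a function that sends the email), and the one-minute interval and three-failure threshold are taken from the description above:

```python
import time
from typing import Callable, Optional


def watchdog(
    check: Callable[[], bool],
    on_outage: Callable[[], None],
    interval_s: float = 60.0,
    max_failures: int = 3,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `check` every `interval_s` seconds; fire `on_outage` once per outage.

    An outage is `max_failures` consecutive failed checks. The counter resets as
    soon as a check succeeds. `max_iterations` (None = run forever) and the
    injectable `sleep` mainly exist so the loop can be tested.
    Returns how many times `on_outage` was called.
    """
    failures = 0
    alerts = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            ok = check()
        except Exception:
            # A crashing check counts as a failed check instead of killing the loop.
            ok = False
        if ok:
            failures = 0
        else:
            failures += 1
            if failures == max_failures:
                on_outage()
                alerts += 1
        sleep(interval_s)
    return alerts
```

Firing once per outage suits alerting. If on_outage resets the modem instead, you might set failures back to 0 after calling it, so the reset is retried every max_failures intervals while the link stays down.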
Good preparation is important, as diagnosing such issues is usually quite time-consuming and difficult.
Thanks,
Zahari