Balena Device won't come online

I was testing a script that turns off networking and turns it back on after a timeout, to see how some of my containers react to a network outage.

I ran ‘nmcli networking off’ and then, after a ‘time.sleep()’, ‘nmcli networking on’.
This worked fine with a short timeout (while the device stayed in “Heartbeat only” mode). However, I then wanted to let the device go “Offline” before turning networking back on, without realising that it would not in fact turn back on!

So I manually rebooted the device, which didn’t help. I then unplugged it and plugged it back into the mains to power-cycle it, but the device has been “Offline” ever since. I am confused as to how to troubleshoot this. Is there any way to connect to the device locally, via USB or similar? With it being offline, I can’t connect to it to see what the issue is. Or is the only fix to re-flash balenaOS onto the device?

I am curious to know whether running ‘nmcli networking off’ would normally render balenaOS unusable, or whether something has gone terribly wrong. I would have thought that after a reboot or power cycle the network services would be reset rather than staying permanently off.

Hello, it looks like nmcli networking off disables networking completely. So unless you’re sure that the nmcli networking on command actually ran, there’s likely no way to connect to the device over a network to get it back online. Perhaps you can connect via a tty/serial port or keyboard/monitor. What type of device is it and is it in development or production OS mode?
In the future you may want to turn off only one connection at a time with nmcli connection down networkName. See the “Network Setup on balenaOS” page in the balena documentation.
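As a rough sketch of that suggestion, the snippet below takes a single named connection down for a while and then brings it back up, leaving global networking untouched. The function name bounce_connection, the NMCLI override (handy for a dry run with NMCLI=echo) and the connection name in the example are illustrative assumptions, not something from the balena docs.

```shell
#!/bin/bash
# Sketch: take ONE named connection down, wait, then bring it back up.
# NMCLI can be overridden (e.g. NMCLI=echo) for a dry run; this override is
# an illustrative addition, not part of nmcli itself.
NMCLI="${NMCLI:-nmcli}"

bounce_connection() {
    local name="$1" seconds="$2"
    # Stop here if the connection cannot be brought down (e.g. wrong name).
    "$NMCLI" connection down "$name" || return 1
    sleep "$seconds"
    "$NMCLI" connection up "$name"
}

# Example with a hypothetical connection name:
# bounce_connection "my-wifi" 60
```

The available connection names can be listed with nmcli connection show. If the device has a second interface (for example Ethernet) that stays up, it should remain reachable while the other connection is down.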
Also, just to clarify: balenaOS will keep running your containers while the device is offline; you just won’t be able to access it from the balenaCloud dashboard.

Hi, yes, I think it was a bad idea from the get-go. It seemed to work for a short timeout period. The idea was that the hostOS would still be running even with the device offline, so the script would have run ‘nmcli networking on’ after 1260 seconds (21 minutes). However, this did not happen. Is it normal for networking not to reset after a reboot or power cycle?

It is a “Raspberry Pi 5” and it is in Development OS mode.

Yes, I noticed that, which was the reason for trying to run this script to disable and re-enable networking. I had placed the script in ‘/mnt/data’ on the hostOS and executed it manually.

I had also previously connected it to a monitor, but nothing showed after the splash screen. I did not try using a keyboard/mouse though. Should I be able to get to the CLI simply by using a keyboard/mouse?

Thank you for the response! :grin:

Hello @balena101, I discussed this internally with other team members, and here are some comments:

  • NetworkManager persists the nmcli networking off setting on disk, so rebooting does not bring networking back. We therefore strongly recommend bringing down a single named connection instead and keeping another interface (e.g. Ethernet) enabled.
  • You might be able to change config.txt to enable serial console access. Then you might be able to log in via serial and run nmcli networking on.
  • Alternatively, if you have a Linux machine, you might be able to mount the state partition of the device on your laptop and change the configuration.
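For the last option, here is a minimal sketch of what "change the configuration" could look like, and it has not been tried on a real device in this thread. It assumes NetworkManager keeps the flag as a NetworkingEnabled=false line in a file named NetworkManager.state, and that this file lives somewhere on the mounted state partition. The helper name enable_networking_in_state is made up for illustration.

```shell
#!/bin/bash
# Sketch: flip NetworkingEnabled back to true in a NetworkManager state file
# on a mounted partition. Assumption: the flag is stored as the line
# "NetworkingEnabled=false" in NetworkManager.state.

enable_networking_in_state() {
    local state_file="$1"
    if [ ! -f "$state_file" ]; then
        echo "not found: $state_file" >&2
        return 1
    fi
    # Keep a backup, then rewrite the file from the backup with the flag flipped.
    cp "$state_file" "$state_file.bak" || return 1
    sed 's/^NetworkingEnabled=false$/NetworkingEnabled=true/' \
        "$state_file.bak" > "$state_file"
}
```

Rather than guessing the exact path, the file can be located on the mounted partition with find <mount_point> -name NetworkManager.state, and the result passed to the function. The original is kept next to it with a .bak suffix in case the edit needs to be undone.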

Let us know if that works!

Hello,

Yes that was a bad decision from the get-go… :sweat_smile:

I could not access the device myself as I am not in the office, but someone was having a look for me. In the end we decided to re-flash the SD card.

It would be interesting to see whether it would have been possible to fix this without online access to the device, so that is something I might look into out of curiosity. I will post an update when I get to try it out, without breaking the device again, that is!

Thank you for the help on this @mpous @alanb128 :grin:

Thanks! Keep us posted on your research!

And let us know as well how you test the device going into “Offline” status and bringing it back “Online”, so other members of the community can test or comment. Maybe this would make a good Show and Tell post.

I will do :grin:

Just a quick question while we’re here: if the containers continue to run even in ‘Offline’ mode, as @alanb128 confirmed, is there a reason why a script placed in ‘/mnt/data’ would stop running after the device switches to ‘Offline’ status? It worked fine with a short delay, but it must have stopped when the delay was around 20 minutes and the device went ‘Offline’. :sweat_smile:

It was a small bash script that ran:

nmcli networking off
sleep 1260
nmcli networking on

@balena101 Could you please share more details of your script, or create a dummy script so we can try to reproduce this?

Thanks

Of course @mpous , it was very simple, possibly why it didn’t work. :laughing:

The script is below:

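#!/bin/bash
# Disable all networking, wait, then re-enable it.
echo "Disabling network..."
nmcli networking off
sleep 1260  # 21 minutes, long enough for the device to show as "Offline"
echo "Re-enabling network..."
nmcli networking on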

I tested with a 20-second sleep and it worked perfectly, so I decided to give it a shot and set the sleep to 21 minutes (1260 seconds), to allow the status to change from ‘Heartbeat only’ to ‘Offline’ before the ‘nmcli networking on’ command ran.
I did not realise that Network Manager persists this setting even across a reboot; I had thought a reboot would be my fallback in case this didn’t go as planned… :sweat_smile:

I placed the script on the hostOS in ‘/mnt/data’ and executed it manually using 'bash <script_name>' over SSH from the balenaCloud dashboard.
