Robustly changing network settings

Hello!

I am evaluating Balena for use in an IoT project where physical access is very expensive. It seems that the only built-in way to change network settings in BalenaOS is to change a file on the boot partition, and if someone makes a mistake the device is essentially bricked… Not ideal!

I would like to have a way to change network settings with rollback. I am aware that it’s possible to do this from within a container, and thus we could implement the required functionality ourselves. This might be a viable option, but implementing the functionality to the required robustness standard will take time, and if something goes wrong there’s no way to recover.

Another idea would be to build a custom version of BalenaOS which sets the network settings at start-up and checks that they are correct, so that if the new settings don’t work we can rely on Balena to roll back to the old version. Would this be possible?

I’d love to hear from anyone who’s done anything similar, or just has more experience with the platform and knows what’s possible and not. Thanks!

Hi there,
BalenaOS uses NetworkManager, accompanied by ModemManager, to deliver a stable and reliable connection to the internet, be it via ethernet, WiFi or cellular modem. Additionally, to make headless configuration of the device’s network easy, there is a system-connections folder in the boot partition, which is copied into /etc/NetworkManager/system-connections. So any valid NetworkManager connection file can just be dropped into the boot partition before device commissioning.
I quote: “One of the lesser-known goodies provided by NetworkManager is the checkpoint/restore functionality. It allows the user to roll back to a working network configuration if any changes render a machine inaccessible over a network. The user needs to define a checkpoint first, then conduct the potentially dangerous changes and finally confirm that the changes didn’t disrupt connectivity. A checkpoint is essentially a snapshot of an active network configuration along with a timer. Should the changes cause a networking outage, the timer expires before the user can confirm success and the changes are reverted, hopefully restoring connectivity.”
You can find a relevant Python example here: https://github.com/NetworkManager/NetworkManager/blob/master/examples/python/gi/checkpoint.py
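
For illustration, here is a minimal sketch of the create / change / confirm flow described in the quote. It assumes the dbus-python package and access to the host's system bus from the container; the class and function names (`DBusCheckpoints`, `change_with_rollback`), the `apply_change` and `is_online` callbacks and the timeout values are made up for this example. Only the `CheckpointCreate`, `CheckpointDestroy` and `CheckpointRollback` method names come from NetworkManager's D-Bus API.

```python
import time
from typing import Callable

NM_BUS = "org.freedesktop.NetworkManager"
NM_PATH = "/org/freedesktop/NetworkManager"


class DBusCheckpoints:
    """Thin wrapper over NetworkManager's checkpoint methods (needs dbus-python)."""

    def __init__(self) -> None:
        import dbus  # imported lazily so the rest of the sketch runs without it

        self._dbus = dbus
        proxy = dbus.SystemBus().get_object(NM_BUS, NM_PATH)
        self._nm = dbus.Interface(proxy, NM_BUS)

    def create(self, timeout_s: int) -> str:
        # An empty device list means "all devices"; flags=0 keeps the defaults.
        devices = self._dbus.Array([], signature="o")
        path = self._nm.CheckpointCreate(
            devices, self._dbus.UInt32(timeout_s), self._dbus.UInt32(0)
        )
        return str(path)

    def destroy(self, checkpoint: str) -> None:
        self._nm.CheckpointDestroy(self._dbus.ObjectPath(checkpoint))

    def rollback(self, checkpoint: str) -> None:
        self._nm.CheckpointRollback(self._dbus.ObjectPath(checkpoint))


def change_with_rollback(
    checkpoints,
    apply_change: Callable[[], None],
    is_online: Callable[[], bool],
    timeout_s: int = 60,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Apply a change under a checkpoint; keep it only if connectivity is confirmed."""
    checkpoint = checkpoints.create(timeout_s)
    try:
        apply_change()
        # Poll for only half the timeout so we decide before NetworkManager's timer fires.
        attempts = max(1, int((timeout_s / 2) // poll_s))
        for _ in range(attempts):
            if is_online():
                checkpoints.destroy(checkpoint)  # confirm: keep the new settings
                return True
            sleep(poll_s)
    except Exception:
        checkpoints.rollback(checkpoint)
        raise
    checkpoints.rollback(checkpoint)  # never came back online: revert explicitly
    return False
```

A call would look something like `change_with_rollback(DBusCheckpoints(), apply_change=my_change, is_online=my_probe)`, where `my_change` and `my_probe` are your own functions. If the container loses its connection to the bus entirely, NetworkManager's own timer should still perform the rollback when the timeout expires.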

Please let me know if you need any further help while evaluating the platform and implementing it. We can keep our conversation here and moreover one of our team members can reach out to discuss how our platform meets your needs.

Lesser known, indeed! That’s a wonderful feature!

Presumably the typical use case would be to talk to NetworkManager from within a container. Will any changes to the NetworkManager configuration be persisted across reboots?

What about checkpoints? Are they stored on disk? What would happen if I made a checkpoint, applied a faulty configuration and then suffered a power loss before the watchdog could trigger?

Thanks for your help!

Hey! Adding and changing connections should persist, as well as checkpoints, but we haven’t ever tested the checkpoints.
There is one catch: if you modify the default resin-wifi-01 connection (the one that is found in /resin-boot/system-connections), those changes will be reset when you reboot, because the original connection files are copied on every boot from /resin-boot/system-connections --> /mnt/boot/system-connections/.
So when changing stuff from the container you should add a connection rather than editing the default one.
Please let me know how it works for you once you implement it.

Hello again,

I have now had a bit of time to play with the checkpoints, and they do seem to cover most of our needs. However, based on my experiments, checkpoints do not survive power loss. Unfortunately, this can leave the device in a permanently offline state.

Do you think this is a problem with NetworkManager itself or with BalenaOS? Is there anything we can do to mitigate the risk?

Hi, what method did you choose to modify the NetworkManager settings? Did you drop a custom connection file into the boot partition directory, or change it from a container?

Hi, I communicated with NetworkManager using DBus from a container.


In that case you would have to make sure to run that operation on every boot, since the changes won’t be persisted. Have a look at the documentation here, which explains the methodology for using the system-connections directory: https://www.balena.io/docs/reference/OS/network/2.x/#introduction

You would only have to do that once per device, and the system then restores the NetworkManager settings, including your changes, from there.

I don’t quite understand - the network settings are persisted when using NetworkManager, it’s just the checkpoints that are not.

I’m not sure what operation I should have to do every boot, since it is the moment of changing settings that’s critical and can lead to an offline device.

Hi @vegard.lillevoll, from the sounds of it, it might be that the checkpoint files are not written to a persistent part of the OS. So, two questions to help us narrow it down:

  1. When you are modifying the connections, are you changing the default connection (i.e. the same one you had in the /resin-boot/system-connections folder originally), or are you first using the NM dbus command to create a completely new connection? The reason I ask is that if you are doing the former, any changes to the connection will be overwritten by the default file each time you reboot.
  2. After applying a checkpoint to the system, can you log into the HostOS and check which location the checkpoint files are written to? I would expect them to be written somewhere in /etc/NetworkManager. In addition, it would be good to tail the NetworkManager logs at the time you create a checkpoint, using journalctl -u NetworkManager -f, so we can see whether it tries to write a file somewhere and fails because the filesystem is read-only.

I’ve altered network settings via DBUS and have manually edited config files via the pre-installed vi on HostOS (/etc/NetworkManager/system-connections/*.nmconnection), and have had my connections persist through new application pushes + full system reboots.

@dedline yes, that is correct: adding files via dbus or vi into /etc/NetworkManager/system-connections will always persist, with just one caveat. If you edit/change the default resin-wifi-01 in /etc/NetworkManager/system-connections/, then those changes will not persist across reboots, since on every boot the resin-wifi-01 file from /resin-boot/system-connections/ is copied over into /etc/NetworkManager/system-connections and wipes out the changes there.

Ahhhhhh I see! I’ve seen that sleepy little file before in /system-connections. Now I know the power he holds :open_mouth:

Yeah, it’s something I have wanted to change for a little while, but unfortunately it would be a breaking change and would have to wait until we do balenaOS 3.0, since many people rely on that and have built tooling expecting that file and its behaviours (as confusing as they are sometimes :stuck_out_tongue: )

Hi,

I am modifying an existing connection, but it’s for a wired interface and as far as I can see the default configuration is only for wifi. NM does create a new file for the connection in /etc/NetworkManager/system-connections when I modify it.

Creating a checkpoint does not seem to create any new files. I made a list of all the files in the file system before and after creating a checkpoint, and there was no difference. I also searched all the files in /etc for the word “checkpoint” and nothing turned up.

Could it really be that checkpoints are only stored in memory? That seems like such a huge oversight, but I suppose it might be more acceptable in a server environment with UPSes than for IoT applications.

Hi, I looked into how checkpoints are managed in NetworkManager and could not find anything related to persistently storing them in the source code. It looks like indeed they are stored in the memory only.

An alternative that comes to mind would be to use multiple connection profiles (which are stored persistently) and mark the ones that are not currently in use with autoconnect = false.
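
As a rough sketch of what such a pair of profiles could look like, the snippet below renders two NetworkManager keyfiles: a static primary profile and a DHCP fallback with autoconnect disabled. The helper name `keyfile_profile`, the profile names, the interface name and the addresses are all placeholders chosen for this example, not values from this thread.

```python
import configparser
import io
import uuid
from typing import Optional


def keyfile_profile(
    name: str,
    interface: str,
    autoconnect: bool,
    address: Optional[str] = None,
    gateway: Optional[str] = None,
) -> str:
    """Render a wired NetworkManager keyfile (DHCP unless an address is given)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key names exactly as written
    cfg["connection"] = {
        "id": name,
        "uuid": str(uuid.uuid4()),
        "type": "ethernet",
        "interface-name": interface,
        "autoconnect": "true" if autoconnect else "false",
    }
    if address:
        # Keyfile syntax for a static address: address1=<ip>/<prefix>[,<gateway>]
        value = f"{address},{gateway}" if gateway else address
        cfg["ipv4"] = {"method": "manual", "address1": value}
    else:
        cfg["ipv4"] = {"method": "auto"}
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


# Placeholder values: a static primary profile and a DHCP fallback kept inactive.
primary = keyfile_profile("wired-static", "eth0", True, "192.168.1.10/24", "192.168.1.1")
fallback = keyfile_profile("wired-dhcp-fallback", "eth0", False)
```

Keep in mind that NetworkManager only loads keyfiles that are owned by root and not readable by other users, and that a profile with autoconnect=false is never brought up on its own, so something (for example a watchdog in the container) would still have to activate the fallback.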

Ok, that’s a good option to have.

Thanks to everyone who has contributed to this discussion. I think a combination of checkpoints and stored connections would serve most of our needs. I hope NetworkManager checkpoints will be made persistent at some point in the future, but they can still prevent costly misconfiguration mistakes in many cases.

Chiming in as this is a topic of interest…

Isn’t the whole point of the checkpoint to not be persistent? State change would look something like this:

  1. Known Good State
  2. Make a checkpoint and a change
  3. If power is lost, the change disappears and you are back in the Known Good State
  4. If the change works, you confirm and that is written to persistent storage

A.

Yes, that would be ideal. The problem is that’s not what happens at step 3, instead the checkpoint disappears but the change persists, leaving you in the changed state with no way out.
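
To make the failure mode concrete, here is a toy model of it. This is not NetworkManager code: the `Device` class is invented for this example and only encodes what was observed earlier in the thread, namely that the changed profile is written to disk while the checkpoint lives in memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Toy model: profile changes hit the disk, the checkpoint lives only in RAM."""

    disk_config: str
    checkpoint: Optional[str] = None  # in-memory snapshot of the old config

    def create_checkpoint(self) -> None:
        self.checkpoint = self.disk_config

    def apply_change(self, new_config: str) -> None:
        self.disk_config = new_config  # persisted immediately

    def rollback(self) -> bool:
        if self.checkpoint is None:
            return False  # nothing to roll back to
        self.disk_config = self.checkpoint
        self.checkpoint = None
        return True

    def power_cycle(self) -> None:
        self.checkpoint = None  # RAM is lost, the disk contents survive


device = Device(disk_config="known-good")
device.create_checkpoint()
device.apply_change("broken")
device.power_cycle()             # power lost before the rollback timer fires
rolled_back = device.rollback()  # False: the checkpoint is gone
final_config = device.disk_config  # "broken": the change survived, the safety net did not
```

Without the power cycle, the same sequence ends with `rollback()` restoring "known-good", which is the behaviour one would intuitively expect in step 3.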

Darn, that is completely counterintuitive.

Is it possible to make an ephemeral change using DBUS?
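
One avenue that may be worth testing: NetworkManager's D-Bus API has `UpdateUnsaved` and `Save` methods on the `org.freedesktop.NetworkManager.Settings.Connection` interface, which are meant to change a profile in memory first and write it to disk later. The sketch below shows how the two could be combined; I have not verified this on balenaOS. `try_ephemeral_change`, `reapply` and `is_online` are hypothetical names, and `connection` is assumed to be a D-Bus proxy for the profile (or any object with the same two methods).

```python
from typing import Any, Callable, Dict

SettingsDict = Dict[str, Dict[str, Any]]


def try_ephemeral_change(
    connection: Any,
    new_settings: SettingsDict,
    reapply: Callable[[], None],
    is_online: Callable[[], bool],
) -> bool:
    """Change a profile in memory first; write it to disk only once it is confirmed."""
    # D-Bus: Settings.Connection.UpdateUnsaved, intended not to touch the on-disk profile
    connection.UpdateUnsaved(new_settings)
    # Hypothetical callback: re-activate the profile so the device uses the new settings
    reapply()
    if is_online():
        # D-Bus: Settings.Connection.Save, persists the in-memory profile
        connection.Save()
        return True
    # Nothing was saved, so the old on-disk profile should come back after a restart
    return False
```

If this behaves as the API description suggests, a power loss between `UpdateUnsaved` and `Save` would drop the change rather than the safety net, which is the ordering described in the list above. Pairing it with a checkpoint would still be useful for the case where the device stays powered but goes offline.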