Reconfiguration of wifi adapter settings for static IP not sticking

Hi,
I’m working with a modified version of wifi-connect running on Intel NUC hardware. My customer wants it modified so that one of the networking adapters can be configured for wifi with static IP address settings. I know this is a less common configuration for wifi, but they need the feature.

As of now I am successfully reconfiguring other network adapters on the unit with the help of nmcli, via scripts written out at runtime. I’m able to set static IP addressing for both ethernet controllers on the unit, but when it comes time to set up static addressing for wifi from the container, the settings don’t appear to apply. Are there configuration settings beyond the use of nmcli that may be the cause of wifi settings not sticking?

Here is an example of the commands I’m issuing to switch the adapter over to static addressing:
#delete the connection
nmcli con del wlp0s21f0u4

#create a new connection
nmcli con add type wifi con-name wlp0s21f0u4 ifname wlp0s21f0u4 ssid 'MySSID' ip4 192.168.1.123/24 gw4 192.168.1.1
nmcli con mod wlp0s21f0u4 wifi-sec.key-mgmt wpa-psk
nmcli con mod wlp0s21f0u4 wifi-sec.psk 'MyPassword'
nmcli con mod wlp0s21f0u4 ipv4.route-metric 1
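
In case it helps anyone reproducing this, the state after a reboot can be checked with something like the following (just a sketch, using the same device name as above):

# which profile is active on the wifi device, and what address it ended up with
nmcli -f GENERAL.CONNECTION,IP4.ADDRESS dev show wlp0s21f0u4

# whether the static settings were persisted in the profile itself
nmcli -f ipv4.method,ipv4.addresses,ipv4.gateway con show wlp0s21f0u4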

Any help in this area is appreciated.

-Brant

Hi Brant,

If possible, would you mind detailing a bit more about the modifications you have made to wifi-connect? I would also recommend you read over the static networking docs if you have not already.

Thank you!

Hi again,
The changes to wifi-connect are based on the 4.06 release and were largely done early in 2018 when that release was new. They are primarily to change the layout of the web pages, to allow for storage parameters that my client needs for their application, and to allow for configuration of the network adapters.

The network configuration of this device is a bit complex. We have 4 total network adapters, 2 wifi and 2 ethernet. Depending on how the user configures the connection either one of the wifi or one of the ethernet adapters will be responsible for the balena network connection to the cloud.

The basic use case is:

  • Ethernet adapter A - is always configured static with the same settings
  • Ethernet adapter B - is a possible connection to the cloud for the container (if wifi is not chosen)
  • Wifi adapter A - is always running the wifi hot spot as part of wifi-connect
  • Wifi adapter B - is a possible connection to the cloud for the container (if ethernet is not chosen)

So largely we have two adapters we don’t have to care about right now, and the user is given the option to run their internet connection through either wifi or ethernet, but not both. We use the "ipv4.never-default true" option on whichever of the two candidate adapters is not in use, to make sure that NetworkManager doesn’t try to use that connection. For example, if Wifi adapter B is chosen as the internet path, ipv4.never-default is set on Ethernet adapter B so it shouldn’t be used after the unit is rebooted.
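
As a rough sketch of that part (the connection names eth-b and wifi-b here are only placeholders, not the real profile names on the unit):

# the user picked wifi B as the internet path, so block ethernet B
# from ever providing the default route
nmcli con mod eth-b ipv4.never-default true
nmcli con mod wifi-b ipv4.never-default false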

Since it has served me fairly well in the past, I’m using this guide for configuring the static wifi settings on the wifi device with nmcli as needed:

https://docs.fedoraproject.org/en-US/Fedora/20/html/Networking_Guide/sec-Connecting_to_a_Network_Using_nmcli.html

The core of the problem I face right now is getting the static wifi settings to actually stick after a reboot. It appears that the wifi connection is alive, but none of the settings remain. There does appear to be some fighting between the 'resin-wifi-01' connection and the connection I am attempting to create with the settings I chose. After rebooting the device I would expect /etc/wpa_supplicant.conf to be updated with the WPA settings I configured on the command line with the nmcli commands, but I see no trace of the configuration I set. This is probably a safety feature by design and part of how balena accepts configurations from /system-connections.
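
For reference, the only place I know to look for the persisted profile on the host is NetworkManager's keyfile directory; whether balenaOS uses exactly this path is an assumption on my part:

# list the persisted NetworkManager profiles on the host (standard keyfile location)
ls -l /etc/NetworkManager/system-connections/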

What I’m looking to do is essentially remap or remove the balena upstream connection safely, put in new configuration parameters for that connection live, and reboot the device with the new network configuration. I realize this is considered risky according to the networking guide, but we appear to be able to do it reliably for static ethernet connections, so I’m looking for the same functionality over wifi.

Thanks for the help,
Brant

How are you running the commands on the host OS, and how are you running them inside the container? These details would help with a reproduction on our side.

One possible solution that comes to mind is creating a systemd service which waits for NetworkManager, via After=NetworkManager-wait-online.service, before running the nmcli commands that disable the default config and then set up your custom config.
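
A minimal sketch of that approach, assuming your nmcli commands live in a script (the unit name and script path below are placeholders):

# placeholder unit name and script path; adjust to your setup
cat > /etc/systemd/system/static-wifi.service <<'EOF'
[Unit]
Description=Apply static wifi settings via nmcli
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service

[Service]
Type=oneshot
ExecStart=/usr/src/app/apply-static-wifi.sh

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable static-wifi.service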

Hi dt-rush,
To run the commands on the host OS, what we do is generate bash scripts at runtime using the user's parameters (stored in /data), and before we reboot the device we run the script. So the scripts are run from the shell via Rust's std::process::Command (Command::new()).

For some additional information about how the Dockerfile looks, here are the packages we are pulling in, as well as the NetworkManager.service mask:

—Snip—
FROM balenalib/amd64-ubuntu

ENV INITSYSTEM on

RUN apt-get update

RUN apt-get install -y dnsmasq wireless-tools vim file udhcpd network-manager \
    && systemctl mask NetworkManager.service \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
—Snip—

The rest is just some application-specific stuff to put files in the container and run it.

I’ll look into the systemd service idea today. One issue I assume I’d need to solve is bringing up the new wifi interface before I reboot; I’m not sure if NetworkManager will allow me to do that while the balena tunnel is running on another device, however. If you have any other suggestions or details around that approach I’m listening.
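
For the "bring it up before reboot" part, what I have in mind is roughly this (same connection name as in my earlier post):

# activate the newly created profile on the wifi device before rebooting
nmcli con up wlp0s21f0u4 ifname wlp0s21f0u4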

Thanks,
Brant

I do not think there is currently an easy way to use WiFi Connect in conjunction with setting up a static IP. The problem is that the connection will not be activated successfully if there is no running DHCP server, and thus WiFi Connect will fail to create the connection profile successfully (so that it could then be modified to include the static IP settings).

We have ongoing work to support all IPv4 and IPv6 settings NetworkManager supports in WiFi Connect, but this is going slowly due to other priorities.

The only current way I could think of is forking WiFi Connect and making corresponding changes in its network-manager Rust library to incorporate static IP settings. Those can be passed through environment variables to make the work easier.

Hi Zahari,
Thanks for your input here. You were the one who helped us out last time, I believe, when we were trying to get dynamic network configuration going with static ethernet (early last year).

To be clear the wifi connection we are trying to make static is not the one being used for wifi-connect’s hotspot. It is the connection balena uses to run its tunnel to the cloud.

I was hoping we would be able to either delete and create a new NM connection for the wifi device, or possibly modify the one already in place (resin-wifi-01), and have the settings stick on reboot. We already have a fork of wifi-connect to make changes, so that wouldn’t be an issue; if forking the network-manager crate is required to have it push settings to the system, that is certainly something I can try.
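
To illustrate the "modify in place" option I mean (just a sketch; whether balenaOS keeps these changes across a reboot is exactly the open question):

# switch the existing balena wifi profile over to a manual/static address
nmcli con mod resin-wifi-01 ipv4.method manual \
    ipv4.addresses 192.168.1.123/24 ipv4.gateway 192.168.1.1
nmcli con up resin-wifi-01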

If it would be helpful to you to see the changes we’ve made I can make those available to you as well.

Thanks again,
Brant

Hi Brant,

Ah, alright, in that case I think I have an idea what happens. The resin-wifi-01 connection is on the boot partition and is copied over on each reboot. This is why, if you modify it, it will be overwritten with the original settings on reboot. You may find more information on this here, after the example code: https://www.balena.io/docs/reference/OS/network/2.x/#changing-the-network-at-runtime

Thus I suggest that you do not include the resin-wifi-01 connection on the boot partition, but create one dynamically instead. Will that solve the issue you face, or is it a different one?
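
Roughly, the idea would be something like this (a sketch only; the connection name is arbitrary, the important part is that no profile with that name ships on the boot partition):

# create the wifi profile dynamically on the host instead of shipping it
# on the boot partition, so nothing overwrites it on reboot
nmcli con add type wifi con-name static-wifi ifname wlp0s21f0u4 \
    ssid 'MySSID' ip4 192.168.1.123/24 gw4 192.168.1.1
nmcli con mod static-wifi wifi-sec.key-mgmt wpa-psk wifi-sec.psk 'MyPassword'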

Thanks,
Zahari

Hi Zahari,
That’s a really nice idea. I think that could be a way forward here; we are already doing dynamic network connection creation via scripts and command-line shell-outs.

One question regarding tunnel creation: if we install this unit at a client site and they choose the option to use wifi to connect to the cloud, will the container be able to boot without that connection? It may be a catch-22 here, but I like the idea. We need the user to configure the outbound wifi connection credentials before we can set it up, and if it isn’t set up the first time there won’t be a tunnel to the cloud to pull down the container.

I may have to check with my client to see if defaulting to an ethernet-only configuration the first time is OK, so that if the customer wants to use a wifi connection they must use ethernet first and then use wifi-connect to set up that wifi tunnel to the cloud for permanent use.

Thank you for the idea,
Brant

Right, in that case they will need an Ethernet connection to download the container first. I know other users of ours first load the containers in their office and then send the device to the customer, ready to run. Or, if the customer flashes an image themselves, you may have an image preloaded with the application beforehand: https://www.balena.io/docs/reference/cli/#preload-image-. Hope that helps.

The preloaded image direction makes a lot of sense to me, and I think it could be the way forward. I gave it a shot on my Mac and a Linux PC I have around, and I’m running into either a bug or a configuration issue with my Docker setup.

I can certainly open a separate issue for this (and did on GH). Here is the issue I posted over there:

Hi @BrantR, and thank you for the bug report!

Just to clarify, you mentioned having issues on both macOS and Linux. Would you mind providing the Linux information in that issue as well, so that we have it captured there?

Thank you!

Sure xginn8, I can share the Linux issue here as well:

Building Docker preloader image. [========================] 100%



| Creating preloader container
\ Starting preloader container
| Reading image information
1: Step 1/7 : FROM docker:17.10.0-ce-dind
 ---> 9769e0f3f9cb
Step 2/7 : RUN apk update && apk add --no-cache python3 parted btrfs-progs docker util-linux sfdisk file coreutils sgdisk
 ---> Using cache
 ---> 8bdeb165a31e
Step 3/7 : COPY ./requirements.txt /tmp/
 ---> Using cache
 ---> c51ccb3e3a0e
Step 4/7 : RUN pip3 install -r /tmp/requirements.txt
 ---> Using cache
 ---> 83cd89631cc9
Step 5/7 : COPY ./src /usr/src/app
 ---> Using cache
 ---> 80ceb4cc385f
Step 6/7 : WORKDIR /usr/src/app
 ---> Using cache
 ---> 61b852637d5d
Step 7/7 : CMD ["python3", "/usr/src/app/preload.py"]
 ---> Using cache
 ---> 41ea4442f3ad
Successfully built 41ea4442f3ad
Successfully tagged balena/balena-preload:latest
Waiting for Docker to start...
Exception in thread background thread for pid 210:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/sh.py", line 1540, in wrap
    fn(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/sh.py", line 2459, in background_thread
    handle_exit_code(exit_code)
  File "/usr/lib/python3.6/site-packages/sh.py", line 2157, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/usr/lib/python3.6/site-packages/sh.py", line 815, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /usr/local/bin/dockerd --storage-driver=aufs --data-root=/tmp/tmp8xu16hgs/docker --host=tcp://0.0.0.0:32979

  STDOUT:


  STDERR:
time="2019-01-07T15:02:00.777998113Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2019-01-07T15:02:00.778426725Z" level=info msg="libcontainerd: new containerd process, pid: 232"
Error starting daemon: error initializing graphdriver: driver not supported


Traceback (most recent call last):
  File "/usr/src/app/preload.py", line 825, in <module>
    result = method(**data.get("parameters", {}))
  File "/usr/src/app/preload.py", line 785, in get_image_info
    images, supervisor_version = get_images_and_supervisor_version()
  File "/usr/src/app/preload.py", line 668, in get_images_and_supervisor_version
    return _get_images_and_supervisor_version(inner_image_path)
  File "/usr/src/app/preload.py", line 644, in _get_images_and_supervisor_version
    with docker_context_manager(driver, mountpoint):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/src/app/preload.py", line 511, in docker_context_manager
    running_dockerd = start_docker_daemon(storage_driver, docker_dir)
  File "/usr/src/app/preload.py", line 480, in start_docker_daemon
    running_dockerd.wait()
  File "/usr/lib/python3.6/site-packages/sh.py", line 792, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python3.6/site-packages/sh.py", line 815, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /usr/local/bin/dockerd --storage-driver=aufs --data-root=/tmp/tmp8xu16hgs/docker --host=tcp://0.0.0.0:32979

  STDOUT:


  STDERR:
time="2019-01-07T15:02:00.777998113Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2019-01-07T15:02:00.778426725Z" level=info msg="libcontainerd: new containerd process, pid: 232"
Error starting daemon: error initializing graphdriver: driver not supported

If you need help, don't hesitate in contacting us at:

  GitHub: https://github.com/balena-io/balena-cli/issues/new
  Forums: https://forums.balena.io

It would seem the aufs Docker storage driver is not available for my version of Docker. I’m looking into it now. My Docker version is:

Docker version 18.09.0, build 4d60db4

And I’m running Ubuntu 18.04 x64 here. docker info shows this:

Containers: 106
 Running: 0
 Paused: 0
 Stopped: 106
Images: 726
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.18-041518-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 31.28GiB
Name: Unicorn
ID: WI2Q:7EOB:AT26:237A:TZBB:PJQA:3LYO:FG6T:AANQ:A6ZE:4OFD:VSO4
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Any help you can provide would be appreciated.

Thanks.

It would seem that aufs is actually deprecated in Docker on 4.x-series Linux kernels (Ubuntu 18.04 being one of them).

I pulled balena-cli from GitHub and am going through the balena-cli/node_modules/balena-preload/src/preload.py script and the balena-cli/node_modules/balena-preload/Dockerfile, which appears to be the correct Dockerfile in this case.

It appears there is code in preload.py to use overlay2 for the filesystem image. Do you have better instructions on how to turn balena-cli from GitHub into a functional 'balena' command-line build like the prebuilt releases that are up there? If so, I may be able to modify the Python script to try and use the overlay2 storage driver instead of the deprecated aufs driver.

Thanks

Continuing down this path I pulled balena-cli from Github and built the docker container in balena-cli/node_modules/balena-preload via:

docker build -t mine .

Before I did this I copied my balena.img (Intel NUC build, 2.29.0 rev1) into the container with a simple:

COPY ./balena.img /img/

inside the Dockerfile in the path mentioned above. This appears to get my img into the container, maybe not in the best way, but I can now reproduce the losetup -f issue by jumping into the mine container with:

docker run -it mine /bin/sh
python3 ./preload.py

The problem appears to be that losetup has no loopback devices available at all; ls /dev/loop* returns no devices. If I had to guess, there is something up with the docker:17.10.0-ce-dind image, or perhaps there is a package missing in the RUN list that would provide loopback devices inside the container.

It would seem I need to run with docker run --privileged -it mine /bin/sh in order to get loopback devices…

Yes you are right, privileged is the way to go. Found a thread discussing other options…

Hi samothx,
After I was able to run the container with --privileged, the preload.py script did run. The problem is I’m not quite sure yet what arguments need to be passed in (presumably from the balena-cli JavaScript code) to provide preload.py with the name of my application so that it can get embedded in the image file.

Do you know how to make this work so that I can embed my application into an IMG?

Thanks,
Brant

@BrantR, if this helps: when you were looking at balena-cli/node_modules/balena-preload, that’s actually a separate GitHub repo and npm package that balena-cli imports.

Standalone usage of balena-preload as an npm module is deprecated in favour of balena-cli, but given the route you’ve gone down, you could actually try using it. Or at least, it should make debugging easier.

I am also having a look at the issue of aufs vs overlay2.

Regards,
Paulo
