Containerising OpenThread Border Router (OTBR) for OpenThread [WIP]

Hi everybody!

I’ve been working to containerise OpenThread Border Router (OTBR) for a project we are working on here at Dynamic Devices.

A little background.

We are:

  • interested in OpenThread over 802.15.4
  • using the Nordic nrf52 series of devices at the moment
  • also looking into other devices, including silicon from NXP, but we haven’t got those going yet

We’ve found that the best way to get started is with the Nordic nrf52840-DK as this has a Segger debugger onboard so you can program it using the Nordic tools.

We are also using the nrf52840 Dongle as this is cheaper than the DK board (£10-£12 versus £40-£50) and we need a number of them to test out mesh performance.

There is a range of SDKs available from Nordic, and we believe the current and correct one to use is the nRF Connect SDK, which plugs into VSCode.

Resources

You can find more details here:

nRF Connect SDK OpenThread Overview

OpenThread Border Router

CLI and RCP

There are some standard OpenThread examples which should (I believe) be supported whichever target silicon you are using.

There is a CLI example, which is a command-line interface shell you can drive via serial comms to exercise the mesh network. You can use this with the DK “out of the box”, or you can change some compile flags to enable USB CDC serial and use it on one of the dongles.
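
For illustration, building the CLI sample for the dongle with USB CDC serial enabled looks roughly like this (a sketch only - the sample path is a placeholder and the exact Kconfig options depend on your SDK version, so check the sample’s own USB overlay):

    # build for the dongle with the USB device stack and a CDC ACM port enabled
    west build -b nrf52840dongle_nrf52840 <path-to-cli-sample> -- \
        -DCONFIG_USB_DEVICE_STACK=y \
        -DCONFIG_USB_CDC_ACM=y \
        -DCONFIG_UART_LINE_CTRL=y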

The CLI is used by some Nordic tools, including the Topology Viewer.

NOTE: There is, of course, a gotcha. The version of the CLI used by the Topology Monitor is not the same as the version you can build from the Nordic examples. For the Topology Monitor you need to use the hex file that is downloaded within its installation tree. The files are on a relative path nRF_TTM-linux-x64/hex and you’ll need the right one for your specific silicon. The difference seems to be that the Topology Monitor CLI takes the commands directly at a > shell, whereas the CLI example you build yourself also has other Zephyr RTOS commands, so you enter the OpenThread CLI commands with an “ot” prefix at a uart:~$ shell.
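
To make the difference concrete, the two shells look roughly like this (output illustrative):

    # Topology Monitor hex file: bare OpenThread CLI, commands entered directly
    > state
    leader
    Done

    # Zephyr-based CLI example: OpenThread commands take the "ot" prefix
    uart:~$ ot state
    leader
    Done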

There are also other tools available in the nRF Connect for Desktop application suite, including the Programmer application, which you will need to program the dongle.

There’s also an 802.15.4 sniffer available, nRF-Sniffer-for-802.15.4, and similarly you probably want to use the hex file that is downloaded with this, as appropriate for your silicon. You can then connect this up to Wireshark as documented here.

There is also an RCP, which is a Radio Co-Processor example. This implements a different API (the Spinel protocol over serial, as I understand it) to allow host uC applications to talk to it to get access to the Thread stack and the underlying 802.15.4 radio network.

NOTE: You need the RCP example, rather than the CLI, running on a dongle for the OpenThread Border Router to work, although there are some notes in the docs about simulating an RCP if you don’t have hardware.
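
For context, it is the otbr-agent process inside the Border Router that opens the RCP’s serial port; a typical invocation looks something like this (a sketch based on the OpenThread docs, not necessarily the exact command line our container runs):

    # wpan0 is the Thread interface, wlan0 the infrastructure/backbone interface,
    # and the radio URL points Spinel at the RCP dongle's serial device
    sudo otbr-agent -I wpan0 -B wlan0 "spinel+hdlc+uart:///dev/ttyACM0" -v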

State of Play Today

This is all a bit messy as I am figuring things out as I go along, but I have a base block which is based on the standard OpenThread docker image and extends it a little for some bits we need.

This is used by an application I have created, which is here.

Couple of things

  • I am not convinced that all docker-compose.yml settings are propagated from the block through to the application configuration. This needs further work.

  • The OpenThread code starts up an mDNS service which may well conflict with the Balena mDNS service. I don’t believe we need this for 802.15.4 work, as I think mDNS is for WiFi only, but this also needs further investigation.

  • The OpenThread Border Router needs to talk to the RCP device, which enumerates with my Nordic part as /dev/ttyACM0. If you need something else, like /dev/ttyUSB0, then you need to change the relevant application environment variable in the docker-compose.yml (see the sketch after this list).

  • The OpenThread Border Router should be routing across wlan0 in this configuration. Again, if this needs to change to e.g. eth0, you can change this in the environment section of the application docker-compose.yml.

  • The container is privileged for now, and this needs restricting with specific capabilities (cap_add) or some such.
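
To illustrate the points above, the relevant parts of the application docker-compose.yml look roughly like this (a sketch - the environment variable names here are illustrative rather than necessarily the ones the block reads, so check the repo):

    services:
      openthread_border_router:
        privileged: true                      # to be locked down with cap_add later
        devices:
          - "/dev/ttyACM0:/dev/ttyACM0"       # the RCP dongle; change if yours enumerates as e.g. /dev/ttyUSB0
        environment:
          OTBR_RADIO_DEVICE: "/dev/ttyACM0"   # illustrative variable name
          OTBR_INFRA_IF: "wlan0"              # illustrative variable name; e.g. eth0 for wired routing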

With all of the above I can run up the service, and I get the OpenThread Border Router web server running on port 80, which I can access through the Balena public URL.

I can FORM a mesh network, which means it’s talking correctly to the RCP dongle (which is confirmed in the logging).

I can then run up a separate CLI dongle and JOIN the network I have formed.

I can then try to PING an IPv4 address, e.g. 8.8.8.8, but this isn’t working yet. The packets are dropped. I have separately tested both a Docker setup on a laptop and a base install of OTBR on a Raspberry Pi, and I know that when it is correctly set up I do get responses to pings.

So I think my IPv6/IPv4 routing is setup incorrectly.

When I add in network stanzas which I found online, the Balena service keeps resetting, so I am trying to work out what’s going on here:

services:
  openthread_border_router:
    ....
    networks:
      ipv6net:
        ipv6_address: 2001:3984:3989::20

networks:
  ipv6net:
    driver: bridge
    enable_ipv6: true
    ipam:
      driver: default
      config:
        - subnet: 2001:3984:3989::/64
          gateway: 2001:3984:3989::1

Next Steps?

All help very much appreciated !!!

Cheers, Alex

Can you advise or sign-post @mpous ?

Funny stuff is going on in the host journalctl log, with constant creation and removal of the network.

The story continues…

OK, so thanks to @majorz’s help over on this related question, I think we’ve established the problem: I can’t set up a network without giving it an IPv4 address, or the daemon gets unhappy.

ref: Problems using "network" section in docker-compose.yml - #11 by ajlennon
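
For the record, the networks section ended up looking roughly like this once an IPv4 subnet sat alongside the IPv6 one (a sketch reconstructed from the container addresses shown further down, not a verbatim copy of the file):

    networks:
      ipv6net:
        driver: bridge
        enable_ipv6: true
        ipam:
          driver: default
          config:
            - subnet: 172.28.0.0/14
              gateway: 172.28.0.1
            - subnet: 2001:3984:3989::/64
              gateway: 2001:3984:3989::1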

So I am nearly there now, I think:

  • I have an OTBR block that runs up
  • I have an MQTT-SNGateway block that runs up

Now the OTBR runs up, talks to a connected USB dongle running OpenThread RCP firmware (the radio support) and establishes a wpan0 interface for the mesh comms.

The MQTT-SNGateway can be built for a range of sensor network options such as udp, udp6 and so forth. I aim to run the udp6 build, which I’ve successfully talked to from nodes on the mesh network with a vanilla non-containerised RPi installation.

Currently, when the mqtt-sngateway container runs up, it errors as the udp6 build can’t open a socket for some reason.

What I need to understand is how to set up a custom network between the containers, so that mesh packets arriving on the wpan0 interface can be forwarded to whatever the MQTT-SNGateway (udp6) application is listening on in the mqtt-sngateway container.

I’ve always been a bit unclear on how the non-host mode networking actually works so this is a good opportunity to find out.

If anybody can advise or sign-post to useful weblinks I would be most grateful!

[Edit: I suppose I could put the MQTT-SN gateway in the same container as the OTBR, and I might have to, but I would rather keep them separate if I can manage it as I can see use cases for both containers that don’t involve the other]

Cheers, Alex

So let’s say 172.28.0.0/14 is the subnet of the non-host (bridge) mode network of container(s) where 172.28.0.1 is the gateway. We can think of that bridge as a virtual switch to which container(s) AND the host OS are connected. The host OS will be the gateway, so actually if you have something listening on 172.28.0.1 or 0.0.0.0 inside the host OS, you can reach it from the containers through that gateway address.

Now, when some other interface has to be reached, it will be on a different subnet, so Docker sets up something similar to Internet connection sharing (using iptables) so that the containers may reach the internet and those other interfaces. In that sense the bridge in Docker is more like a virtual router than a switch. That means packets sent to addresses that belong to the network of the wpan0 interface should be able to reach external nodes automatically, as Docker sets up all the necessary forwarding - this all assumes that in our case balenaEngine (Docker) is not missing some extra IPv6 option we may need.
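
If it helps to see it, a few standard commands inside the host OS show what the engine has actually set up for that bridge (nothing balena-specific here):

    ip addr show                        # the bridge should carry the gateway address, e.g. 172.28.0.1
    ip -6 route                         # routes towards the container subnets
    iptables -t nat -nL POSTROUTING     # the MASQUERADE rules doing the "connection sharing"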

Now it is interesting why your mqtt-sngateway container cannot open that socket. Does it work when that container is in host networking mode? Does it work with udp4 (if the container is able to work in IPv4 mode)? Do you have some extra information, like error messages, from the failure to open the socket?
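
For the host networking comparison test, the change would just be something like this on the gateway service in docker-compose.yml - a sketch:

    services:
      mqtt-sngateway:
        network_mode: host    # bypasses the bridge entirely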

Thanks,
Zahari


Thanks for your response @majorz

I think the socket opening was a red herring. I’ve got that sorted out now.

I have a colleague working through this at the moment. We’ll prepare a more comprehensive post for you on where we are at the moment. Currently I have MQTT-SNGateway and the OTBR in the same container and am still having problems connecting to IPv6 addresses.

We’ll try to figure out what we’re doing wrong inside the container, and hopefully this will inform the direction for separating things out into different containers…

Hi Alex, I’ve got an idea that may turn out to have some substance. Do you by any chance have a running device that experiences the problem with IPv6 connectivity, on which I could try to apply some small changes inside the host OS and see whether this solves the issue?


Yeah totally :slight_smile:

I have been down ALL SORTS of rabbit holes here. I think there are problems with the PAHO upstream MQTT-SNGateway running IPv6 and doing multicast advertisements.

I’ve made some changes and it now seems to work…

I’m pretty sure this relates to the container issues, as if it didn’t work in the same container as the OTBR, it surely wasn’t going to work in a separate one.

I’m just wrapping up some things but let me know when you might want to have a look at the board here?

Cheers!

So I have been thinking about how we can approach this. Whenever you can, could you please paste here the full UUID of a test device with support access enabled? Only our engineers will be able to access it through the VPN. I will carefully check the IPv6 configuration and see whether balenaEngine itself needs an adjustment.

I tried to think of some alternative approaches, but the one with a live test device with support access shared will be most useful.

Hi @majorz so I have the MQTT-SN gateway working with the OTBR when they are in the same container.

I’ve split the MQTT-SN gateway out now so that it is running in a separate container from the OTBR, and somehow I need to get IPv6 unicast and multicast messages to traverse between the containers.

I’ve given support access and will PM the device ID

From the mqtt container, running curl over IPv6 TCP works:

curl -6 http://openthread_border_router:80
curl -6 http://[2001:3984:3989::21]:80

Maybe next you need to add some expose statements - Compose file version 2 reference | Docker Documentation

Something like

    expose:
      - "12345/udp"

You can also use ports in case you want those exposed to the host OS / the outside world.
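
The difference between the two looks like this (a sketch, reusing the placeholder port):

    expose:
      - "12345/udp"          # reachable from other containers on the same network
    ports:
      - "12345:12345/udp"    # also published on the host OS / outside world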

Please note that there seems to be a running service “MQTT-SNGateway.” in the openthread_border_router container that probably should not be there?

root@2bc7751c8e18:/app# ss -ua6pn
State   Recv-Q  Send-Q   Local Address:Port    Peer Address:Port Process                                   
...   
UNCONN  0       0                    *:10000              *:*     users:(("MQTT-SNGateway.",pid=244,fd=7)) 
UNCONN  0       0                    *:10000              *:*     users:(("MQTT-SNGateway.",pid=244,fd=6)) 

For multicast you may need to do some extra experimentation. I asked ChatGPT for some hints on that, and it said macvlan should be used instead of the bridge driver, as the bridge driver does not support multicasting. I am not sure our supervisor supports macvlan networks in docker-compose.yml, as that is only available in Docker Compose version 3. So multicast may or may not work at all, actually.

Hiya!

Ooooh thanks - yes, things are a bit messy I am afraid, as I did a load of work to get things going “in container” and then just quickly took out the MQTT-SNGateway, so maybe I didn’t fully take it out.

Let me make those udp changes now and make sure it’s not running in the otbr container

Just looking at this now. One question I had that maybe you can help with?

I’ve been putting github repos together for the mqtt-sngateway and otbr as above.

Each has a “deploy with balena” button on it. But I seem to only be able to deploy one “block” in this way?

Is there a way to deploy a block onto a fleet to add a container service when there’s already a container service?

This would seem to be the point of blocks - adding multiple ones - but I can’t see how to do it without hand editing the docker-compose and using balena-cli to push up a separate repo combining the blocks?

OK, I have taken the g/w out of the otbr and done an expose on 10000/udp on the mqttsn g/w.

It’s all a bit messy, I am afraid, as I have to head off to watch Barbie now (!), but I will continue later.

I do not personally have experience with multiple blocks installed directly through balenaHub. Maybe you can try with Apps there? If you find any difficulties regarding this, please open a new forums thread, as this one revolves around the networking topics.

But if you encounter more networking-related issues, please post them here so that I can pick them up.

OK, so I’ve got things where I need them to be with the MQTT-SN Gateway in the same container as the OTBR. That all works OK.

Here’s where I am at when I have the two different containers (mqtt-sngateway and openthread_border_router):

I have a client node which connects to the mesh that is run from the OTBR container.

The OTBR container has a mesh tunnel interface wpan0 with these addresses (which change on reboot):

wpan0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1280
        inet6 fdde:ad00:beef:0:d1db:c36:12c4:6cc0  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::7820:730a:7c3:eee4  prefixlen 64  scopeid 0x20<link>
        inet6 fdde:ad00:beef::ff:fe00:fc10  prefixlen 64  scopeid 0x0<global>
        inet6 fdde:ad00:beef::ff:fe00:7c01  prefixlen 64  scopeid 0x0<global>
        inet6 fd11:22::2c7f:8d29:45bc:ef9  prefixlen 64  scopeid 0x0<global>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
        RX packets 2  bytes 102 (102.0 B)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 8  bytes 1588 (1.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I can ping fe80::7820:730a:7c3:eee4 from the client node

Then there’s the eth0 interface in the OTBR container with the extra custom network settings we set up

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.28.0.3  netmask 255.252.0.0  broadcast 172.31.255.255
        inet6 2001:3984:3989::21  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::42:acff:fe1c:3  prefixlen 64  scopeid 0x20<link>
        ether 02:42:ac:1c:00:03  txqueuelen 0  (Ethernet)
        RX packets 19  bytes 3198 (3.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 166  bytes 38756 (37.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  • I can ping 2001:3984:3989::21 from the client node
  • I can’t ping fe80::42:acff:fe1c:3 from the client node

The MQTT-SN gateway is running in the mqttsn-gateway container

This has the following ethernet interface configuration

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.28.0.2  netmask 255.252.0.0  broadcast 172.31.255.255
        inet6 fe80::42:acff:fe1c:2  prefixlen 64  scopeid 0x20<link>
        inet6 2001:3984:3989::20  prefixlen 64  scopeid 0x0<global>
        ether 02:42:ac:1c:00:02  txqueuelen 0  (Ethernet)
        RX packets 240  bytes 55500 (54.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30  bytes 2533 (2.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  • I can’t ping either of those IPv6 addresses on the eth0 interface from my client node

So I’m rubbing my head trying to work out what I need to set up. Hummm!

[Edit - I can also ping out to the internet with an IPv4 address for Google, which is converted to a synthesized IPv6 address:]

uart:~$ ot ping 142.250.178.3
Pinging synthesized IPv6 address: fdc0:c2ac:597d:2:0:0:8efa:b203
16 bytes from fdc0:c2ac:597d:2:0:0:8efa:b203: icmp_seq=10 hlim=113 time=34ms
1 packets transmitted, 1 packets received. Packet loss = 0.0%. Round-trip min/avg/max = 34/34.0/34 ms.
Done

I can also ping both of the IPv4 addresses in the two containers!

uart:~$ ot ping 172.28.0.2
Pinging synthesized IPv6 address: fdc0:c2ac:597d:2:0:0:ac1c:2
16 bytes from fdc0:c2ac:597d:2:0:0:ac1c:2: icmp_seq=13 hlim=63 time=14ms
1 packets transmitted, 1 packets received. Packet loss = 0.0%. Round-trip min/avg/max = 14/14.0/14 ms.
Done
uart:~$ ot ping 172.28.0.3
Pinging synthesized IPv6 address: fdc0:c2ac:597d:2:0:0:ac1c:3
16 bytes from fdc0:c2ac:597d:2:0:0:ac1c:3: icmp_seq=14 hlim=64 time=15ms
1 packets transmitted, 1 packets received. Packet loss = 0.0%. Round-trip min/avg/max = 15/15.0/15 ms.
Done

Reading around a bit there are some older posts that seem to indicate Docker might not support inter-container multicast :sob:

I think those fe80 link-local addresses won’t be available from the outside, indeed. It is odd that you can reach 2001:3984:3989::21 on the OTBR container, but not 2001:3984:3989::20 on the mqttsn-gateway container. Could it be some firewall rule being applied from inside the container?

Yes, I can usually reach the container fe80 from the mesh. Will have a look at firewalling…
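
A quick way to check, assuming the usual tools are present in the container image, would be something like this from inside the mqttsn-gateway container:

    ip6tables -S    # lists any IPv6 firewall rules that might be dropping traffic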

I’m trying to understand the networking issues here to see if they apply to my use case. I’m not planning on using the MQTT-SN server, but I would very much like to make sure the link/mesh-local IPs are accessible to other Thread devices. Is that still part of the problem? I’m new to Balena and not sure if there are multiple layers of networking hoops to jump through.
