Container network issue with 2.89.15 and cp-zookeeper/Redhat ubi8 containers

When testing the new 2.89.15 release of Balena I came across a container networking issue: I can’t ping certain containers by their name, although DNS resolution and pinging by IP work fine.

Setup
We’re using a multi-container setup in our fleet of UpBoards running balenaOS. Some of these containers run a Kafka stack, e.g. the cp-zookeeper image.
I noticed that the Confluent containers couldn’t connect to each other, which I could narrow down to the containers not being pingable by name. This works fine on the old 2.68 balenaOS version.

Test setup
I simplified the issue by looking up the base image used (Red Hat ubi8) and was able to put together a simple docker-compose.yml that reproduces it:

version: "2.1"

services:
  test1:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
    container_name: test1
  test2:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
    container_name: test2
  zookeeper:
    image: registry.access.redhat.com/ubi8/ubi-minimal
    entrypoint: ["tail", "-f", "/dev/null"]
    container_name: zookeeper

I can ping the containers test1 and test2 from each other, but I cannot ping zookeeper by hostname. DNS seems to resolve correctly and ping by IP works; ping by hostname however doesn’t (I only get the report of a single packet after stopping it with Ctrl+C).
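The sessions below were run from a shell inside each container; one way to get such a shell from the hostOS would be something like the following (the placeholder is the full container name as reported by balena ps):

# on the hostOS: find the container name, then open a shell inside it
balena ps --format "{{.Names}}" | grep test1
balena exec -it <test1-container-name> /bin/bash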

Container test1:

root@37df5e0c5fe5:/# nslookup zookeeper
Server: 127.0.0.11
Address: 127.0.0.11#53

Non-authoritative answer:
Name: zookeeper
Address: 172.17.0.2

root@37df5e0c5fe5:/# ping zookeeper
PING zookeeper (172.17.0.2) 56(84) bytes of data.
^C64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.229 ms

--- zookeeper ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.229/0.229/0.229/0.000 ms

root@37df5e0c5fe5:/# ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.259 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.196 ms

Notice the “^C” before the printout of the ping zookeeper command. I’ve never seen this behavior of the ping command before.

I’m looking for advice on how to pinpoint the issue. If useful I can upload further information (balena inspect of the container or so).

Hi @hesch,

Thanks for all the info. It would definitely be helpful to include some info from inspecting the containers, as well as the inspect output of the Docker networks on the device. By default, if no network is specified, the Supervisor on the device adds all the containers to a managed bridge network, thus avoiding the Docker issue of containers not being pingable by name on the default bridge network. With a managed bridge network, the containers should be able to ping each other as long as they’re on the same subnet and don’t have conflicting IPs - you’ll be able to see this when you inspect the containers. Normally Docker networks shouldn’t hand out conflicting IPs, but we’ve seen instances of IP conflicts, possibly caused by unclean Engine cleanup. For example: Port already in use, because proxy keeps binding to the wrong container IP · Issue #272 · balena-os/balena-engine · GitHub, although I don’t know if this is related.
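For reference, the kind of output that would help is along these lines (run on the hostOS; the network and container IDs are placeholders to be taken from the ls output):

# list the networks, dump the app's managed bridge network, and check one container's endpoint config
balena network ls
balena network inspect <app-network-id>        # shows the subnet, gateway, and each container's IP/MAC
balena inspect <container-id> --format '{{json .NetworkSettings.Networks}}'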

Thanks,
Christina

Hello,
just checking to see if you have managed to sort this out or you’d like some more help.
Ramiro

Hi and thank you for your replies. Unfortunately the topic got delayed quite a bit by more urgent things coming up. I’ll reconnect the device and provide more info from the inspect.

@cywang117 @ramirogm
First of all sorry for the big delay in providing information.

I set up a test device with the simple docker-compose.yml posted above. Below is the output of the inspect of the containers and the network. We were able to reproduce the issue on a different device with the Intel-NUC image as well.

# balena network ls
NETWORK ID          NAME                DRIVER              SCOPE
dfbc2d020fa0        1909342_default     bridge              local
c4c32ed6b321        bridge              bridge              local
e77ddd72d7c1        host                host                local
6d692447e105        none                null                local
3d1ae74b18fc        supervisor0         bridge              local
# balena network inspect dfbc2d020fa0
[
    {
        "Name": "1909342_default",
        "Id": "dfbc2d020fa06fcc131ad09b9a9814aaca789aa5d2cc78a0a96f4b4e6ae496e6",
        "Created": "2022-09-01T14:11:39.449383485Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "68bc0c9022b206aae7dd6a8c967f974fcf958a55349504b41fd31ce52d446960": {
                "Name": "test1_4595790_2080410_6e749a05e5e48b387d3d3bd89fced5ce3d7075d4",
                "EndpointID": "2de96235d50e8ef205cc2a5d2bfdd34014ed515b20a1761b57fb3b8fe0d812d2",
                "MacAddress": "02:42:ac:11:00:04",
                "IPv4Address": "172.17.0.4/16",
                "IPv6Address": ""
            },
            "7f0c5e01da18e5b339949306ea1d6bc48e99f5b44aa6ef7530b702889eaf8139": {
                "Name": "test2_4595791_2080410_6e749a05e5e48b387d3d3bd89fced5ce3d7075d4",
                "EndpointID": "4d8597df17bdee90283a093d50b90c36c387f38c53c656f950ebae3e3a600bb6",
                "MacAddress": "02:42:ac:11:00:03",
                "IPv4Address": "172.17.0.3/16",
                "IPv6Address": ""
            },
            "86466190381c690ce5dde54a9f380ed92fa8c5c1feeef1fc513000354fede2aa": {
                "Name": "zookeeper_4595792_2080410_6e749a05e5e48b387d3d3bd89fced5ce3d7075d4",
                "EndpointID": "52187b9e0feca687b34b9d0f01842c623ddb53cd2aed24acefbcf4af815f2a45",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "io.balena.supervised": "true"
        }
    }
]

I attached the full results of inspect on the containers here:
test1_inspect.txt (10.6 KB)
test2_inspect.txt (10.7 KB)
zookeeper_inspect.txt (11.5 KB)

If desired I can grant support access.

Any ideas on how to fix this or find out what’s happening? @ramirogm @cywang117

Hi @hesch
I’ve tested your configuration on a local machine by deploying the sample docker-compose you provided, and I’m unable to reproduce it. I mean, I can ping the zookeeper container from test1 and test2. Here are some logs of my tests.

I’ll check the logs you sent, but could you also grant us access to the device so that we can take a look?

Thanks

Yes I’ll send you a PM with the device. Thank you for the support!

Hi @hesch

I got access to the device and I’m running some tests.

About the ping command that you mentioned, here are my findings. As you found, from the test1 container you can successfully ping the zookeeper container by its IP:

root@68bc0c9022b2:/# ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.184 ms
^C
--- 172.17.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1013ms
rtt min/avg/max/mdev = 0.184/0.192/0.201/0.016 ms

If you ping by name, the ping command doesn’t print any output until you hit CTRL-C; that’s the “^C” you see in the output:

root@68bc0c9022b2:/# ping zookeeper
PING zookeeper (172.17.0.2) 56(84) bytes of data.
^C64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.179 ms

--- zookeeper ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.179/0.179/0.179/0.000 ms

However, if you use ping with the -n flag, which tells ping not to perform name lookups (“Numeric output only. No attempt will be made to lookup symbolic names for host addresses.”), then the output appears immediately as each ICMP response is received, as usual:

root@68bc0c9022b2:/# ping -n zookeeper
PING zookeeper (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.172 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.197 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=64 time=0.195 ms
64 bytes from 172.17.0.2: icmp_seq=4 ttl=64 time=0.198 ms
64 bytes from 172.17.0.2: icmp_seq=5 ttl=64 time=0.198 ms
64 bytes from 172.17.0.2: icmp_seq=6 ttl=64 time=0.196 ms
^C
--- zookeeper ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5153ms
rtt min/avg/max/mdev = 0.172/0.192/0.198/0.018 ms

So it looks like the reverse name resolution (IP → name) is not working. I’ll try to find out why.
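A quick way to check the reverse path directly from test1 would be something like this (nslookup is already available, as shown in your earlier output; getent goes through the libc resolver):

# inside the test1 container: ask the embedded DNS server (127.0.0.11) for the PTR record of zookeeper's IP
nslookup 172.17.0.2 127.0.0.11
# and the same lookup through the system resolver
getent hosts 172.17.0.2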

Ramiro


Hi,

To add on to what my colleague sent: they also tested this on a newer OS version, ESR 2022.4.2, where the issue didn’t occur. I see that the up-board device type has OS 2.101 available; are you able to try an OS upgrade to this version to see if the issue still occurs on your end?

Also, the inspects you sent look normal; thank you for putting them and the reproduction together. Since nothing in the inspects has looked unusual so far, it seems the issue isn’t specific to the Engine.

Thanks,
Christina

@cywang117
I updated the board to 2.101.11 and the supervisor to 14.4.5 to test on the latest version. Unfortunately the issue still persists.
@ramirogm I can confirm that ping -n works. Let me know if you come across any promising leads in finding out why.

Thank you both very much for the detailed replies!

@ramirogm I just did some more digging and found this thread. I checked the logs on the host OS and found:

# journalctl --no-pager
e874ef2 balenad[1236]: time="2022-12-08T16:42:38.781516230Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
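(For reference, the same messages can be filtered out of the journal like this, assuming the engine unit on the hostOS is named balena.service:)

journalctl -u balena.service --no-pager | grep resolver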

To further verify that the container name length is involved, I deployed a third test container with an extra long name. This container is identical to the other two test containers except for the name, and the issue reproduces with it as well.
Looking into the container names, it seems that the naming scheme was changed in some update between 2.68 and 2.89. See the two screenshots below.


[Screenshot: Balena OS 2.101.11, long container names]

[Screenshot: Balena OS 2.68.1, short container names]

I’m not sure if that’s the root cause or a symptom. I’m a bit surprised that this apparently doesn’t cause any issues in other setups, since network communication between containers is a very common thing. Can you confirm that, and do you maybe have some insight into the changes made and a possible solution?

Hi @hesch, thanks for your detailed response. I did look into the thread you linked, which was very useful, and helped me track down the issue. This is what I found.

A container name longer than 63 chars is causing the issue. From this Docker guide: “Container name configured using --name is used to discover a container within an user-defined docker network. The embedded DNS server maintains the mapping between the container name and its IP address (on the network the container is connected to).”

I tested the same docker-compose.yaml app on an x86 device I have, with the same version of the engine (balenaEngine version 20.10.17, build 13db38c82bdb056f013f5497b0662ad34ffb98f7), but I couldn’t reproduce the issue. Comparing the container names: on your board it is zookeeper_5826140_2403300_72b853b44d0246fd789c89269db742d04a2e5ef9 (66 chars), while on mine it is zookeeper_5780697_2390976_de73c0877c4c37a0df4e90d603e00012.ae8c6ddc272547a49531149bd2dd187f_default (hostname 58 chars, domain name 40 chars). This explains why ping works on my device and doesn’t on yours. I also think that the Kafka containers perform reverse DNS resolution, and that’s why they can’t connect.
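Just to make the length comparison explicit, here is a quick check with the two names from our devices (63 chars is the per-label limit for DNS names):

echo -n "zookeeper_5826140_2403300_72b853b44d0246fd789c89269db742d04a2e5ef9" | wc -c   # 66 -> over the limit
echo -n "zookeeper_5780697_2390976_de73c0877c4c37a0df4e90d603e00012" | wc -c           # 58 -> within the limit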

As an extra check and a possible workaround, I renamed the container, and after that the pings succeeded:

# on the host terminal:
root@e874ef2:~# balena rename 1a24b71d0f01 zookeeper
root@e874ef2:~# balena ps
CONTAINER ID   IMAGE                                                            COMMAND                  CREATED       STATUS                 PORTS     NAMES
d9fdfd59b3e8   3acc3b9c1c98                                                     "tail -f /dev/null"      6 hours ago   Up 6 hours                       test1_5826137_2403300_72b853b44d0246fd789c89269db742d04a2e5ef9
1a24b71d0f01   d4b87fc02437                                                     "tail -f /dev/null /…"   6 hours ago   Up 6 hours                       zookeeper
fe5da180b638   3acc3b9c1c98                                                     "tail -f /dev/null"      6 hours ago   Up 6 hours                       test2_5826138_2403300_72b853b44d0246fd789c89269db742d04a2e5ef9
fda43e6548d2   3acc3b9c1c98                                                     "tail -f /dev/null"      6 hours ago   Up 6 hours                       test3_long_name_5826139_2403300_72b853b44d0246fd789c89269db742d04a2e5ef9
66c726f4368b   registry2.balena-cloud.com/v2/311177288a80a75e898853b08a8988a5   "/usr/src/app/entry.…"   8 hours ago   Up 8 hours (healthy)             balena_supervisor
root@e874ef2:~# 

and on the test1 container I run the pings:

root@d9fdfd59b3e8:/# ping zookeeper
PING zookeeper (172.19.0.4) 56(84) bytes of data.
64 bytes from zookeeper.466321b2703b452fa4acdfbc5fb7e0c3_default (172.19.0.4): icmp_seq=1 ttl=64 time=0.225 ms
64 bytes from zookeeper.466321b2703b452fa4acdfbc5fb7e0c3_default (172.19.0.4): icmp_seq=2 ttl=64 time=0.196 ms
^C
--- zookeeper ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.196/0.210/0.225/0.020 ms

Note that you can use an expression like balena ps --filter "name=test3" --format "{{.Names}}" | head -c62 to get a shortened name in a script if you need to.
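For example, a rough sketch of scripting the workaround for the zookeeper service (the filter and the target name are just illustrative):

# on the hostOS: look up the current (long) container name and rename it to a short fixed one
LONG_NAME=$(balena ps --filter "name=zookeeper" --format "{{.Names}}")
balena rename "$LONG_NAME" zookeeper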

About what’s causing the issue: the container name is created using the following pattern: ${this.serviceName}_${this.imageId}_${this.releaseId}_${this.commit}
In your case, this maps to:

  • serviceName: zookeeper
  • imageId: 5826138
  • releaseId: 2403300
  • commit: 72b853b44d0246fd789c89269db742d04a2e5ef9 (40 chars)

The last two values come from the balena dashboard.

On my test device, the commit is shorter:
5bd64c97eea4199246408b1fe1f63a0a (32 chars)

This is why I couldn’t reproduce it on the test device.

Another thing I found while checking the app is that your docker-compose uses the unsupported container_name property. I don’t think it’s causing any issue, but it should be removed.

To summarize:

  • the issue is related to a long container name
  • renaming the container looks like a possible workaround, shortening the name as shown above.

I still need to check internally if renaming may cause any issues when upgrading/redeploying, and dig into the container naming strategy.

Thanks for your help tracking this down! Let us know if the workaround works for your case.

Ramiro

@ramirogm
Thank you very much for the swift feedback. The issue also occurs with the vanilla Ubuntu image (the test containers), so it is not limited to Kafka images. It’s apparently a fixed limit for hostnames, as specified in the relevant RFCs and enforced by the Linux kernel (c - What is the maximum number of characters for a host-name in Unix? - Stack Overflow, hostname(7) - Linux manual page).
I checked why your commit UUID is shorter than mine and discovered that commit UUIDs from git push are longer than those from balena push. That might explain why the error hasn’t been reported by other users so far (at least I haven’t seen anyone else mention it). I did a balena push to the device, resulting in the commit IDs below. (For some reason the supervisor is currently not updating the device; maybe mixing git push and balena push causes issues here? Is that generally safe to do? It would be the best workaround for us at the moment, I think.)

4164398d9f1f425c4a96546f2ba89e66 #(balena push)
72b853b44d0246fd789c89269db742d04a2e5ef9 #(git push)

I looked at the code changes for the line you posted, and it turns out the issue was introduced by this commit.

To sum it up, I think we noticed the issue because we use git push and a multi-container setup that requires network communication between the containers. In general the issue can happen whenever container names get very long, but the commit mentioned above makes hitting that limit a lot more likely.

I think using a shortened commit UUID (e.g. the first 8 characters) should still be safe enough to avoid name collisions, while making this issue far less likely to appear, also for users of balena push with long container names, e.g.:

name: `${this.serviceName}_${this.imageId}_${this.releaseId}_${this.commit.slice(0,8)}`
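With the parts of my current zookeeper container name, that would give a name comfortably within the limit, e.g.:

echo -n "zookeeper_5826140_2403300_72b853b4" | wc -c   # 34 chars with the commit truncated to 8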

Alternatively, adding dots (“.”) to the name would work for DNS, as specified in the hostname man page:

Each element of the hostname must be from 1 to 63 characters long
and the entire hostname, including the dots, can be at most 253
characters long.

However, the container could then no longer be reached by the short hostname (e.g. zookeeper); I’m actually not yet sure where that is defined. The first option of just shortening the name is probably safer in terms of side effects.
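For illustration, a purely hypothetical dotted variant would keep every label within the 63-character limit (the name format here is made up, just to show the label lengths):

# print the length of each dot-separated label
echo "zookeeper.5826140.2403300.72b853b44d0246fd789c89269db742d04a2e5ef9" | tr '.' '\n' | awk '{ print length($0), $0 }'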

Hi @hesch , glad that we’re making progress.

In my previous comment related to Kafka, I was referring to the initial question/problem in the real app, where the “confluent containers couldn’t connect to each other”. I know that in some cases the Kafka components perform reverse DNS lookups (depending on the config), so my point was that the issue we found on the test devices with ping could be the same one causing that on Kafka. Did you have a chance to apply the workaround on the real Kafka app?

Good find on the git push vs balena push difference. I’ve also seen that difference in the commit and will check with our Supervisor team. As you found out, the container names get really long once the commit ID is appended.

Thanks again

Ramiro

Hi @ramirogm, thanks for the great support! I haven’t tested with our real Kafka stack yet - I’m pretty sure it’ll work, but I will test just to be sure.

Perfect, let me know if there is any news. If it helps I can also open a GitHub issue with the info we collected here - just let me know.

Hi @hesch

I’ve created supervisor generates invalid DNS names that break reverse DNS lookups · Issue #2077 · balena-os/balena-supervisor · GitHub and will sync with the supervisor team on it. I’ll report back here on our findings.

Thanks again, Ramiro

Hi @hesch,

Checking in here – I read through the GitHub issue and the PR (you’ll see some updates in both), and it looks like the container name length is not the culprit. When creating services, the Supervisor adds the service name as an alias for each service it creates, which is recognized when resolving containers into their IPs. The service name here, zookeeper, is not over 63 chars long, so even if the container name itself were over 63 chars, the resolver should be able to use the service name without issue. Here is the source code for that functionality: balena-supervisor/service.ts at 96418d55b5507d8352e612d4a8074e700ea0780a · balena-os/balena-supervisor · GitHub

If you inspect the zookeeper container, you should see zookeeper as an alias in the inspect JSON. If this isn’t the case, please let us know.
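Something like this on the hostOS should print the aliases per network (the container ID comes from balena ps; the format string is standard engine templating):

balena inspect <zookeeper-container-id> --format '{{range $net, $cfg := .NetworkSettings.Networks}}{{$net}}: {{$cfg.Aliases}} {{end}}'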

I’ll take a look at your docker-compose reproduction to see if there’s anything weird.

Thanks,
Christina

In your compose reproduction, after I changed the ubi-minimal image to the same balenalib bionic image, I was able to ping zookeeper from the other services.

version: '2.3'

services:
  test1:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
    stop_signal: SIGKILL
  test2:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
    stop_signal: SIGKILL
  zookeeper:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
    stop_signal: SIGKILL

It looks like it’s potentially related to the base image - what do you know about the networking-related layers of the ubi-minimal image? That sounds like a good area for investigation.
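A few things that might be worth comparing between the ubi-minimal container and the ubuntu ones (just suggestions for where to look, not a definitive checklist):

# inside the zookeeper (ubi-minimal) container
cat /etc/resolv.conf        # should point at the embedded DNS server 127.0.0.11
cat /etc/nsswitch.conf      # ordering of name resolution sources
cat /etc/hosts              # any static entries for the container's own name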

Note that container_name is an ignored field, so it won’t have any effect when pushing to balena.