Network problems and multiple IP addresses

Hi!

One of my units appears with three (!) IP addresses in Balena Dashboard (and in the CLI, the SDK, and in the supervisor API). Only one of them is valid, though. This is a problem, because I’m not able to ping myself using $HOSTNAME.local, which in turn is a problem because it prevents our ROS application from starting up. (Using localhost as ROS_HOSTNAME is not an option, since this is a distributed application.) At least, I suspect that this is the cause of the problem. (All other units show up with a single IP address, and none of them have network problems.)

Using BalenaOS v2.50.1+rev1, supervisor v11.4.10.

I can pipe curl -sX GET --header "Content-Type:application/json" "$BALENA_SUPERVISOR_ADDRESS/v1/device?apikey=$BALENA_SUPERVISOR_API_KEY" to a Python snippet that extracts the IP addresses, loops over them, chooses the first one that also appears in ifconfig -a, and modifies /etc/hosts with something sensible, but that feels like an enormous hack which I would not like to have in a production system.
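For illustration, the selection step of that workaround could look roughly like the sketch below. It assumes the /v1/device response is JSON with an ip_address field holding a space-separated list of addresses; the function names, the sample addresses and the hosts-line format are hypothetical and only there to show the idea.

```python
import json


def pick_valid_ip(device_json, local_ips):
    """Return the first supervisor-reported IP that is also bound locally, else None.

    Assumption: the response carries an "ip_address" field containing a
    space-separated list of addresses.
    """
    reported = json.loads(device_json).get("ip_address", "").split()
    for ip in reported:
        if ip in local_ips:
            return ip
    return None


def hosts_line(ip, hostname):
    """Build an /etc/hosts entry mapping the hostname and hostname.local to ip."""
    return f"{ip}\t{hostname} {hostname}.local"


# Made-up data: three reported addresses, only one of which exists locally.
sample = '{"ip_address": "192.168.1.10 10.0.0.5 172.17.0.1"}'
local = {"10.0.0.5", "127.0.0.1"}
chosen = pick_valid_ip(sample, local)
print(hosts_line(chosen, "f3ff012"))
```

In the real workaround, local would be filled from the addresses listed by ifconfig -a, and the resulting line would be appended to /etc/hosts.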

Hi. Could you enable support access to the device and share the device UUID in a private message so we can take a look?

Looking into it and will come back to you

Can you extend the support access for longer, please? We need more time, and you can revoke it whenever you like.

Sure, I’ll fix it right away!

One other question: are the other devices you mentioned, which are not having this issue, running the same balenaOS version?

No, they are running various ResinOS 2.13.x versions, most of them with supervisor v7.11.0.

Hi, I tried to flush the extra IP addresses and the machine went offline. Can you unplug and plug it back again please?
Also, what hardware is this? Are you doing anything in your container that manipulates the IP addresses in any way?

Hi, @floion, the machine should be online now. The units are equipped with SuperMicro motherboards (https://www.supermicro.com/en/products/motherboard/X10SDV-6C-TLN4F). We add an ethernet bridge in the container, however I’ve tried to remove that script without changes in behaviour. For your information, the scripts look like this:

start_bridge.sh:

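#!/bin/sh

# Fall back to the default bridge name when BRIDGE_NAME is unset or empty.
if [ -z "$BRIDGE_NAME" ]; then
        BRIDGE_NAME=cbl
fi

# Create the bridge only if it does not exist yet.
if [ ! -d "/sys/class/net/$BRIDGE_NAME/bridge" ]; then
        brctl addbr "$BRIDGE_NAME"
fi
# Allow forwarding of traffic entering or leaving the bridge.
iptables -A FORWARD -i "$BRIDGE_NAME" -j ACCEPT
iptables -A FORWARD -o "$BRIDGE_NAME" -j ACCEPT
ifconfig "$BRIDGE_NAME" up
# Start the monitor in the background, then request a DHCP lease for the bridge.
python3 monitor_bridge.py &
sleep 10
dhclient "$BRIDGE_NAME"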

monitor_bridge.py:

#!/usr/bin/env python3
  
import os
import subprocess
import time

bridge_name = str(os.environ.get("BRIDGE_NAME", "cbl"))
sleep_period = int(os.environ.get("BRIDGE_SLEEP_PERIOD", 300))

def get_interfaces():
    sys_class_net = os.listdir("/sys/class/net/")
    return filter(lambda s: s.startswith("en") or s.startswith("eth"), sys_class_net)

while True:
    interfaces = set(get_interfaces())

    # Find interfaces that are already bridged.
    bridges = subprocess.check_output(["bridge", "link"])
    bridges = set(e.split(" ")[1] for e in bridges.decode("utf-8").strip().split("\n") if e)

    # Add missing interfaces to the bridge.
    for eth in interfaces - bridges:
        subprocess.check_call(["ip", "addr", "flush", "dev", eth])
        subprocess.check_call(["ifconfig", eth, "0.0.0.0", "down"])
        subprocess.check_call(["brctl", "addif", bridge_name, eth])
        subprocess.check_call(["ifconfig", eth, "up"])

    time.sleep(sleep_period)

Hi, given your use case, a quick and dirty way of ruling out any issues your app may be introducing would be to create a new empty app in the dashboard, move your device into that app, then reboot the board and check whether you still see the multiple IP addresses. Can you try that?

Moving device now; currently downloading the docker image; will keep you posted.
UPDATE: I’ve moved the device to a new application, but the unit is still unable to ping itself.

BTW, this morning I noticed that there was only a single IP address visible in Balena. However, the problem persisted! So my assumption in the original post was wrong; there is no connection between the network problems and the multiple IP addresses.

Hi, so the device has been moved to a new app? Can you add me as a collaborator to that app? My username is g_florin_ionut. Can you confirm that I can then push code to that app so I can test some things? I will also be rebooting the device and making some changes to it, so I wanted to let you know.

Hi, @floion. The device has been moved to the new app and you’ve been added as a developer. You should be able to push code to the app now.

Note: If you reboot the device, you will have to notify me so I can physically turn it back on again.

Thanks. Why is that? Is there a problem with reboot in this hardware?

No, I believe it’s only a configuration issue. I’ll ask the technician who did the installation. Will keep you posted.

I just pushed a simple test app to the app. Can you do some reboots and check after each reboot how many IP addresses get listed? At this moment I only see 1.

I’ll quote myself from earlier today:

BTW, this morning I noticed that there was only a single IP address visible in Balena. However, the problem persisted! So my assumption in the original post was wrong; there is no connection between the network problems and the multiple IP addresses.

The problem is not caused by multiple IP addresses; that was an erroneous assumption of mine in my first post. You will see that the network problems described also exist in your newly pushed app:

root@f3ff012:/usr/src/app# ping $HOSTNAME
ping: f3ff012: No address associated with hostname
root@f3ff012:/usr/src/app# ping $HOSTNAME.local
ping: f3ff012.local: Name or service not known
root@f3ff012:/usr/src/app# 

If you still want to reboot the device, I will be able to assist you on Monday morning CET, as I’m out of office this weekend.

Right, sorry about that. Missed that message. Will keep you posted

Thanks!

Hi, you could use the hostname field in your docker-compose file for your containers. This page lists the supported fields: https://www.balena.io/docs/reference/supervisor/docker-compose/. I just tested this on your machine and was able to ping. Please let us know how that goes.
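If it helps when verifying this from inside the container, here is a minimal sketch that checks whether the container's own hostname resolves, without relying on ping. The resolve helper is a hypothetical name, and the check only covers IPv4 lookups through the standard resolver.

```python
import socket


def resolve(name):
    """Return the IPv4 address that name resolves to, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


hostname = socket.gethostname()
# Check both the bare hostname and the .local variant used in the thread.
for candidate in (hostname, hostname + ".local"):
    print(candidate, "->", resolve(candidate))
```

A result of None for either name corresponds to the ping errors shown earlier in the thread.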