mDNS local access to device failing after a bit

Hi,

I’m running multiple containers on a RPIv3 on our network here.

The hostname is set to “mqtt”, and after power-up I can ping mqtt.local.

After a while (days or more), we can no longer reach mqtt.local.

I’ve remoted in through the OpenBalena instance it is connected to and checked what is running.

I see that avahi-daemon is advertising a different hostname:

  754 avahi     5924 S    avahi-daemon: running [mqtt-39.local]
  770 avahi     5136 S    avahi-daemon: chroot helper
 4892 redsocks  4008 S    avahi-daemon: registering [b356345-80720.local]
 4893 redsocks  3472 S    avahi-daemon: chroot helper

Sure enough if I ping mqtt-39.local I get a response.
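
For anyone wanting to spot this drift from a script rather than by eye, here is a rough sketch. It assumes the process titles look like the ps output above; the function names advertised_names and is_renamed are made up for illustration.

```shell
# Print the names that avahi-daemon shows in its process title.
# Reads `ps` output on stdin; assumes titles of the form
# "avahi-daemon: running [name.local]" as in the output above.
advertised_names() {
    sed -n 's/.*avahi-daemon: [a-z]* \[\(.*\)\.local].*/\1/p'
}

# Succeed if $1 is $2 followed by "-<digits>", i.e. a conflict-style rename.
is_renamed() {
    case "$1" in
        "$2"-|"$2"-*[!0-9]*) return 1 ;;  # empty or non-numeric suffix
        "$2"-*) return 0 ;;               # purely numeric suffix
        *) return 1 ;;                    # unchanged or unrelated name
    esac
}
```

On the device this could be fed with `ps | advertised_names`, passing each resulting name together with the output of `hostname` to is_renamed.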

Can you point me in the right direction to understand why the daemon is configured with the extra -39 instead of the hostname?

Thanks!

Alex

Interesting - stopping and starting the avahi-daemon service changes the numeric extension

root@mqtt:/etc/systemd/system/avahi-daemon.service.d# ps | grep avahi
31463 redsocks  3724 S    avahi-daemon: running [b356345-28.local]
31464 redsocks  3472 S    avahi-daemon: chroot helper
31710 root      2864 S    grep avahi
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# systemctl start avahi-daemon
Warning: The unit file, source configuration file or drop-ins of avahi-daemon.service changed on disk. Run 'systemctl daemon-reload' to reload units.
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# ps | grep avahi
31463 redsocks  3724 S    avahi-daemon: running [b356345-28.local]
31464 redsocks  3472 S    avahi-daemon: chroot helper
31713 avahi     5268 S    avahi-daemon: registering [mqtt-4.local]
31714 avahi     5136 S    avahi-daemon: chroot helper
31717 root      2864 S    grep avahi
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# systemctl stop avahi-daemon
Warning: The unit file, source configuration file or drop-ins of avahi-daemon.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Warning: Stopping avahi-daemon.service, but it can still be activated by:
  avahi-daemon.socket
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# systemctl start avahi-daemon
Warning: The unit file, source configuration file or drop-ins of avahi-daemon.service changed on disk. Run 'systemctl daemon-reload' to reload units.
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# ps | grep avahi
31463 redsocks  3724 S    avahi-daemon: running [b356345-28.local]
31464 redsocks  3472 S    avahi-daemon: chroot helper
31859 avahi     5268 S    avahi-daemon: registering [mqtt-3.local]
31860 avahi     5136 S    avahi-daemon: chroot helper
31862 root      2864 S    grep avahi
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# systemctl stop avahi-daemon
Warning: The unit file, source configuration file or drop-ins of avahi-daemon.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Warning: Stopping avahi-daemon.service, but it can still be activated by:
  avahi-daemon.socket
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# systemctl start avahi-daemon
Warning: The unit file, source configuration file or drop-ins of avahi-daemon.service changed on disk. Run 'systemctl daemon-reload' to reload units.
root@mqtt:/etc/systemd/system/avahi-daemon.service.d# ps | grep avahi
31463 redsocks  3724 R    avahi-daemon: running [b356345-28.local]
31464 redsocks  3472 S    avahi-daemon: chroot helper
31890 avahi     5268 R    avahi-daemon: registering [mqtt-3.local]
31891 avahi     5136 S    avahi-daemon: chroot helper
31893 root      2864 S    grep avahi

Googling… Could this be related to IPv6 ?

OK, I am seeing a lot of host name conflict messages in the logs, and the numeric suffix keeps increasing. There aren’t that many hosts on the network. Very strange:

Aug 27 13:08:34 mqtt avahi-daemon[31713]: Host name conflict, retrying with mqtt-28

This hostname counting is a strange bug we have encountered before but haven’t managed to resolve yet. It’s a flaky one.

There is some more information in https://github.com/balena-os/meta-balena/issues/1287 as well.

Can you please provide as much detail about your setup as you can, especially how you are setting the hostname?

Regards
ZubairLK

Happy to - It’s still in this state here (I’ve stopped and started the service but not rebooted the box).

Hostname should be set in the config.json, a snippet is:

"registered_at":1555934873742,"deviceId":4,"hostname":"mqtt"}

if I run hostname

root@mqtt:/etc/avahi# hostname
mqtt
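
A quick way to confirm that the configured and running hostnames agree, sketched under the assumption that config.json is a single-line JSON object like the snippet above. The helper name configured_hostname is hypothetical, and the path in the usage note is only where I would expect balenaOS to keep the file, so adjust as needed.

```shell
# Print the value of the "hostname" key from JSON read on stdin.
# A sed-based sketch, not a real JSON parser: it assumes the key appears
# once and that the value contains no escaped quotes.
configured_hostname() {
    sed -n 's/.*"hostname"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Something like `[ "$(configured_hostname < /mnt/boot/config.json)" = "$(hostname)" ]` would then succeed when the two match, which is the case here.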

if I run ps | grep avahi

root@mqtt:/etc/avahi# ps | grep avahi
31463 redsocks  3856 S    avahi-daemon: running [b356345-28.local]
31464 redsocks  3472 S    avahi-daemon: chroot helper
32000 avahi     5400 S    avahi-daemon: registering [mqtt-48.local]
32001 avahi     5136 S    avahi-daemon: chroot helper
32369 root      2864 S    grep avahi

Note that the suffix has already climbed to -48 since I restarted the service a few minutes ago (!)

If I look at the logs

root@mqtt:/etc/avahi# journalctl | grep mqtt-
Aug 27 13:14:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-18
Aug 27 13:14:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-19
Aug 27 13:14:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-20
Aug 27 13:15:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-21
Aug 27 13:15:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-22
Aug 27 13:15:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-23
Aug 27 13:16:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-24
Aug 27 13:16:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-25
Aug 27 13:16:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-26
Aug 27 13:17:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-27
Aug 27 13:17:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-28
Aug 27 13:17:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-29
Aug 27 13:18:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-30
Aug 27 13:18:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-31
Aug 27 13:18:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-32
Aug 27 13:19:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-33
Aug 27 13:19:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-34
Aug 27 13:19:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-35
Aug 27 13:20:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-36
Aug 27 13:20:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-37
Aug 27 13:20:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-38
Aug 27 13:21:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-39
Aug 27 13:21:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-40
Aug 27 13:21:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-41
Aug 27 13:22:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-42
Aug 27 13:22:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-43
Aug 27 13:22:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-44
Aug 27 13:23:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-45
Aug 27 13:23:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-46
Aug 27 13:23:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-47
Aug 27 13:24:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-48
Aug 27 13:24:28 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-49
Aug 27 13:24:48 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-50
Aug 27 13:25:08 mqtt avahi-daemon[32000]: Host name conflict, retrying with mqtt-51
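
The excerpt above shows a new conflict every 20 seconds. To keep an eye on the count without scrolling through the journal, something like the following should do; summarise_conflicts is just an illustrative name, and the only thing it relies on is the message format shown above.

```shell
# Summarise avahi conflict messages read on stdin.
# Prints "<count> <last name tried>", or "0 -" if there were none.
summarise_conflicts() {
    awk '/Host name conflict, retrying with/ { n++; last = $NF }
         END { print n + 0, (n ? last : "-") }'
}
```

Typical use would be `journalctl -u avahi-daemon | summarise_conflicts`.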

It’s really unhappy! :slight_smile:

Anything else I can tell you ?

Hi there,

Would you mind telling us the host OS version of this device?

Thank you!

root@mqtt:/etc/avahi# cat /etc/issue
balenaOS 2.31.5 \n \l

I think we have a misconfiguration on our network which is exposing this.

I think the key problem is that when avahi detects a naming conflict it bumps up the local name.

This is probably not a preferred behaviour if we need the name of the “good” box to remain the same but I can’t see a way of disabling it.
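
To make the renaming concrete, here is a small sketch of the pattern the logs show: each conflict moves on to the next numeric suffix. This is an illustration only, not avahi's actual code; the function name next_name is invented, and starting at -2 for a name with no suffix is my assumption.

```shell
# Print the next candidate name after a conflict, mimicking the pattern
# seen in the logs (mqtt-18 -> mqtt-19 -> ...).
# Assumption: a name without a numeric suffix gets "-2" appended.
# Suffixes with leading zeros are not handled in this sketch.
next_name() {
    name=$1
    suffix=${name##*-}
    case "$name" in
        *-*) ;;            # has a dash, keep the candidate suffix
        *) suffix= ;;      # no dash at all, so no suffix
    esac
    case "$suffix" in
        ''|*[!0-9]*) printf '%s-2\n' "$name" ;;
        *) printf '%s-%s\n' "${name%-*}" "$((suffix + 1))" ;;
    esac
}
```

With a conflict reported every 20 seconds, repeatedly applying this step is enough to walk the advertised name from mqtt to mqtt-51 in well under half an hour, which matches what the journal shows.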

Hi again @ajlennon,

Just to clarify, are you changing the hostname at runtime, or just setting it at boot time?

Hi - sure - no we’re not changing the hostname dynamically. It’s set in the config.json and it isn’t changing. My reading is that Avahi changes the advertised hostname in this way when it detects a conflict and we see it is detecting conflicts in the logs…

Hi @ajlennon, one of our engineers has been digging into this, as he has been hitting it just by managing/unmanaging interfaces. It looks to be this avahi issue (https://github.com/lathiat/avahi/issues/117), so we might end up having to patch avahi :confused:

Yeah I agree. I was hoping that wouldn’t be the case @shaunmulligan.

I’m not sure what the spec. says but I would expect that as the user I should be able to determine what the mDNS host to IP mapping actually is without avahi changing it underneath me (whether or not for good reasons).

Might be worth seeing if the maintainers will upstream a patch for a configuration option?

Yeah, it’s a bit of a mess. My colleague @majorz has been looking at this, and there might be a fork that fixes the issue, but whether we can convince the core avahi maintainer to include it upstream remains to be seen :confused:


Bump, needs some traction considering https://github.com/balena-os/meta-balena/issues/1287 already has pending PRs/Commits relating to this, and has done for almost a year!

The patch mentioned here is a backport of the one that was supposed to fix the issue in 0.8, but the backport did not fix it in 0.7.
We are still planning to revisit this issue and work on it, so please subscribe to the GH issue to get updates.

Hey. FYI, the accompanying issue has been closed as it is no longer reproducible on balenaOS v2.53. Should be all good!