CLI Scanning Doesn't Support Link-Local Addresses

Hey all,

I’m running into an issue with balena scan not picking up any devices. My project is a multi-container setup on a Raspberry Pi 3 with balenaOS 2.31.5+rev1. The Pi is set to local mode via the dashboard, and balena-cli is on my local machine (Ubuntu 18.04). After running balena scan (from a root shell) on my local network, it always fails with the error "Could not find any balenaOS devices in the local network."

Debugging:
My development machine and Pi are on the same local network, and I can see the device (and others) by running avahi-browse --all
(Truncated output:)

+ wlp2s0 IPv6 b35c00b                                       SSH Remote Terminal  local
+ wlp2s0 IPv6 8e00622                                       SSH Remote Terminal  local
+ wlp2s0 IPv6 b35c00b                                       SFTP File Transfer   local
+ wlp2s0 IPv6 8e00622                                       SFTP File Transfer   local
+ wlp2s0 IPv4 Trace                                         SSH Remote Terminal  local
+ wlp2s0 IPv4 Trace                                         SFTP File Transfer   local
+ wlp2s0 IPv6 Trace                                         SSH Remote Terminal  local
+ wlp2s0 IPv6 Trace                                         SFTP File Transfer   local
...

The devices:
b35c00b = my target Pi in local mode
8e00622 = a balenaFin doing something else
Trace = a Mac with SSH turned on
This all makes sense, but it’s worth noting that the scan doesn’t show an IPv4 address for the balena devices (despite them being assigned ones), and asking avahi to resolve IPv4 specifically fails:

isaac@C6 #  avahi-resolve -4 -n b35c00b.local
Failed to resolve host name 'b35c00b.local': Timeout reached
isaac@C6 #  avahi-resolve -n b35c00b.local
b35c00b.local	fe80::9274:272f:812b:2cad
isaac@C6 #  avahi-resolve -n Trace.local
Trace.local	fe80::140d:47a0:7fb:d3d6
isaac@C6 #  avahi-resolve -4 -n Trace.local
Trace.local	10.66.99.7

The logs from the host OS on the device match up; I’d be surprised if we could resolve an IPv4 address on the local network:

root@b35c00b:~# journalctl -f -u avahi-daemon
-- Logs begin at Fri 2019-03-01 17:53:19 UTC. --
Jul 13 06:43:47 b35c00b avahi-daemon[800]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.66.99.24.
Jul 13 06:43:47 b35c00b avahi-daemon[800]: New relevant interface eth0.IPv4 for mDNS.
Jul 13 06:43:47 b35c00b avahi-daemon[800]: Registering new address record for 10.66.99.24 on eth0.IPv4.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Joining mDNS multicast group on interface supervisor0.IPv4 with address 172.17.0.1.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: New relevant interface supervisor0.IPv4 for mDNS.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Registering new address record for 172.17.0.1 on supervisor0.IPv4.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Joining mDNS multicast group on interface balena0.IPv4 with address 10.114.101.1.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: New relevant interface balena0.IPv4 for mDNS.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Registering new address record for 10.114.101.1 on balena0.IPv4.
Jul 13 06:43:49 b35c00b avahi-daemon[800]: Registering new address record for fe80::9274:272f:812b:2cad on eth0.*.

From here there seem to be two general options – ensure that balena-cli can use the IPv6 address (the first thing I looked into) OR try to force balenaOS to only advertise an IPv4 address.

Balena CLI
After getting balena-cli set up from source, I could see that the IPv6 results seem to come back and are considered valid up until the CLI attempts to “ping” the Docker/balena engine (https://github.com/balena-io/balena-cli/blob/e41ea6fb1af916633281e92a8587e763b183470b/lib/actions/scan.coffee#L79).
At this point the IPv6 address is passed in, but it’s missing the interface portion (e.g. wlp2s0). For the sake of debugging I tried appending %wlp2s0 to the address before that call, making the passed address fe80::9274:272f:812b:2cad%wlp2s0 in this case. I was hopeful, but noticed that the address gets corrupted further down. At some point the address is passed to docker-modem (balena-cli -> docker-toolbox -> dockerode -> docker-modem). Unfortunately, a few different URL manipulation calls in docker-modem (https://github.com/apocas/docker-modem/blob/7bacb95c2ea5290a47a8c79f0c3b5d7c0de0aa2f/lib/modem.js#L104) seem to corrupt the address. I believe this is because url.parse in Node.js could still be broken (https://github.com/nodejs/node-v0.x-archive/pull/9411). A quick check in the Node REPL seems to confirm this would break:

> const url = require("url")
undefined
> url.format({protocol: 'http', hostname: 'fe80::9274:272f:812b:2cad%wlp2s0', port: 2375})
'http://[fe80::9274:272f:812b:2cad%wlp2s0]:2375'
> url.resolve("http://[fe80::9274:272f:812b:2cad%wlp2s0]:2375", "/")
'http:///'
> url.resolve("http://[fe80::9274:272f:812b:2cad]:2375", "/")
'http://[fe80::9274:272f:812b:2cad]:2375/'
> url.parse('fe80::9274:272f:812b:2cad%wlp2s0')
Url {
  protocol: 'fe80:',
  slashes: null,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: null,
  query: null,
  pathname: '/:9274:272f:812b:2cad%wlp2s0',
  path: '/:9274:272f:812b:2cad%wlp2s0',
  href: 'fe80:/:9274:272f:812b:2cad%wlp2s0' }
> url.parse('http://[fe80::9274:272f:812b:2cad%wlp2s0]:2375')
Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: null,
  query: null,
  pathname: '/[fe80::9274:272f:812b:2cad%wlp2s0]:2375',
  path: '/[fe80::9274:272f:812b:2cad%wlp2s0]:2375',
  href: 'http:///[fe80::9274:272f:812b:2cad%wlp2s0]:2375' }
> 
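
One possible direction, as a rough sketch only: append the interface scope to link-local results and hand the engine address around as a plain host string instead of a URL, so that url.parse and url.resolve never see it. The helper names below (isLinkLocalIPv6, withZone, engineRequestOptions) are hypothetical and do not exist in balena-cli or docker-modem. The sketch also assumes that an options object of this shape could be passed to http.request with the scoped address as host; I have not verified that against the engine, and /_ping is simply the Docker Engine API ping endpoint.

```javascript
'use strict';

// Hypothetical helpers for illustration; not part of balena-cli or docker-modem.

// True for IPv6 link-local unicast addresses (fe80::/10), i.e. those whose
// first hextet falls between fe80 and febf. Any "%zone" suffix is ignored.
function isLinkLocalIPv6(address) {
  const bare = address.split('%')[0];
  if (!bare.includes(':')) return false; // IPv4 or a hostname
  const firstHextet = parseInt(bare.split(':')[0], 16);
  return firstHextet >= 0xfe80 && firstHextet <= 0xfebf;
}

// Append "%<iface>" to a link-local address unless it already carries a zone.
// Other addresses are returned unchanged.
function withZone(address, iface) {
  if (!isLinkLocalIPv6(address) || address.includes('%')) return address;
  return `${address}%${iface}`;
}

// Build an options object in the shape http.request() accepts, keeping the
// host as a plain string so that no URL parsing is involved.
function engineRequestOptions(address, iface, port = 2375, path = '/_ping') {
  return { host: withZone(address, iface), port, path, method: 'GET' };
}

console.log(engineRequestOptions('fe80::9274:272f:812b:2cad', 'wlp2s0'));
```

The interface name would have to come from the mDNS result (avahi-browse reports it per record), since a link-local address is ambiguous without it.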

There might be a good solution here, and I could be missing something obvious :) In either case, I decided to see what it would take to get balenaOS to serve an IPv4 address with Avahi.

As a note, for SSH-ing into the balena device and as a sanity check, I can confirm that both of these commands work:

ssh fe80::9274:272f:812b:2cad%wlp2s0 -p 22222 -l root
nc -vC fe80::2e85:a99b:cc23:706f%wlp2s0 2375

BalenaOS Avahi
For this approach I SSHed into the device to set use-ipv6=no. Steps I took:

  • mount -o remount,rw /
  • Edited /etc/avahi/avahi-daemon.conf with use-ipv6=no
  • systemctl daemon-reload && systemctl restart avahi-daemon

After this, the logs show that it should be advertising the IPv4 address on eth0, but I wasn’t able to see that from either avahi-browse or balena scan.

root@b35c00b:/etc# journalctl -f -u avahi-daemon
-- Logs begin at Fri 2019-03-01 17:53:19 UTC. --
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: New relevant interface eth0.IPv4 for mDNS.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Network interface enumeration completed.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 10.114.101.1 on balena0.IPv4.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 172.17.0.1 on supervisor0.IPv4.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for fe80::5404:36ff:fe71:4d9 on resin-dns.*.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 10.114.102.1 on resin-dns.IPv4.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for fe80::9274:272f:812b:2cad on eth0.*.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 10.66.99.24 on eth0.IPv4.
Jul 13 07:46:18 b35c00b avahi-daemon[5015]: Server startup complete. Host name is b35c00b.local. Local service cookie is 1558821072.
Jul 13 07:46:19 b35c00b avahi-daemon[5015]: Service "b35c00b" (/services/ssh.service) successfully established.

I’m not sure if this is because of aggressive caching (I did restart avahi on my local machine for good measure) or because of something else.

Wrap Up
This issue has been interesting to debug, but I’m not sure where to go from here. I’m certain I could work around this issue entirely (e.g. set up a different network without IPv6 support), but this seems like a bug that could be worth fixing. Looking forward to any thoughts or suggestions.

Cheers
-Isaac

Thank you for this thorough investigation! We’ll discuss with the OS team, but it sounds to me like we should improve IPv6 support rather than avoid it (after all, IoT devices are the textbook justification for the transition to IPv6). If the issue is in 3rd-party packages and even Node.js as you seem to have found, it may not be trivial to fix it, but we should still try to find a fix one way or another. For this purpose I have created a CLI issue on GitHub, and quoted a large chunk of your post above: https://github.com/balena-io/balena-cli/issues/1350