Hey all,
I’m running into an issue with balena scan not picking up any devices. My project is a multi-container setup on a Raspberry Pi 3 with balenaOS 2.31.5+rev1. The Pi is set to local mode via the dashboard, and balena-cli is on my local machine (Ubuntu 18.04). After running balena scan (from a root shell) on my local network, it always fails with the error "Could not find any balenaOS devices in the local network".
Debugging:
My development machine and Pi are on the same local network, and I can see the device (and others) by running avahi-browse --all (truncated output):
+ wlp2s0 IPv6 b35c00b SSH Remote Terminal local
+ wlp2s0 IPv6 8e00622 SSH Remote Terminal local
+ wlp2s0 IPv6 b35c00b SFTP File Transfer local
+ wlp2s0 IPv6 8e00622 SFTP File Transfer local
+ wlp2s0 IPv4 Trace SSH Remote Terminal local
+ wlp2s0 IPv4 Trace SFTP File Transfer local
+ wlp2s0 IPv6 Trace SSH Remote Terminal local
+ wlp2s0 IPv6 Trace SFTP File Transfer local
...
The devices:
b35c00b = my target Pi in local mode
8e00622 = a balenaFin doing something else
Trace = a Mac with SSH turned on
This all makes sense, but it’s worth noting that the avahi-browse output doesn’t show an IPv4 entry for the balena devices (despite them being assigned IPv4 addresses), and asking avahi to resolve IPv4 specifically fails:
isaac@C6 # avahi-resolve -4 -n b35c00b.local
Failed to resolve host name 'b35c00b.local': Timeout reached
isaac@C6 # avahi-resolve -n b35c00b.local
b35c00b.local fe80::9274:272f:812b:2cad
isaac@C6 # avahi-resolve -n Trace.local
Trace.local fe80::140d:47a0:7fb:d3d6
isaac@C6 # avahi-resolve -4 -n Trace.local
Trace.local 10.66.99.7
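To make the "IPv6 only" observation easier to check on a longer browse listing, here is a minimal Node.js sketch that groups avahi-browse lines by name and reports which names never appear under IPv4. It assumes the plain (non-parsable) line layout quoted above and names without spaces; the function names are made up for illustration and the sample lines are copied from the truncated output.

```javascript
// Group avahi-browse lines by service name and collect the protocols seen.
// Assumes the plain layout quoted in this post:
//   "<event> <interface> <protocol> <name> <type...> <domain>"
// and that names contain no spaces, which holds for this sample only.
function protocolsByName(browseOutput) {
  const seen = new Map();
  for (const line of browseOutput.split('\n')) {
    const fields = line.trim().split(/\s+/);
    // Only "+" (new service) lines with event, interface, protocol and name.
    if (fields[0] !== '+' || fields.length < 4) continue;
    const [, , protocol, name] = fields;
    if (!seen.has(name)) seen.set(name, new Set());
    seen.get(name).add(protocol);
  }
  return seen;
}

// Names that were never announced over IPv4, in order of first appearance.
function ipv6OnlyNames(browseOutput) {
  return [...protocolsByName(browseOutput)]
    .filter(([, protocols]) => !protocols.has('IPv4'))
    .map(([name]) => name);
}

const sample = [
  '+ wlp2s0 IPv6 b35c00b SSH Remote Terminal local',
  '+ wlp2s0 IPv6 8e00622 SSH Remote Terminal local',
  '+ wlp2s0 IPv4 Trace SSH Remote Terminal local',
  '+ wlp2s0 IPv6 Trace SSH Remote Terminal local',
].join('\n');

console.log(ipv6OnlyNames(sample)); // [ 'b35c00b', '8e00622' ]
```

For the sample, only the two balena devices are flagged, which matches the avahi-resolve results.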
The logs from the host OS on the device match up; I’d be surprised if we could resolve an IPv4 address on the local network:
root@b35c00b:~# journalctl -f -u avahi-daemon
-- Logs begin at Fri 2019-03-01 17:53:19 UTC. --
Jul 13 06:43:47 b35c00b avahi-daemon[800]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.66.99.24.
Jul 13 06:43:47 b35c00b avahi-daemon[800]: New relevant interface eth0.IPv4 for mDNS.
Jul 13 06:43:47 b35c00b avahi-daemon[800]: Registering new address record for 10.66.99.24 on eth0.IPv4.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Joining mDNS multicast group on interface supervisor0.IPv4 with address 172.17.0.1.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: New relevant interface supervisor0.IPv4 for mDNS.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Registering new address record for 172.17.0.1 on supervisor0.IPv4.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Joining mDNS multicast group on interface balena0.IPv4 with address 10.114.101.1.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: New relevant interface balena0.IPv4 for mDNS.
Jul 13 06:43:48 b35c00b avahi-daemon[800]: Registering new address record for 10.114.101.1 on balena0.IPv4.
Jul 13 06:43:49 b35c00b avahi-daemon[800]: Registering new address record for fe80::9274:272f:812b:2cad on eth0.*.
From here there seem to be two general options – ensure that balena-cli can use the IPv6 address (the first thing I looked into) OR try to force balenaOS to only advertise an IPv4 address.
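As a rough illustration of the first option, the sketch below shows how a client could choose an address from a device's mDNS results: prefer IPv4 when it is advertised, otherwise fall back to IPv6 and attach the interface zone to link-local addresses. pickAddress and its parameters are hypothetical and are not part of balena-cli.

```javascript
const net = require('net');

// Hypothetical helper, not part of balena-cli: choose the address to dial
// from the addresses an mDNS lookup returned for one device.
function pickAddress(addresses, iface) {
  // Prefer IPv4 when the device advertises it.
  const v4 = addresses.find((a) => net.isIPv4(a));
  if (v4) return v4;

  const v6 = addresses.find((a) => net.isIPv6(a));
  if (!v6) return undefined;

  // Link-local addresses (fe80::/10) are only meaningful together with an
  // interface scope, so append the zone if it is missing.
  const isLinkLocal = /^fe[89ab][0-9a-f]:/i.test(v6);
  return isLinkLocal && !v6.includes('%') ? `${v6}%${iface}` : v6;
}

console.log(pickAddress(['fe80::9274:272f:812b:2cad'], 'wlp2s0'));
console.log(pickAddress(['fe80::140d:47a0:7fb:d3d6', '10.66.99.7'], 'wlp2s0'));
```

With the addresses from this post, the Pi would come back as a scoped IPv6 address and the Mac as its IPv4 address.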
Balena CLI
After setting up balena-cli from source, I could see that the IPv6 results do come back and are considered valid up until the CLI attempts to “ping” the Docker/balena engine (https://github.com/balena-io/balena-cli/blob/e41ea6fb1af916633281e92a8587e763b183470b/lib/actions/scan.coffee#L79).
At this point the IPv6 address is passed in, but it’s missing the interface portion (e.g. wlp2s0). For the sake of debugging I tried appending %wlp2s0 to the address before that call, making the passed address fe80::9274:272f:812b:2cad%wlp2s0 in this case. I was hopeful, but noticed that the address gets corrupted further down. At some point the address is passed to docker-modem (balena-cli -> docker-toolbox -> dockerode -> docker-modem). Unfortunately, a few different URL manipulation calls in docker-modem (https://github.com/apocas/docker-modem/blob/7bacb95c2ea5290a47a8c79f0c3b5d7c0de0aa2f/lib/modem.js#L104) seem to corrupt the address. I believe this is because url.parse in Node.js could still be broken (https://github.com/nodejs/node-v0.x-archive/pull/9411). A quick check in the Node REPL seems to confirm this would break:
> const url = require("url")
undefined
> url.format({protocol: 'http', hostname: 'fe80::9274:272f:812b:2cad%wlp2s0', port: 2375})
'http://[fe80::9274:272f:812b:2cad%wlp2s0]:2375'
> url.resolve("http://[fe80::9274:272f:812b:2cad%wlp2s0]:2375", "/")
'http:///'
> url.resolve("http://[fe80::9274:272f:812b:2cad]:2375", "/")
'http://[fe80::9274:272f:812b:2cad]:2375/'
> url.parse('fe80::9274:272f:812b:2cad%wlp2s0')
Url {
protocol: 'fe80:',
slashes: null,
auth: null,
host: '',
port: null,
hostname: '',
hash: null,
search: null,
query: null,
pathname: '/:9274:272f:812b:2cad%wlp2s0',
path: '/:9274:272f:812b:2cad%wlp2s0',
href: 'fe80:/:9274:272f:812b:2cad%wlp2s0' }
> url.parse('http://[fe80::9274:272f:812b:2cad%wlp2s0]:2375')
Url {
protocol: 'http:',
slashes: true,
auth: null,
host: '',
port: null,
hostname: '',
hash: null,
search: null,
query: null,
pathname: '/[fe80::9274:272f:812b:2cad%wlp2s0]:2375',
path: '/[fe80::9274:272f:812b:2cad%wlp2s0]:2375',
href: 'http:///[fe80::9274:272f:812b:2cad%wlp2s0]:2375' }
>
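One possible way around the URL handling, sketched here only as an idea, is to skip URL strings entirely and hand the scoped address to Node's http module as a plain options object. buildPingOptions and ping are hypothetical helpers, not balena-cli or docker-modem code; /_ping is the Docker Engine API health endpoint, and port 2375 is taken from the commands in this post. Whether the scoped host is accepted on every platform and Node version has not been verified against a device.

```javascript
const http = require('http');

// Hypothetical helper: build request options for the engine's /_ping
// endpoint without going through url.parse / url.resolve.
function buildPingOptions(address, iface, port = 2375) {
  // Scope the address to an interface unless it already carries a zone.
  const host = address.includes('%') ? address : `${address}%${iface}`;
  return { host, port, path: '/_ping', method: 'GET', timeout: 2000 };
}

// Not called in this sketch; resolves to true if the engine answers
// with HTTP 200 and to false on any error or timeout.
function ping(options) {
  return new Promise((resolve) => {
    const req = http.request(options, (res) => {
      res.resume(); // discard the response body
      resolve(res.statusCode === 200);
    });
    req.on('timeout', () => req.destroy()); // destroy() triggers 'error'
    req.on('error', () => resolve(false));
    req.end();
  });
}

const options = buildPingOptions('fe80::9274:272f:812b:2cad', 'wlp2s0');
console.log(options.host);
```

Calling ping(options) would then attempt the request. This only sidesteps the parsing problem in a standalone script; it does not change how docker-modem builds its URLs.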
There might be a good solution here, and I could be missing something obvious. In either case, I decided to see what it would take to get balenaOS to serve an IPv4 address with Avahi.
As a note, for SSH-ing into the balena device and as a sanity check, I can confirm that both of these commands work:
ssh fe80::9274:272f:812b:2cad%wlp2s0 -p 22222 -l root
nc -vC fe80::2e85:a99b:cc23:706f%wlp2s0 2375
BalenaOS Avahi
For this approach I SSHed into the device to set use-ipv6=no. Steps I took:
- mount -o remount,rw /
- Edited /etc/avahi/avahi-daemon.conf with use-ipv6=no
- systemctl daemon-reload && systemctl restart avahi-daemon
After this the logs show that it should be advertising the IPv4 address on eth0, but I wasn’t able to see that from either avahi-browse or balena scan.
root@b35c00b:/etc# journalctl -f -u avahi-daemon
-- Logs begin at Fri 2019-03-01 17:53:19 UTC. --
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: New relevant interface eth0.IPv4 for mDNS.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Network interface enumeration completed.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 10.114.101.1 on balena0.IPv4.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 172.17.0.1 on supervisor0.IPv4.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for fe80::5404:36ff:fe71:4d9 on resin-dns.*.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 10.114.102.1 on resin-dns.IPv4.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for fe80::9274:272f:812b:2cad on eth0.*.
Jul 13 07:46:17 b35c00b avahi-daemon[5015]: Registering new address record for 10.66.99.24 on eth0.IPv4.
Jul 13 07:46:18 b35c00b avahi-daemon[5015]: Server startup complete. Host name is b35c00b.local. Local service cookie is 1558821072.
Jul 13 07:46:19 b35c00b avahi-daemon[5015]: Service "b35c00b" (/services/ssh.service) successfully established.
I’m not sure if this is because of aggressive caching (I did restart avahi on my local machine for good measure) or because of something else.
Wrap Up
This issue has been interesting to debug, but I’m not sure where to go from here. I’m certain I could work around it entirely (e.g. set up a different network without IPv6 support), but this seems like a bug that could be worth fixing. Looking forward to any thoughts or suggestions.
Cheers
-Isaac