In the example above, my cloud-interface service, a simple Go application that connects to the local MQTT broker, uses tcp://local-broker:1883 as the broker endpoint.
The error I get is: Network Error : dial tcp: lookup local-broker on 127.0.0.11:53: no such host
If I deploy the exact same stack on my laptop, everything works as expected.
Hi @nelson
It looks like there might have been an issue in an earlier version of Balena. Which OS version are you experiencing this problem on?
Regards
Thomas
Just to keep you updated, one of our engineers will be looking at the issue soon (probably this week), and we will inform you once the problem has been investigated and resolved.
I seem to have the same issue with balenaOS 64-bit for the Raspberry Pi 3. I tried changing the network mode to host, but the error persists.
Any idea how to work around it, even temporarily? @nelson
During the setup phase of Consul, I get this error:
ERROR: 2019/04/07 17:28:36 Get http://edgex-core-consul:8500/v1/agent/self: dial tcp: lookup edgex-core-consul on 127.0.0.11:53: no such host
Thank you for your time guys!
Best,
Odysseas
The fix is already merged, so it’s now a matter of bundling it with other fixes in our next OS release. I will tag this forum thread in the relevant issue, so the thread gets pinged when we make the release:
I’d expect the fix to work with bridge mode as well.
Honestly, I have not tried to reproduce the issue myself, but judging from the response from @chrisys, yes, what you describe should work.
Just wanted a second opinion on whether I am diagnosing the problem correctly. My diagnosis is that, due to a bug related to container networking, containers can’t find each other. The docker-compose file is posted above, and the suite is the EdgeX IoT Platform.
Thanks everybody for your time!
Proof (from the logs):
10.04.19 10:06:58 (+0300) logging ERROR: 2019/04/10 07:06:58 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
10.04.19 10:06:58 (+0300) command ERROR: 2019/04/10 07:06:58 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
10.04.19 10:06:58 (+0300) scheduler ERROR: 2019/04/10 07:06:58 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
10.04.19 10:06:58 (+0300) export-client ERROR: 2019/04/10 07:06:58 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
10.04.19 10:06:58 (+0300) metadata ERROR: 2019/04/10 07:06:58 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
10.04.19 10:06:59 (+0300) data ERROR: 2019/04/10 07:06:59 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
10.04.19 10:06:59 (+0300) export-distro ERROR: 2019/04/10 07:06:59 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
10.04.19 10:06:59 (+0300) notifications ERROR: 2019/04/10 07:06:59 connection to Consul could not be made: Put http://edgex-core-consul:8500/v1/agent/service/register: dial tcp: lookup edgex-core-consul on 10.114.102.1:53: no such host
Default networking between containers works fine (at least in my use case: RPi + balenaOS v2.29.2+rev2). The only issue I had, and what seems to be an issue in general, is with custom networks used to create isolated networking groups.
Like @gelbal said, just use the default bridge network. Remove your custom network from the compose file and get all your containers running on the same default network until the fix is available.
I removed the networks section as well as the network mode. My services still can’t find Consul. I suspect hostname might be bugged as well, because I use different service names and hostnames.
By "get all your containers running on the same default network", what do you mean exactly? Just want to be 100% sure.
What do you think?
QUICK UPDATE: The services are discoverable when the address uses the service name instead of the hostname.