Service is already stopped (Raspberry Pi 2)

I have a device on which I’ve been putting together a multicontainer DNS and DHCP server. All has been going just fine, but now one of the containers seems to be caught in some sort of loop and can’t start/stop:

05.09.20 00:59:00 (+1000) Installing service 'dnscache2 sha256:74bb0e0eaee208b591bd1e30ebb19bb38b35ac0084cbebd8bf875194a6bcba76'
05.09.20 00:59:17 (+1000) Killing service 'dnscache2 sha256:74bb0e0eaee208b591bd1e30ebb19bb38b35ac0084cbebd8bf875194a6bcba76'
05.09.20 00:59:17 (+1000) Service is already stopped, removing container 'dnscache2 sha256:74bb0e0eaee208b591bd1e30ebb19bb38b35ac0084cbebd8bf875194a6bcba76'
05.09.20 00:59:17 (+1000) Killed service 'dnscache2 sha256:74bb0e0eaee208b591bd1e30ebb19bb38b35ac0084cbebd8bf875194a6bcba76'
05.09.20 00:59:17 (+1000) Installing service 'dnscache2 sha256:74bb0e0eaee208b591bd1e30ebb19bb38b35ac0084cbebd8bf875194a6bcba76'

The service is called dnscache2 because I’ve just tried renaming it to see if there is some cruft that is not being cleaned up, but this newly named version has never successfully started.

I’m not sure where to start in trying to understand what is failing, so any advice very welcome!

Hi Liam. Perhaps dnscache is crashing on startup. I suggest using SSH to access the device and running `journalctl -fa` to view all the logs produced while the container is in the restart loop.

You can SSH from the dashboard or the balena CLI. https://www.balena.io/docs/learn/manage/ssh-access/#ssh-access

All host and container logs are written to the systemd journal, which you can access via `journalctl`; more info here: https://www.loggly.com/ultimate-guide/using-journalctl/
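
For example (the UUID below is a placeholder for your device’s UUID, and the exact unit names can vary between balenaOS versions):

# From your workstation, open a shell on the device's host OS
balena ssh <device-uuid>

# Follow all logs while the container is in its loop
journalctl -fa

# Or narrow it down to the supervisor and engine units
journalctl -fa -u resin-supervisor -u balena.service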

@codewithcheese Thanks for the pointers - exactly what I was looking for.

So, what happens when I start the service is that it starts up fine and runs happily for a few minutes. Then it realizes there is a network config change that it needs to make:

Sep 07 08:04:49 c6327f8 78f4fd5e9a25[785]: [debug]   Replacing container for service dnscache because of config changes:
Sep 07 08:04:49 c6327f8 78f4fd5e9a25[785]: [debug]     Network changes detected

After a while we get to this (removed some duplicate lines from resin-supervisor):

Sep 07 08:05:02 c6327f8 balenad[785]: time="2020-09-07T08:05:02.158458734Z" level=info msg="shim reaped" id=6e654f723795a56e9adc5da181ed5fe6b0d84c099db5e0c785d9494d4874f78f
Sep 07 08:05:02 c6327f8 balenad[785]: time="2020-09-07T08:05:02.164912034Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 07 08:05:04 c6327f8 78f4fd5e9a25[785]: [event]   Event: Service exit {"service":{"appId":1725022,"serviceId":681292,"serviceName":"dnscache","releaseId":1521939}}
Sep 07 08:05:04 c6327f8 78f4fd5e9a25[785]: [event]   Event: Service stop {"service":{"appId":1725022,"serviceId":681292,"serviceName":"dnscache","releaseId":1521939}}

The first two I’m not sure I understand, but the other two make sense.

Then:

Sep 07 08:05:05 c6327f8 78f4fd5e9a25[785]: [event]   Event: Service install {"service":{"appId":1725022,"serviceId":681292,"serviceName":"dnscache","releaseId":1521939}}                                                                           
Sep 07 08:05:06 c6327f8 78f4fd5e9a25[785]: [error]   Scheduling another update attempt in 900000ms due to failure:  Error: Failed to apply state transition steps. (HTTP code 400) unexpected - container sharing network namespace with another container or host cannot be connected to any other network  Steps:["start"]

… and a callstack.

So I think this identifies my problem, although I’m not sure how to handle it.

I’m trying to create two services: one is the DNS cache we’ve been talking about, and the other is a local authoritative DNS server (tinydns). These are often installed together, with tinydns listening on 127.0.0.1 and the cache listening to the outside network and relaying any relevant requests to it.

So I’m uncertain how I can set that up here, with two separate containers/services. The cache can certainly sit there, listening for requests and fulfilling them. But I’m not sure how it can communicate with the tinydns container, as it works only with IP addresses, and I don’t know the address of the other service.

I’ve tried to deal with that in this way (just showing the relevant lines from the docker-compose.yml):

   tinydns:
      expose:
         - 53/udp
      networks:
         internal:
            ipv4_address: 10.0.0.2

   dnscache:
      network_mode: host
      ports:
         - "53:53/udp"
      networks:
         internal:
            ipv4_address: 10.0.0.3

networks:
   internal:
      driver: bridge
      ipam:
         config:
            - subnet: 10.0.0.0/24

Clearly, that’s a problem. The cache can’t be part of the bridged network and also part of the host network. I can sort of understand that, but then how can I reach the tinydns service by IP address and also talk to the outside network? Have I made this over-complicated? I feel I’m missing what should be an obvious bit of understanding about networking between services/containers. As far as I can recall, I’ve only seen services referred to by service_name:port, and that won’t work in the dnscache config files.

Any advice most welcome!

Hi
Can you also provide us the balenaOS version that you are using and the relevant pieces of your previously working docker-compose?

Kind regards,
Thodoris

Just as a way to make sure that this is not an issue with moving between the old and the newer release, you can try moving the device to a new, empty application, wait for it to stop all containers, and then push the new release.

Kind regards,
Thodoris

Hi again,
According to our docs, we only support network names in the docker-compose.yml networks field, but let me clarify that with the respective internal team and get back to you.
Let me also point you to the respective documentation page:

Kind regards,
Thodoris

Hi there,

If I understand your use-case correctly, you have two services, one of which requires some public network traffic and some private traffic, and a second service that should only listen for private traffic (i.e., originating on the same device).

I think what you can do is define both services to use bridged networking, and allow whatever ports are required for your publicly-available service via a ports: mapping in your docker-compose.yml. We have a masterclass available here which explains this networking setup in more detail: https://www.balena.io/docs/learn/more/masterclasses/services-masterclass/#4-networking-types. Once all services are using a bridge interface, balenaEngine will provide a resolver that does the service name lookup (back to an IP).
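
For example, adapting the snippet you posted earlier, the relevant lines might look something like this (a rough sketch only; I’m assuming the rest of your compose file stays as it is, and that dnscache only needs UDP port 53 published):

   tinydns:
      expose:
         - 53/udp
      networks:
         - internal

   dnscache:
      ports:
         - "53:53/udp"
      networks:
         - internal

networks:
   internal:
      driver: bridge

With both services on the bridge network, dnscache should be able to reach the other container by its service name, tinydns, via the engine’s embedded resolver, rather than needing a fixed address. (You could also drop the custom network entirely and rely on the default one.)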

I hope I’ve understood your use-case, and I hope this helps! Please let us know if you have any further questions.

This is what I see as my problem. The dnscache service doesn’t have the option of resolving a host name in its config files - it requires that I give it an IP address. Is there a way that I can carry out that lookup on the command line? With a known IP address I could update the configuration files prior to running dnscache; in fact I’m doing that at the moment with the 10.0.0.2 address I’m trying to assign.

The examples I’ve seen all rely on a name lookup (e.g. `request('http://backend:1234/data'…)`).

And since I’m implementing a DNS server with this service, I’m a little unsure how I can query for a name resolution that happens purely within the balena device, rather than using the DNS that I’m setting up, if you see what I mean?

I feel I should be almost there…

@thgreasi - I’m using a Raspberry Pi 2, which gives me only balenaOS 2.48.0+rev1, as far as I can tell (and supervisor 10.8.0), which is a bit back from the cutting edge.

Looking over all that I have provided, I have been able to get the services all running by removing the internal network declaration (and ipv4_address) from the dnscache service. But that leaves me unsure about how to reach the tinydns service at a known IP address, as per my post just above this one.

So, the “service is already stopped” problem is no longer the one that most concerns me. I do think there should have been a more obvious message in the web interface to indicate that there was a fundamental network problem (or something identified at the push stage), but with the tools I’ve been shown here it wasn’t that difficult to locate the problem, which is excellent.

Now I have to work out this internal IP address, and I think I’ll be there…

All of the networks configuration in my docker-compose.yml was focused on assigning a known IP address, so if I don’t need to assign an address, most of that will fall away from the file…

You should be able to use your favorite DNS resolving tool to resolve that IP at startup (such as host from BIND, nslookup, etc.). The IP may change if the second container is restarted, so it may be worthwhile to refresh that value intermittently or use depends_on: in your compose file to make sure both services are restarted at the same time.
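
As a rough sketch, a start script inside the dnscache container could do something like this (the config path and the final exec are placeholders for however your image is actually set up):

#!/bin/sh
# Resolve the tinydns service name using the container's resolver
# (the engine's embedded DNS when the service is on a bridge network).
# BusyBox nslookup output differs from bind's, so adjust the parsing if needed;
# "getent hosts tinydns" is an alternative if it's available in your image.
TINYDNS_IP="$(nslookup tinydns 2>/dev/null | awk '/^Name:/ { found = 1 } found && /^Address/ { print $2; exit }')"

if [ -z "$TINYDNS_IP" ]; then
    echo "could not resolve tinydns" >&2
    exit 1
fi

# Write the resolved address into the dnscache configuration before starting;
# /etc/dnscache/root/servers/internal is a placeholder path.
echo "$TINYDNS_IP" > /etc/dnscache/root/servers/internal

# Then launch dnscache however the image normally does.
exec "$@"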

You make a great point about better messaging and failing sooner in the process, so I’ve opened https://github.com/balena-io/balena-supervisor/issues/1452 to track making that easier to follow. Thanks for the feedback, and feel free to subscribe to updates on that thread!

Please let us know if you run into any other issues, or if anything else is unclear.

This isn’t working as I would expect.

Using roughly this structure in my docker-compose.yml:

   tinydns:
      expose:
         - 53/udp

   dnscache:
      network_mode: host
      ports:
         - "53:53/udp"

Both of the containers are using Alpine Linux.

I ssh in to the dnscache instance and run:

bash-5.0# nslookup tinydns
Server:         10.114.102.1
Address:        10.114.102.1:53

Non-authoritative answer:
*** Can't find tinydns: No answer

By the way, I can request the IP address of a public server (twitter or something) and nslookup does return a sensible response, so whatever it is talking to is there and functional.

The more I think about it, the more I’m uncertain how this could work. In order to get dnscache to appear to the network on port 53, I need to use host networking for the service. That redirects port 53 to the container, right? So when the container asks what the IP address of tinydns is, it is issuing the question to port 53 on the parent, which is redirecting port 53 back to the container, surely? Or should the server address that it is using avoid the redirection of port 53 needed for external queries?

Anyway, for whatever reason, it doesn’t seem to want to provide a response.

Any clues? I’m sorry if I’m missing something obvious.

Hi there – I think you should be able to resolve the IP address for the tinydns container, when running in the dnscache container, if you remove the `network_mode: host` directive from the `dnscache:` stanza of your docker-compose.yml file. This should still allow port 53 to be exposed over the network. However, I suspect you will also need to change the port that tinydns is listening on, in order to avoid conflict – I might suggest port 5300.
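
For example (a sketch only, assuming tinydns could indeed be configured to listen on 5300; adjust for whatever your images actually support):

   tinydns:
      expose:
         - 5300/udp

   dnscache:
      ports:
         - "53:53/udp"

dnscache would then forward internal queries to tinydns on port 5300, with only dnscache’s port 53 published on the device itself.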

The setup you’re after seems pretty close to the example we have in the Masterclass; have a look at this section, and in particular the diagram following the sentence “And here’s a diagram showing what we want to happen:”.

I hope that helps – let us know how this works out for you!

All the best,
Hugh

@saintaardvark, I tried basically that setup a while back, but not all of the services would come up because of the port conflict, and there isn’t any way to change the port that tinydns is listening on - both dnscache and tinydns are hardwired to 53.

It seems that I might have to accept that I can’t configure them the way I would prefer, and will have to combine the two into a single service, as I would on a non-containerized installation. So close!

Sorry this did not work for you. Please don’t hesitate to contact us if you have further problems.

Just to close this off entirely -

I have managed to get my DNS system working. I had to run both tinydns and dnscache in one container, as I expected. Most of my difficulties after making that decision had to do with not realizing there was a config file being silently read, and struggling to get the log output to appear - problems both with the packages and with my understanding of the systems involved.

While this isn’t exactly what I was hoping for at the outset, it is fully functional.

Hey Liam, thanks for getting back to us! I’m glad you got it working, even if it is not how you imagined it. Let us know if you need anything else in the meantime.