Single Image Multi Container Best Practices?

I'm looking for any recommendations or wisdom around running multiple containers from a single image. The use case is this: I'm processing multiple independent streams of data on a single device, and each stream requires a slightly different configuration. I could run this all as a single process, or multi-process in a single container, but the cleanliness of having each process containerized is very appealing: the logs are segregated, and it plays well with the control plane that balena offers, for example starting and shutting down services independently and exposing that status in the dashboard.

But I have run into a few catches that make this rough.

Design
I'm using the multi-service approach in balena. There is no way to dynamically create services in balena, so I cannot simply scale the number of containers based on runtime configuration. That would be desirable in my case but isn't currently possible. What I have resorted to is defining more services in the docker-compose file than I ever expect to use, and then idling/shutting down the unused ones. Does anyone have any better ideas around this?
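For concreteness, the compose file ends up looking roughly like this sketch (service names, the build path, and the environment variables are just illustrative placeholders, not my real config):

version: '2.1'
services:
  stream-1:
    build: ./stream-worker
    environment:
      - STREAM_INDEX=1
  stream-2:
    build: ./stream-worker
    environment:
      - STREAM_INDEX=2
  # ...and so on, up to the maximum number of streams I ever expect to need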

Build Time
Things get interesting here. If I "balena push" a docker-compose file with, say, 10 services that all use the same "build" stanza, the builder tries to build the same image 10 times in parallel. This is impossibly slow in local mode on a Pi 4, and I imagine I'd be abusing balena's build servers, especially if I scale to 20 or more containers. If I trim the docker-compose down to one service, "balena push" it, then add the others back and rely on the cache, it works, but I still have a time bomb of dozens of builds kicking off the moment I make a cache-invalidating change to the service.

I've resorted to completely sidestepping the balena commands for building the release: I build my image with "docker build", push it to a Docker Hub repo, and then reference that pre-built image in each service's "image" stanza. This works just fine, but because I'm a bit outside the balena ecosystem there is some friction: I can't use things like Dockerfile.template, and I can't use the balena build servers for native builds, only for putting together the docker-compose release. That step also runs very long for not doing any building, as it seems to calculate deltas for each identical image.
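So each service entry now just points at the same pre-built image, something like this (the repository name is a placeholder):

version: '2.1'
services:
  stream-1:
    image: myorg/stream-worker:latest
    environment:
      - STREAM_INDEX=1
  stream-2:
    image: myorg/stream-worker:latest
    environment:
      - STREAM_INDEX=2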

Any ideas on how to do this better?

Run Time
Each service checks whether it has a device-service configuration and, if not, it "sleeps". Just sleeping works OK, but I dislike that I can't see the shut-down services as stopped in the service list in the balena dashboard. So I tried to make them self-destruct via the supervisor API, and I get some pretty wild results. If a service calls the stop-service endpoint on the supervisor, it starts bouncing in a restart loop as balena keeps trying to restart it. With 10 or more services this is totally unusable, with several containers constantly bouncing and using resources. If I set the docker-compose restart stanza to "unless-stopped", that stops the bouncing madness, but then containers that have genuinely failed stay down forever. The goal is for services that shut themselves down to stay down, and for everything else to bounce back to running on failure. Is this possible in balena currently?
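For reference, the self-destruct call each service makes is roughly the following (this assumes the service has the io.balena.features.supervisor-api label so the BALENA_SUPERVISOR_* variables are injected; the service name here is a placeholder):

curl -X POST \
  "${BALENA_SUPERVISOR_ADDRESS}/v2/applications/${BALENA_APP_ID}/stop-service?apikey=${BALENA_SUPERVISOR_API_KEY}" \
  -H "Content-Type: application/json" \
  --data '{"serviceName": "stream-1"}'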

Hi @anton-ceai ,

Welcome to the forums! This is a great architecture question for sure. Before I give it a little more thought and provide another point of view:

Did you give the on-failure restart policy a shot? If I read that correctly, I am assuming that is the behavior you are seeking, correct?
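That would just be a per-service setting in the compose file, something along these lines (service name illustrative):

services:
  stream-1:
    restart: on-failure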

Thanks for taking some time @nucleardreamer.

I set up a clean environment, and with some random sleep delays in my startup script plus the "on-failure" restart policy, I'm getting reasonable restart behavior with 20 services.

Hello @anton-ceai,
we are working on support for optional containers (no ETA yet), but your case of running the exact same container, just with different configuration, is something that we currently don't support nicely.

You already seem to have tested all the options and know the tradeoffs. I personally would probably just go with a single container that auto-generates a supervisord configuration for each service on startup and then runs it. Logs could be prefixed per service so you can easily separate them on the dashboard.
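A very rough sketch of that idea (program names, paths, and the NUM_STREAMS variable are made up for illustration, and it assumes a base supervisord.conf that includes conf.d/*.conf):

#!/usr/bin/env bash
# Container entrypoint: write one supervisord [program] per configured stream, then run supervisord.
set -e
CONF=/etc/supervisor/conf.d/streams.conf
: > "$CONF"
for i in $(seq 1 "${NUM_STREAMS:-1}"); do
cat >> "$CONF" <<EOF
[program:stream-$i]
command=python /app/worker.py --stream $i
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true
EOF
done
# The worker itself can prefix its log lines with the stream name so they stay separable on the dashboard.
exec supervisord -n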

I'm surprised by your observations about deltas, though. Is it really the case that deltas for 10 images take 10 times as long? The expected behavior is that (because the contents are identical) the delta is only calculated once, although it would still block the download for the other 9.
What might be causing this is the builds not producing identical results, which could be something as simple as differing timestamps. In that case, yes, the best thing you can do is reference an external image.

If I were a bit more handy with init-ish tooling like supervisord, I think that would have been the natural route, but I also couldn't resist a less-code solution. After a bit more work, things are working out better.

By including some random sleeps in the container startup commands, I avoid the thundering-herd problem of all the services starting at the same time, and with the "on-failure" restart policy in the docker-compose I get maybe one or two "bounce" cycles of starts and shutdowns, which is acceptable for my use case. The system then settles into my desired state: all active containers up and all inactive ones shut down. It's actually quite easy to eyeball the status of the device, or to pull that status from the supervisor API (/v2/applications/state).
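The entrypoint logic is roughly the following sketch (the config path and run command are placeholders, and I'm assuming BALENA_SERVICE_NAME is available in the container; otherwise the name can be passed in via the compose environment):

#!/usr/bin/env bash
set -e

# Random stagger so 20 services don't all hit the supervisor at the same moment.
sleep $((RANDOM % 30))

if [ ! -f "/data/config/${BALENA_SERVICE_NAME}.json" ]; then
  # No device-service configuration for this service: ask the supervisor to stop it
  # and exit cleanly so restart: on-failure leaves it down.
  curl -X POST \
    "${BALENA_SUPERVISOR_ADDRESS}/v2/applications/${BALENA_APP_ID}/stop-service?apikey=${BALENA_SUPERVISOR_API_KEY}" \
    -H "Content-Type: application/json" \
    --data "{\"serviceName\": \"${BALENA_SERVICE_NAME}\"}"
  exit 0
fi

# Active stream: hand off to the real worker process.
exec /app/run-stream "${BALENA_SERVICE_NAME}"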

The dashboard UI doesn't play nicely with a list of 20 services (I run out of space on screen when I select a terminal target), but the CLI/API works fine.

The builds still seem awkward, but I can't really tell whether anything is actually broken. It takes about 6 minutes to deploy 20 identical pre-existing images, and I do get this warning:

[Info]             Generating image deltas from release a3fad34604993ddfd63c089e4493cebc (id: 1887248)
[Warning]          Failed to generate deltas due to an internal error; will be generated on-demand

I’ll include the deploy log if anyone is interested. multi-image-output.log (222.5 KB)

For now, I'm going to stick with the multi-container approach, shutting down the inactive containers, and see how far I can take it.