Options for simulating/supporting depends_on service_healthy in balenaOS

Hello everyone,

I’m working on getting my project running on balenaOS for deployment to RPi 5s and NUCs. The project consists of 5 Docker containers defined in a docker-compose file. They need to start in a specific order, and each container must be healthy (service_healthy) before the next one starts.

According to the balena documentation, depends_on only supports an array form and the service_started condition, not service_healthy.

The issue is that if I replace service_healthy with service_started, all containers start at pretty much the same time. The healthchecks already in place for each container are what should signal that it has really started: database migrations need to be applied, several services inside each container need to come up, and so on, so just saying “hey, I launched the JAR file, it’s started” is not enough. The problem is made worse by the fact that I am using performant hardware, so the containers start within milliseconds of each other.

Specifically, I am trying to start a project that consists of a Postgres DB with Timescale, a Keycloak IdP, a reverse proxy, and my platform. The Postgres instance is not ready by the time Keycloak starts, so some migrations never run, and the mistimed startup then cascades to my reverse proxy and my application.

I would like to make minimal changes to any of the containers’ codebases, as I am trying to make this as close to a plug-and-play configuration as I can by providing a docker-compose.yml file. Since this project has mission-critical data in its database, I also want to ensure that each service starts only once the services it depends on are healthy. Simply restarting containers until everything works is not an option.

I’ve seen this issue come up in the forums multiple times, so I imagine I’m not the only one running into it. So my question is: how should I handle this? I have some ideas, but I’m not sure any of them is the right one:

  • Use an additional entrypoint on each container that waits a predetermined number of seconds before running the real entrypoint. It’s the jankiest solution, and the delay would need to be adjusted for different devices/services according to their performance.
  • Add another container that orchestrates the others by running their healthchecks and handling the starting and stopping of containers. This would require access to the supervisor for container administration, and is also quite a janky solution that would need to be updated every time the healthcheck of any container changes.
  • Analyze each Docker container in the Docker-compose file, extract the Healthcheck, run it, and then allow startup of the next containers.
  • Write additional code in all projects to ensure the previous service has started before using it at all. This would require code additions to multiple platforms, and would force anyone using the docker-compose file for balena to stay on older versions of all platforms until I or someone else goes back and updates the configuration.
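
To make the first idea a bit more concrete, here is a minimal sketch of a variant that polls for the dependency instead of sleeping for a fixed time, so it would not need per-device tuning. The script name wait_for.py, its arguments, and the timeout values are all hypothetical, and an open TCP port only shows that the dependency is accepting connections, not that its migrations are finished, so a stricter probe (for example the dependency's own healthcheck command) may still be needed.

```python
#!/usr/bin/env python3
"""Hypothetical wait-then-exec entrypoint wrapper (a sketch, not balena tooling)."""
import os
import socket
import sys
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def main(argv):
    # Assumed usage: wait_for.py HOST PORT COMMAND [ARGS...]
    if len(argv) < 4:
        print("usage: wait_for.py HOST PORT COMMAND [ARGS...]", file=sys.stderr)
        return 2
    host, port, command = argv[1], int(argv[2]), argv[3:]
    if not wait_for_port(host, port):
        print(f"{host}:{port} not reachable, giving up", file=sys.stderr)
        return 1  # non-zero exit, so a restart policy could retry
    # Replace this process with the container's real entrypoint.
    os.execvp(command[0], command)


# Only act when arguments were passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

In a compose file this could be wired in as the service's entrypoint, e.g. `python3 wait_for.py postgres 5432` followed by the original command, assuming the database service is reachable under the hostname postgres on the default port 5432 and that Python is available in the image. It still has the drawback from the list above: every image needs the wrapper added.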

As I mentioned, I would like this solution to be as plug-and-play as possible, in order to A) make it easy to merge into main, B) make it easy for people other than me to maintain, and C) keep the normal balena release flow, so that containers can be managed and updated on the fly without anyone being on-site or performing manual container maintenance.

Thanks for reading, I can’t wait for your responses/ideas!