Does anyone have experience implementing a healthcheck based on a Python script's exit code? I wrote a Python script that calls sys.exit(0) or sys.exit(1) after a certain amount of time, expecting the healthcheck to use that exit code to determine the health of the container. But it doesn't work properly: the status is always "starting", and then the container restarts. Has anyone run into this?
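One thing worth knowing here: the engine only looks at the exit code of the healthcheck `test` command (0 = healthy, 1 = unhealthy), and a single run that takes longer than the configured `timeout` counts as a failure. A script that waits before calling `sys.exit()` can therefore be recorded as failing. Below is a minimal sketch of a check that decides immediately, assuming (hypothetically) that the capture pipeline writes images into `/data/images` and that a new image at least once a minute means it is healthy; both values are placeholders.

```python
import os
import time
from typing import Optional

# Hypothetical values: point these at wherever the pipeline really writes its output.
IMAGE_DIR = "/data/images"
MAX_AGE_SECONDS = 60.0


def newest_mtime(directory: str) -> Optional[float]:
    """Most recent modification time of any regular file in `directory`, or None."""
    try:
        with os.scandir(directory) as it:
            mtimes = [e.stat().st_mtime for e in it if e.is_file()]
    except FileNotFoundError:
        return None
    return max(mtimes) if mtimes else None


def check(directory: str = IMAGE_DIR, max_age: float = MAX_AGE_SECONDS) -> int:
    """Return 0 (healthy) if some file was written within `max_age` seconds, else 1."""
    latest = newest_mtime(directory)
    if latest is None:
        # Directory missing or empty: nothing has been captured yet.
        return 1
    return 0 if time.time() - latest <= max_age else 1
```

As a standalone `healthcheck.py`, the script would end with `sys.exit(check())` under an `if __name__ == "__main__":` guard, so it returns at once instead of sleeping, and could be referenced as `test: ["CMD", "python3", "/usr/src/app/healthcheck.py"]` (path again hypothetical).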
Hey @wzhou, first of all, welcome to the balena community!
Can you please expand on your use case?
Perhaps you could use the Docker Compose file's configuration option of the same name (healthcheck):
Additionally, the device supervisor has the following endpoint, which you might want to use instead of building such logic yourself:
I wanted to give some general direction. Then let us know if you have any specific questions.
Hi @gelbal , thanks for your quick reply!
I have an acquisition container running a camera capture pipeline. For the healthcheck, I put this in the Docker Compose (version 2.1) file for this container:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost"]
  interval: 1m30s
  timeout: 10s
  retries: 3
```
However, after I start the container, its status always looks like this:
Up 4 minutes (health: starting)
Then, after a while, the container gets restarted and shows:
Up 36 seconds (health: starting)
So it never reaches a healthy state, even though I can confirm the camera capture is running fine (the container is producing images).
My resin_supervisor looks fine; its status is:
Up 8 minutes (healthy) resin_supervisor
I thought this was a simple, standard health check, so I'm not sure why it never reports a healthy state.
Hi @wzhou, before you implement this healthcheck, I advise you to take a step back and think through these questions:
- How would I verify that the service is running fine? How could I reason that the camera is working as expected?
- Can I ping the open ports of the service? Or does this service write its status (running fine, broken) to the filesystem?
- How can I ping the open ports? Or how can I read the service status update?
In the following example project, there is a web server running. It exposes a port for queries, and it also provides a dedicated endpoint for checking its status:
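For a service that has no web server yet, a minimal status endpoint could be sketched with only the Python standard library. The `/health` path, the default port 8080 and the `is_capture_ok()` helper are hypothetical placeholders; a real service would replace the helper with an actual check of the camera pipeline.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def is_capture_ok() -> bool:
    """Hypothetical stand-in: replace with a real check of the camera pipeline."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        healthy = is_capture_ok()
        # 200 lets `curl -f` succeed; 503 makes it exit with a non-zero code.
        body = b"ok\n" if healthy else b"failing\n"
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        # Silence per-request logging so periodic health checks don't flood the logs.
        pass


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Bind the health endpoint on all interfaces at the given (hypothetical) port."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Started from the capture service with something like `threading.Thread(target=make_server().serve_forever, daemon=True).start()`, it would let a check such as `curl -f http://localhost:8080/health` tell a healthy pipeline from an unhealthy one.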
The snippet you pasted implies that if curl cannot reach localhost for some reason, or if it gets a non-successful response, the service is considered unhealthy. Is that actually the case in your setup?
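The check in that snippet boils down to: succeed if something answers on http://localhost with a non-error status, fail otherwise. If an image ships Python but not curl, a rough standard-library equivalent could look like this. The 5-second default timeout is an assumption, and unlike plain `curl -f`, `urlopen` follows redirects.

```python
import urllib.error
import urllib.request


def http_check(url: str = "http://localhost", timeout: float = 5.0) -> int:
    """Roughly mimic `curl -f URL`: return 0 on a successful response and 1 on
    an HTTP error status (>= 400) or when the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return 0
    except urllib.error.HTTPError:
        # The server answered, but with an error status; `curl -f` fails here too.
        return 1
    except OSError:
        # Nothing listening (connection refused), DNS failure, or a timeout.
        return 1
```

With nothing listening on the port, this returns 1 on every run, so the container can never become healthy, no matter how well the rest of the service is working.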
In your case, perhaps you don't have a web server running. Instead, maybe you could update the service running the camera to perform a health check itself and write its status (for example, failure) to a text file every X minutes. Then your healthcheck becomes a script that reads this status file every X minutes and exits with a non-zero code when it reads a failure.
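A minimal sketch of that pattern, assuming a hypothetical status file at `/tmp/capture_status` that the capture service rewrites with `ok` or `failure`, and X = 5 minutes. The reader also treats a stale file as unhealthy, on the assumption that a hung service would stop updating it.

```python
import os
import time

STATUS_FILE = "/tmp/capture_status"  # hypothetical path shared by service and check
MAX_AGE_SECONDS = 5 * 60             # hypothetical "every X minutes", with X = 5


def write_status(ok: bool, path: str = STATUS_FILE) -> None:
    """Called periodically by the capture service after its own self-test."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("ok" if ok else "failure")
    os.replace(tmp, path)  # atomic swap, so the check never reads a half-written file


def read_status(path: str = STATUS_FILE, max_age: float = MAX_AGE_SECONDS) -> int:
    """Healthcheck side: return 0 if the last status is 'ok' and recent, otherwise 1."""
    try:
        age = time.time() - os.path.getmtime(path)
        with open(path) as f:
            status = f.read().strip()
    except OSError:
        return 1  # no status written yet, or the file is unreadable
    if age > max_age:
        return 1  # the service has stopped reporting
    return 0 if status == "ok" else 1
```

The healthcheck script itself would then just end with `sys.exit(read_status())`, while the capture loop calls `write_status(...)` after each self-test.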
Let us know whether these help. I encourage you to work through the questions above and design your healthcheck accordingly. I also suggest opening the Device Diagnostics menu and running the checks there to make sure the device is indeed in a healthy state.
If you are still experiencing issues, it'd be great if you could post a simplified sample Dockerfile of your application that illustrates the issue.
Hi @gelbal ,
My bad: my container wasn't actually exposing any port, so the check had nothing to reach. I've changed that and it works now!
If you don't mind, I have another question: what does the (health: starting) state of a health check mean? I set my healthcheck interval to 10s, but I noticed the container can stay in the starting state for longer than 10 seconds. Does this time include the container startup time and the healthcheck execution time?
Thanks again for your help!
health: starting means no health check has succeeded yet. So yes, you are right: it's the container startup time + the time until the engine runs the first check (one interval after the container has started) + the time it takes for the check to actually finish.
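To put numbers on it: according to the Docker documentation, the first check runs `interval` seconds after the container starts, each later check runs `interval` seconds after the previous one completes, and the status turns unhealthy after `retries` consecutive failures. A rough sketch of that arithmetic, measured from the moment the container has started (so container startup time comes on top) and ignoring the optional `start_period`:

```python
def seconds_until_first_healthy(interval: float, check_duration: float = 0.0) -> float:
    """Earliest time at which the status can become healthy:
    the first check starts after one interval and then has to finish."""
    return interval + check_duration


def seconds_until_unhealthy(interval: float, retries: int, check_duration: float = 0.0) -> float:
    """Time until `retries` consecutive failures mark the container unhealthy,
    given that each check starts one interval after the previous one ends."""
    return retries * (interval + check_duration)
```

With a 10s interval, the earliest possible healthy status comes 10 seconds plus the check's own run time after startup, which is why "starting" lasts longer than 10 seconds. For the compose values quoted earlier (interval 1m30s, retries 3), `seconds_until_unhealthy(90, 3)` gives 270.0, so a check that fails instantly every time would keep the container in "starting" for about 4.5 minutes before it is marked unhealthy.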