Healthcheck implementation

Does anyone have experience implementing a healthcheck using a Python script’s exit code? I wrote a Python script that, after a certain time, calls sys.exit(0) or sys.exit(1). I was expecting the healthcheck to use that exit code to determine the health of the container, but it didn’t work properly. The status is always starting, and then the container restarts. Does anyone have experience with this?

Hey @wzhou first of all welcome to the balena community!

Can you please expand on your use case?

You could perhaps use the Docker Compose file’s configuration option of the same name (healthcheck):

We have a JavaScript example to demo this capability on a balena device:

Additionally, the device supervisor has the following endpoint that you might be interested in utilizing instead of building such logic yourself:

I wanted to give some general direction first. Let us know if you have any specific questions.

Cheers…

Hi @gelbal, thanks for your quick reply!
I have an acquisition container running a camera capture pipeline. For the healthcheck, I put this in my Docker Compose (version 2.1) file for this container:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost"]
  interval: 1m30s
  timeout: 10s
  retries: 3

However, when I start the container, its status always shows something like Up 4 minutes (health: starting).
Then, after some time, the container gets restarted and shows Up 36 seconds (health: starting). So it is basically never in a healthy state, even though I can confirm the camera capture is running fine (I get images as output from that container).
My resin_supervisor looks fine; it shows:
Up 8 minutes (healthy) resin_supervisor
I thought this was a simple, standard health check, so I’m not sure why it never reports a healthy state.
Thanks…

Hi @wzhou, before you implement this healthcheck, I advise you to take a step back and think through these questions:

  • How would I verify that the service is running fine? How could I reason that the camera is working as expected?
  • Can I ping the open ports of the service? Or does this service write its status (running fine, broken) to the filesystem?
  • How can I ping the open ports? Or how can I read the service status update?
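If the service does listen on a TCP port, a check along these lines might be enough. This is only a minimal sketch: the function names are made up for illustration, and the host and port you pass in depend on what your service actually exposes.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


def exit_code(healthy: bool) -> int:
    """Map a check result to a healthcheck exit code: 0 healthy, 1 unhealthy."""
    return 0 if healthy else 1

# A healthcheck script would end with something like:
#   sys.exit(exit_code(port_is_open("localhost", 8080)))
```

Note that an open port only tells you something is listening, not that the camera pipeline behind it is doing useful work.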

In the following example project, there is a web server running. It has a port exposed for querying and furthermore there is a specific endpoint to query to check status (localhost:8080/status).

The snippet you pasted implies that if curl cannot reach localhost for some reason, or if it gets an unsuccessful response, the service is not healthy. Is that actually the case in your setup?
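Since the original question was about doing this from a Python script, here is a rough Python counterpart to that curl -f test. Treat it as a sketch rather than an exact equivalent: http_ok is a hypothetical helper name, and unlike plain curl, urllib follows redirects.

```python
import http.client
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTP 4xx/5xx (HTTPError is a URLError), refused connection,
        # timeout, or a malformed URL all count as unhealthy.
        return False

# A healthcheck script would end with something like:
#   sys.exit(0 if http_ok("http://localhost") else 1)
```

If nothing in the container is serving HTTP, this returns False on every run, exactly like the curl version.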

In your case, perhaps you don’t have a web server running. Instead, maybe you could update the service running the camera to do a healthcheck itself and write fine or failure to a text file every X minutes. Then your healthcheck becomes a script that reads this status every X minutes and exits with a non-zero code when it reads failure.
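A minimal sketch of that status-file idea in Python could look like this. The file path, the 120-second limit and both function names are assumptions for illustration. The staleness check is an extra that was not mentioned above: it treats a status file that has stopped updating as unhealthy, in case the camera service hangs without ever writing failure.

```python
import time
from pathlib import Path

STATUS_FILE = Path("/tmp/camera_status")  # hypothetical location
MAX_AGE_SECONDS = 120                     # hypothetical staleness limit


def write_status(ok: bool, path: Path = STATUS_FILE) -> None:
    """Called by the camera service after each self-check."""
    path.write_text("fine" if ok else "failure")


def read_exit_code(path: Path = STATUS_FILE,
                   max_age: float = MAX_AGE_SECONDS) -> int:
    """Return 0 only if the file says 'fine' and was updated recently."""
    try:
        age = time.time() - path.stat().st_mtime
        status = path.read_text().strip()
    except OSError:
        return 1  # a missing or unreadable file counts as unhealthy
    return 0 if status == "fine" and age <= max_age else 1

# The healthcheck script would end with something like:
#   sys.exit(read_exit_code())
```

For this to work, the healthcheck has to run inside the same container as the camera service, or the two need to share the directory holding the file.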

Let us know whether this helps. I encourage you to ask yourself the above questions and design your healthcheck accordingly. I also suggest opening the Device Diagnostics menu and running the checks there to make sure the device is indeed in a healthy state.

If you are still experiencing issues, it’d be great if you could post a simplified / sample Dockerfile of your application to illustrate the issue.

Cheers…

Hi @gelbal ,

My bad: my container was not actually exposing any port, so the check could not reach it. I’ve changed that and it works now!

If you don’t mind me asking another question: what does the (health: starting) state mean for a health check? I set my healthcheck interval to 10s, but I noticed the container can stay in the starting state for longer than 10 seconds. Does this time include the container’s startup time and the healthcheck’s execution time?

Thanks again for your help!

Hi @wzhou, health: starting means the engine has not yet seen your health check pass (or fail enough times in a row to be marked unhealthy). So yes, you are right: it is the container startup time + the time until the engine runs the first check (Docker schedules it one interval after the container starts) + the time it takes for the check to actually finish.
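One detail worth adding, as far as I understand Docker's HEALTHCHECK documentation: the first check runs one interval after the container starts, and each later check runs one interval after the previous one completes. Assuming the engine on the device behaves the same way and no start period is set, the timing can be estimated with a small sketch like this (both function names are made up for illustration):

```python
def first_result_at(interval: float, check_duration: float) -> float:
    """Seconds after container start when the first check result is known."""
    return interval + check_duration


def unhealthy_at(interval: float, check_duration: float, retries: int) -> float:
    """Seconds after start until `retries` consecutive failed checks finish.

    Assumes every check fails and takes check_duration seconds.
    """
    return retries * (interval + check_duration)


# interval: 10s with a check that takes about 2 seconds
print(first_result_at(10, 2))   # 12

# interval: 1m30s, retries: 3, checks that fail almost instantly
print(unhealthy_at(90, 0, 3))   # 270
```

The second figure is 4.5 minutes, which would fit the Up 4 minutes (health: starting) seen earlier in this thread with the 1m30s interval.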