Healthcheck implementation

wzhou · February 11, 2021, 11:18pm

Does anyone have experience implementing a healthcheck using a Python script’s exit code? I wrote a Python script that, after a certain time, calls sys.exit(0) or sys.exit(1). I was expecting the healthcheck to use the exit code to determine the health of the container… But it didn’t work properly. The status is always starting… and then the container restarts… Has anyone run into this?

gelbal · February 12, 2021, 3:52pm

Hey @wzhou first of all welcome to the balena community!

Can you please expand on your use case?

You could perhaps use the Docker Compose file option of the same name (healthcheck):

We have a Javascript example to demo this capability on a balena device:

Additionally, the device supervisor has the following endpoint that you might be interested in utilizing instead of building such logic yourself:

I wanted to give some general direction. Then let us know if you have any specific questions.

Cheers…

wzhou · February 14, 2021, 10:51pm

Hi @gelbal , thanks for your quick reply!
I have an acquisition container running a camera capture pipeline. For the healthcheck, I put this in my docker compose (2.1) file for this container:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost"]
  interval: 1m30s
  timeout: 10s
  retries: 3

However, when I start the container, its status always shows something like Up 4 minutes (health: starting).
Then, after a while, the container restarts and shows something like Up 36 seconds (health: starting). So it basically never reaches a healthy state, but I can confirm the camera capture is running fine (I get images as output from that container)…
My resin_supervisor looks fine, it shows
Up 8 minutes (healthy) resin_supervisor
I thought this was a simple, standard health check, so I’m not sure why it never reports a healthy state.
Thanks…

gelbal · February 15, 2021, 2:44pm

Hi @wzhou, before you implement this healthcheck, I advise you to take a step back and think through these questions:

How would I verify that the service is running fine? How could I reason that the camera is working as expected?
Can I ping the open ports of the service? Or does this service write its status (running fine, broken) to the filesystem?
How can I ping the open ports? Or how can I read the service status update?

In the following example project, there is a web server running. It has a port exposed for querying and furthermore there is a specific endpoint to query to check status (localhost:8080/status).

The snippet you pasted implies that the service is not healthy if curl cannot reach localhost for some reason, or if it gets an unsuccessful response. Is that actually the case in your setup?
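Since the original question was about driving the healthcheck from a Python script's exit code, here is a minimal sketch of what the curl -f check does, written in Python. It assumes the service answers HTTP on the URL given; STATUS_URL and the probe function are illustrative names, not part of any balena or Docker API.

```python
import urllib.request

# Assumed endpoint for illustration; use whatever your service really serves.
STATUS_URL = "http://localhost:8080/status"


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers successfully, 1 for any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # urlopen raises for 4xx/5xx, so reaching this line means success.
            return 0 if 200 <= response.status < 400 else 1
    except Exception:
        # Connection refused, timeout, HTTP error, malformed URL, ...
        # For a health probe, every failure simply means "unhealthy".
        return 1
```

To use it as the check command, import sys and end the script with sys.exit(probe(STATUS_URL)), then point the compose test entry at the script, for example test: ["CMD", "python3", "/usr/src/app/healthcheck.py"] (the path is hypothetical). The engine only looks at the exit code: 0 is healthy, 1 is unhealthy.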

In your case, perhaps you don’t have a web server running. Instead, maybe you could update the service running the camera to check itself and write fine or failure to a text file every X minutes. Then your healthcheck becomes a script that reads this status file every X minutes and exits with a non-zero code when it reads failure.
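A minimal sketch of that status-file idea could look like the following. The file path, the 180-second freshness window, and the function names are all assumptions made for illustration. The staleness check is an addition to what is described above: it is there so that a hung service that stops updating the file is also reported as unhealthy.

```python
import time
from pathlib import Path

# Hypothetical path and freshness window; neither comes from this thread.
STATUS_FILE = Path("/tmp/camera_status")
MAX_AGE_SECONDS = 180.0


def write_status(ok: bool, path: Path = STATUS_FILE) -> None:
    """Called by the camera service after each self-check."""
    path.write_text("fine\n" if ok else "failure\n")


def check_status(path: Path = STATUS_FILE, max_age: float = MAX_AGE_SECONDS) -> int:
    """Return 0 only if the file says "fine" and was updated recently, else 1."""
    try:
        age = time.time() - path.stat().st_mtime
        status = path.read_text().strip()
    except OSError:
        return 1  # Missing or unreadable file: treat as unhealthy.
    if age > max_age:
        return 1  # Stale file: the service may have stopped updating it.
    return 0 if status == "fine" else 1
```

The camera service would call write_status(...) after each capture cycle, and the healthcheck script would import sys and finish with sys.exit(check_status()). Both sides need to see the same file, which is the case when the check runs inside the service's own container.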

Let us know whether this helps. I encourage you to ask yourself the questions above and design your healthcheck accordingly. I also suggest opening the Device Diagnostics menu and running the checks there to make sure the device itself is in a healthy state.

If you are still experiencing issues, it’d be great if you could post a simplified / sample Dockerfile of your application to illustrate the issue.

Cheers…

wzhou · February 15, 2021, 9:06pm

Hi @gelbal ,

My bad, my container was not actually exposing any port, so the check could not reach it… I’ve changed it and it works now!

If you don’t mind me asking another question: what does the (health: starting) state mean for a health check? I set my healthcheck interval to 10s, but I noticed the container can stay in the starting state for longer than 10 seconds… Does this time include the container start time and the healthcheck execution time?

Thanks again for your help!

robertgzr · February 16, 2021, 12:15pm

Hi @wzhou, health: starting means the engine hasn’t yet run your health check. So yes, you are right: it’s the container startup time + the time until the engine gets around to running the check (usually negligible) + the time it takes for the check to actually finish.
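As a rough illustration of that timing, here is a small sketch. It assumes Docker's documented scheduling: the first check starts one interval after the container starts, and each later check starts one interval after the previous one completes. The function names are made up for this example, and the result is an estimate rather than something the engine reports.

```python
def first_result_after(interval: float, check_duration: float) -> float:
    """Seconds after container start until the first check result exists.

    Assumes the first check begins one interval after the container starts.
    """
    return interval + check_duration


def unhealthy_after(interval: float, timeout: float, retries: int) -> float:
    """Upper bound in seconds until "unhealthy" is reported.

    Assumes every check fails by running into the timeout and that no
    start period is configured.
    """
    return retries * (interval + timeout)


# With interval: 10s and a check that takes about 2 s, the container stays
# in "starting" for roughly 12 s rather than 10 s.
print(first_result_after(10, 2))    # 12

# With interval: 1m30s, timeout: 10s, retries: 3 (the values posted earlier),
# a check that always times out leaves the container in "starting" for up to
# 5 minutes before it is marked unhealthy.
print(unhealthy_after(90, 10, 3))   # 300
```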

yeloman · June 2, 2023, 8:02am

Why can’t a simple indicator be implemented like this?
(I’ve just hacked the HTML below for the example)

What about having an indicator beside the “Running” icon if the healthcheck field is defined in the docker-compose.yml file?

Here’s an example:

....
   healthcheck:
      test: "ping -4 -c 1 192.168.30.1 || exit 1"
      interval: 5s
      timeout: 5s
      retries: 10
