Enabling hardware watchdog timer on the Raspberry Pi CM3

Hi all,

I found an article describing how to enable the Raspberry Pi hardware watchdog timer to have devices recover from any potential hardware or kernel lockups:

Other than a Balena post from 2016 titled "Keeping Your System Running with a Host OS Watchdog" and a few mentions on the Balena forum, I could not find any clear documentation.

My questions:

  • Is it possible to enable the hardware watchdog timer on recent versions of BalenaOS?
  • Is this in any way bad practice?

We are running a Compute Module 3 using the Raspberry Pi 3 32-bit BalenaOS base image.

Thanks!

Bart

What is your use case? balenaOS already utilizes the hardware watchdog as part of a chain of health checks to ensure the device is always available and responsive. If you’re looking to restart your application when it becomes unresponsive, you might be better off looking into docker-compose file’s “healthcheck” directive that balenaOS also supports: Compose file version 2 reference | Docker Documentation

We sporadically see issues with the camera module on our Pi that look similar to the one posted here: Experiencing hung raspberry pi while using camera - Arducam. Sometimes this seems to lead to our devices losing connection.

I am already in contact with your colleagues in another support thread trying to get to the root cause, but I figured it would be good to see if there are any measures we can take to at least make the device recover if it gets into this state.

Good to know that balenaOS already utilizes the hardware watchdog. That answers my question. Thanks!

Does a bad “healthcheck” result in the watchdog not being “pet” and therefore result in a reboot?

Is this enabled on all boards with hardware watchdogs, like BeagleBone Black?

Hi Jason, you can find more about docker-compose health checks here. Basically, you run a check command inside your container; if it returns an error or cannot execute for the configured number of retries, the container is marked unhealthy and restarted.

It can look something like this:

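        healthcheck:
            # Runs inside the container; a non-zero exit code counts as a failed check.
            # CMD-SHELL runs the string through /bin/sh, and "$$" escapes "$" from Compose
            # so DB_NAME and DB_USER are read from the container's environment.
            test: [ "CMD-SHELL", "pg_isready -q -d \"$$DB_NAME\" -U \"$$DB_USER\"" ]
            timeout: 45s
            interval: 10s
            retries: 10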
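To make the retry semantics concrete, here is a small Python sketch that replays a sequence of check results the way Docker's health tracking works: any passing check marks the container healthy and resets the failure count, and the container only becomes unhealthy after "retries" consecutive failures. It is a simplified model (start_period is ignored, and the function names are hypothetical), plus a rough upper bound on detection time when every check hangs until its timeout.

```python
from typing import Iterable


def health_status(results: Iterable[bool], retries: int) -> str:
    """Replay check results (True = check exited 0) and return the final status.

    Simplified model of Docker's health tracking; start_period is not modelled.
    """
    status = "starting"
    failing_streak = 0
    for ok in results:
        if ok:
            # A single passing check is enough to be healthy again.
            status, failing_streak = "healthy", 0
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
    return status


def worst_case_detection_s(interval_s: float, timeout_s: float, retries: int) -> float:
    # Rough upper bound: up to one interval before the first failing check
    # starts, then every check hangs for the full timeout, with one interval
    # between consecutive checks: interval + retries*timeout + (retries-1)*interval.
    return retries * (interval_s + timeout_s)
```

With the values in the snippet above (interval 10s, timeout 45s, retries 10) the bound is 10 × (10 + 45) = 550 seconds, so a hung service can take several minutes to be flagged.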

I’d like for it to reset my board, not my container.

The point of a hardware watchdog is that if, for any reason, you don’t perform the action telling the hardware watchdog everything is fine on a regular periodic basis, the board resets.

Why are we resetting containers when the board needs to be reset?

Where does the hardware watchdog help here?
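For reference, the mechanism described above can be sketched against the standard Linux watchdog device interface: any write to /dev/watchdog resets the timer, and writing the character V just before closing (the "magic close") disarms it, provided the driver supports magic close and was not built with nowayout. The sketch stops petting as soon as a hypothetical is_healthy() callback fails, so the timer expires and the board resets. It is illustrative only: on balenaOS the host OS already manages the hardware watchdog, and the device normally allows only one open handle, so this is not meant to run alongside it.

```python
import os
import time
from typing import Callable, Optional


def run_watchdog(device: str, is_healthy: Callable[[], bool],
                 interval_s: float, max_beats: Optional[int] = None) -> int:
    """Pet the watchdog while is_healthy() returns True; return beats written.

    is_healthy is a hypothetical callback standing in for your own checks.
    max_beats bounds the loop (None = run forever).
    """
    fd = os.open(device, os.O_WRONLY)
    beats = 0
    healthy = True
    try:
        while max_beats is None or beats < max_beats:
            if not is_healthy():
                # Stop petting; closing without "V" leaves the timer running,
                # so the hardware resets the board when it expires.
                healthy = False
                return beats
            os.write(fd, b"\0")  # any write resets ("pets") the timer
            beats += 1
            time.sleep(interval_s)
        return beats
    finally:
        if healthy:
            # Magic close: asks the driver to disarm on a deliberate exit.
            os.write(fd, b"V")
        os.close(fd)
```

Pointing device at an ordinary file is a harmless way to see exactly which bytes the loop writes.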

Jason,

Watchdog is already implemented and managed by BalenaOS. It uses the hardware watchdog to reset the board if the kernel is unresponsive for a given amount of time. Docker also has healthchecks that can restart a container if it fails a given test.

So there are two separate mechanisms ensuring your application and board stay up and running.

Please let me know if anything is unclear.

I want to reboot the system if the container healthchecks are bad, not just restart the container. Is that possible?

You could run a container with the balena socket mapped into it, which would give you a Docker API inside the container. You could then write a script or app that periodically runs a status check (e.g. docker ps) on the running containers and, if your conditions for a system reboot are met, issues a reboot from inside the container.
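A minimal Python sketch of that approach, using only the standard library. Assumptions, all worth checking against the current balena docs: the service has the io.balena.features.balena-socket label (which bind-mounts the engine socket and sets DOCKER_HOST) and the io.balena.features.supervisor-api label (which provides BALENA_SUPERVISOR_ADDRESS and BALENA_SUPERVISOR_API_KEY), and the reboot is requested through the Supervisor's POST /v1/reboot endpoint. It asks the engine API for unhealthy containers (the equivalent of docker ps --filter health=unhealthy) and only reboots after several consecutive bad polls, so a container that recovers after its normal restart does not take the whole device down. The fallback socket path and all function names are hypothetical.

```python
import http.client
import json
import os
import socket
import time
import urllib.parse
import urllib.request

# ASSUMPTION: hypothetical fallback used only if DOCKER_HOST is not set.
DEFAULT_SOCKET = "/var/run/balena-engine.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """Plain HTTP over a Unix domain socket, which is how the engine API is exposed."""

    def __init__(self, socket_path: str, timeout: float = 10.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def socket_path_from_env(env) -> str:
    host = env.get("DOCKER_HOST", "")
    return host[len("unix://"):] if host.startswith("unix://") else DEFAULT_SOCKET


def container_names(listing) -> list:
    # The engine reports names with a leading "/"; keep the first name per container.
    return [c["Names"][0].lstrip("/") for c in listing if c.get("Names")]


def unhealthy_containers(socket_path: str) -> list:
    # Same query as "docker ps --filter health=unhealthy".
    filters = urllib.parse.quote(json.dumps({"health": ["unhealthy"]}))
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json?filters=" + filters)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"engine API returned HTTP {resp.status}")
        return container_names(json.loads(body))
    finally:
        conn.close()


def next_streak(streak: int, any_unhealthy: bool) -> int:
    # Count consecutive bad polls; one clean poll resets the count.
    return streak + 1 if any_unhealthy else 0


def reboot_url(env) -> str:
    key = urllib.parse.quote(env["BALENA_SUPERVISOR_API_KEY"], safe="")
    return f"{env['BALENA_SUPERVISOR_ADDRESS']}/v1/reboot?apikey={key}"


def request_reboot(env) -> None:
    req = urllib.request.Request(
        reboot_url(env), data=b"{}", method="POST",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10):
        pass


def main(poll_s: float = 30.0, threshold: int = 3) -> None:
    path = socket_path_from_env(os.environ)
    streak = 0
    while True:
        try:
            bad = unhealthy_containers(path)
        except (OSError, RuntimeError, ValueError) as err:
            # Not counted as a bad poll, so a mis-mapped socket cannot cause a reboot loop.
            print(f"engine query failed: {err}")
            time.sleep(poll_s)
            continue
        streak = next_streak(streak, bool(bad))
        if bad:
            print(f"unhealthy: {bad} ({streak}/{threshold})")
        if streak >= threshold:
            request_reboot(os.environ)
            return
        time.sleep(poll_s)
```

Start it by calling main() from the service's command; poll_s and threshold are placeholder values to tune against your healthcheck interval and retries.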