When using DT Overlay gpio-shutdown to do gpio pin based shutdown - containers don't receive SIGTERM

Hey Nick, thanks for the files and steps that let me easily reproduce this behavior. I’ve been looking into this and need a bit more time but I can provide some additional information for now.

The reboot/restart button on the dashboard correctly passes a SIGTERM to your Docker container because it calls an API on the Supervisor, which uses the on-device balenaEngine to stop containers. When Docker containers are stopped or restarted this way, they are sent a SIGTERM signal by default, as mentioned here. So if we SSH to the device and enter balena stop <container-id>, we see the SIGTERM log from your code. You can also find more docs on this in the services masterclass. There’s a piece in there mentioning:

...By default balenaEngine (and Docker) will send the SIGTERM signal to PID 1 (whatever executable is started using CMD...
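For reference, here is a minimal sketch of what catching that SIGTERM looks like when a Python process is PID 1 in the container. This is not your code; the names handle_sigterm, stop_event and main_loop are made up for illustration, and the log messages are placeholders.

```python
import signal
import threading

# Set by the signal handler; the main loop polls it to know when to stop.
stop_event = threading.Event()


def handle_sigterm(signum, frame):
    """Log the signal and ask the main loop to stop."""
    # flush=True so the line is written out before the process exits.
    print("SIGTERM received, starting graceful shutdown", flush=True)
    stop_event.set()


# Register the handler. This has to happen in the main thread of the
# process that the engine signals, i.e. the one started by CMD.
signal.signal(signal.SIGTERM, handle_sigterm)


def main_loop(poll_interval=0.5):
    """Run until SIGTERM arrives, then do cleanup and return."""
    while not stop_event.is_set():
        # Application work would go here.
        stop_event.wait(poll_interval)
    print("Cleanup finished", flush=True)
```

Calling main_loop() from the script used as CMD keeps the process alive until the signal arrives. If CMD is a shell wrapper instead, the shell becomes PID 1 and the Python process may never see the signal.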

So, the issue is that when the device is shut down via the dt overlay gpio_shutdown, no SIGTERM is sent. However, the overlay is not actually required to reproduce this behaviour: if you SSH to the device and enter the shutdown or reboot command, no SIGTERM signal is caught either. This leads me to believe it is not Supervisor related, since these steps to reproduce the issue do not involve the Supervisor. Something is clearly up though, so I will continue to look into this and see what I can find.

Could you clarify what you saw happen when interacting with the /v1/device endpoint? It is still available, so you shouldn’t have any issues with it.

Lastly, to rule out issues with the Supervisor not passing logs on gpio_shutdown, I SSHed to the device and viewed the container logs directly with balena logs -f <container-id>.

Can you also confirm whether your Docker container is actually getting SIGTERM signals with the 2.51.1+rev1 OS release? You mentioned above “The upgrade to 2.51.1+rev1. did work” but you never saw the logs showing graceful shutdown. Using your provided code, I cannot get graceful shutdown logs even with the latest OS releases.

Yes, with the new OS I believe the SIGTERM is being sent and our shutdown code is running, because there is code in our app that executes on shutdown and on our local device I see the proper messaging on our OLED screen. So the code is running, I’m just not 100% sure it’s executing to completion since I’m blind to the logs.

It’s now just the console logging back to the Balena portal that is left. It’s technically not a functional bug but more of a usability bug that affects being able to diagnose and debug the application in the field.

Oh, one thing to note: this used to work fine about 6 months ago, before we parked the project for COVID. The GPIO shutdown would properly send SIGTERM and log to the Balena console, so I’m a bit at a loss as to what changed. I know we did rev a bunch of stuff when we came back online.

Regardless, the behavior is there, and it would be good to have consistent shutdown models between the Supervisor and an external power button.

I’d like to let you know that we are still looking into this issue and we’ll need some extra time to dive deeper.
Thanks for bearing with us,
Georgia

Hi there – I wanted to update you on this. We’re continuing to investigate the issue, thanks in no small part to the info and code you’ve provided that let us duplicate the problem. Thanks for your patience as we get to the root of this.

All the best,
Hugh

Hi, it appears that you’re hitting a limitation of the design of balenaOS here. The problem is that we can’t guarantee that the logs generated by your application upon receiving SIGTERM will make it to our backend. The component responsible for that has its own graceful shutdown, and making this possible would require some re-architecting of it along with some changes to the OS.

Your best bet, if this is important to you, is to go one of two routes:

a) Use the Supervisor API for shutting down the device. This way you hand the reins to it, and the shutdown will be initiated only after all your containers have stopped.
b) Check journald to verify the code is running as expected. If you enable persistent logging, you should find your container logs in the logs of the previous boot. Check with journalctl --list-boots and then journalctl -b <boot-id> -eu balena
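For option a), a rough sketch of calling the Supervisor from inside a container might look like the following. It assumes the /v1/shutdown endpoint and the BALENA_SUPERVISOR_ADDRESS and BALENA_SUPERVISOR_API_KEY environment variables described in the Supervisor API docs (they are not taken from this thread), and that the service has the Supervisor API feature label enabled so those variables are set. The function names are made up for illustration.

```python
import os
import urllib.parse
import urllib.request


def build_shutdown_request(address, api_key):
    """Build (but do not send) a POST to the assumed /v1/shutdown endpoint."""
    query = urllib.parse.urlencode({"apikey": api_key})
    url = f"{address}/v1/shutdown?{query}"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the request is still sent as a POST
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def request_shutdown(timeout=10):
    """Ask the Supervisor to stop the containers and shut the device down."""
    # Both variables are assumed to be injected by the Supervisor.
    address = os.environ["BALENA_SUPERVISOR_ADDRESS"]
    api_key = os.environ["BALENA_SUPERVISOR_API_KEY"]
    request = build_shutdown_request(address, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Your application would call request_shutdown() when it detects the button press itself, rather than letting the overlay trigger the shutdown directly.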

I’m sorry this took a while but we had to pull in a few team members to get to the bottom of this one :wink: