We have a situation with a container and one of our applications. The application runs on a Raspberry Pi 2 and uses a dt-overlay to send a SIGTERM to our application when a button is hit. We can see that the SIGTERM is likely being sent to the host OS (I am assuming).
However, that SIGTERM is never propagated to our application, since the application continues to run for a few seconds after the SIGTERM. As a matter of fact, the time it continues to run seems to correlate with the time that passes before the SIGKILL is sent…
I was under the impression that the SIGTERM should be sent to every container, but only to the parent process of that container, and that the SIGTERM is NOT propagated to any child processes.
That being the case, I removed the wrapping ‘/bin/sh’ from around our program and now call the application directly from the Dockerfile. In other words, our application is /usr/local/bin/YCS and my last Docker command is ‘CMD /usr/local/bin/YCS’.
Anyway, I know this message was quite convoluted, but hopefully it makes some sense…
Hi Rusty,
I have not quite understood how SIGTERM is being sent so you might want to elaborate on that a little.
To check for / react to signals in containers I usually use an entrypoint script like the following:
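#!/bin/bash
# Named pipe that signal names are written to
PIPE=/tmp/mypipe
# Create the FIFO only if it does not exist yet (-p tests for a named pipe)
if [[ ! -p "$PIPE" ]]; then
    mkfifo "$PIPE"
fi
# Read one signal name per line and react to it
while read -r SIGNAL; do
    case "$SIGNAL" in
        *EXIT*)
            echo "terminating on $SIGNAL" >&2
            break;;
        *)
            echo "signal $SIGNAL is unsupported" >&2;;
    esac
done < "$PIPE"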
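If the application has to stay behind a shell wrapper, the wrapper itself can pass the signal on. What follows is only a rough sketch of that idea, assuming the wrapper shell is the process that receives the SIGTERM; run_with_forwarding is a made-up helper name, not something provided by Docker or balena.

```shell
#!/bin/sh
# Sketch only: run_with_forwarding is a hypothetical helper name.
run_with_forwarding() {
    status=0
    "$@" &                      # start the real application in the background
    child=$!
    # When this shell receives SIGTERM, pass it on to the application.
    trap 'kill -TERM "$child" 2>/dev/null' TERM
    # wait returns early (status above 128) if the trap fires.
    wait "$child" || status=$?
    trap - TERM
    # Wait again so the wrapper only returns once the application is gone.
    wait "$child" 2>/dev/null || true
    return "$status"
}
```

At the end of an entrypoint script it would be called as run_with_forwarding /usr/local/bin/YCS, so that the shell stays in front but no longer swallows the SIGTERM.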
Thanks for that Samothx! I will give that a try and see if it helps our problem.
I know my description was terse at best. Basically, we have wired one of the GPIO pins to a big red “shutdown” button. We have set up dt-overlays such that, when that GPIO pin detects a signal change, the RPi 2 will cause the kernel to generate our SIGTERM signal.
I’m not sure that is much clearer; if not, sorry for my lack of communication skills (I’m dealing with brain fry at the moment :).
Our problem is this: our application controls a large, expensive laser, and if something goes wrong the operator needs to hit the big red button, at which point we have to make sure we do certain things to keep the laser from crashing into something or, worse, frying something it shouldn’t.
Soooo, we have a hard deadline of 10 seconds to ensure everything is set to a safe state.
At the moment, we are certainly getting a SIGTERM to the host OS and things begin to shut down immediately.
However, the containers can still take up to another 20 s before they terminate. In our case, the container in question (the most important container to shut down) will stay operational for another 20 s, despite the fact that we are explicitly trying to catch the SIGTERM in that container.
Anyway, we need a way to ENSURE a given container will shut down in less than 10 seconds.
Well, I probably made things worse. I’m going to get some rest, and hopefully I will be able to communicate better tomorrow. :).
But if you do have thoughts or questions, please fire away…
It goes into different methods which can be used, and likely the easiest way to test this is to open a shell on the host OS and run balena kill -s SIGTERM <containerId>. This will show you whether the problem is the signal catching in the container or the signal generation elsewhere.
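To put a number on it against the 10 second deadline, something like the following could time how long a process survives a SIGTERM. This is just a sketch for a plain process ID: term_and_time is a hypothetical helper, and the 10 second default is taken from the deadline mentioned above. For a container, the kill line would be swapped for the command above and the liveness check adapted accordingly.

```shell
#!/bin/sh
# Sketch only: term_and_time is a hypothetical helper name.
# Sends SIGTERM to a PID and prints how many seconds pass before it is gone.
# Returns 0 if it exited before the deadline, 1 if not, 2 if the signal
# could not be sent. A finished child that nobody has waited for yet
# (a zombie) still counts as alive for this check.
term_and_time() {
    pid=$1
    deadline=${2:-10}           # seconds allowed for a clean shutdown
    start=$(date +%s)
    kill -TERM "$pid" 2>/dev/null || return 2
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$deadline" ]; then
            echo $((now - start))
            return 1
        fi
        sleep 1
    done
    echo $(( $(date +%s) - start ))
}
```

For example, term_and_time 1234 10 (with 1234 standing in for the real PID) prints the elapsed seconds and returns non-zero if the process was still alive at the deadline.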