Container missing shutdown signal

Hello, I know this entry is probably a little hard to follow, but I really need to figure out this problem, so all the help I get is much appreciated!

I have an application (a dockerized component of a bigger balena app) that needs to receive a SIGTERM to shut things down in an orderly manner.

Previously, I was wrapping my application, “ABC”, in a bash startup script. Following a suggestion here on the forum, I added code to catch the signal and pass it along to the ABC process.

The signal does seem to be received by the hostOS, and it does seem as though something has happened… But I never saw my code trigger. I have since tried running my application with:

CMD /usr/local/bin/ABC
CMD ["/usr/local/bin/ABC"] and I tried
CMD [/usr/local/bin/ABC]

and others, but I still do NOT see the ABC process receive the SIGTERM signal. My assumption was that if I start ABC without the bash shell, ABC would receive the SIGTERM directly.

Nothing I have tried seems to get the SIGTERM to the ABC process.

If I send a SIGTERM directly to the ABC process with kill -15, I do indeed see that it has received the SIGTERM, and it then terminates properly.

Hi, could you tell us what OS version and base images you are using in your project? It would also be very helpful if you could provide an example of your bash script and a minimal example of ABC; a simple repo for reproduction on our side would be great. From memory, I think one needs to call exec ABC from the bash script to get it to pass on the signals correctly, but we will be able to help more if we can see some code.
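To illustrate the exec suggestion, here is a minimal sketch rather than the actual start script. It assumes a hypothetical stand-in program, fake_abc.sh, that traps SIGTERM the way ABC is described as doing; the helper name demo_exec_wrapper and the file names are made up for the example. Because the wrapper ends with exec, the PID it was started with becomes the program's PID, so a SIGTERM sent to that PID reaches the program directly.

```shell
#!/bin/sh
# Sketch: a wrapper that ends with exec, so the wrapper's PID becomes the
# program's PID and SIGTERM is delivered straight to the program.
# fake_abc.sh is a hypothetical stand-in for ABC, not the real binary.

demo_exec_wrapper() {
    dir=$(mktemp -d) || return 1

    # Stand-in for ABC: installs a SIGTERM handler, then idles.
    cat > "$dir/fake_abc.sh" <<'EOF'
trap 'echo "fake ABC: got SIGTERM"; exit 0' TERM
echo "fake ABC: ready"
while :; do sleep 1 & wait $!; done
EOF

    # Wrapper: does its setup, then replaces itself with the program.
    cat > "$dir/start.sh" <<'EOF'
echo "wrapper: starting"
exec sh "$(dirname "$0")/fake_abc.sh"
EOF

    sh "$dir/start.sh" > "$dir/log" 2>&1 &
    pid=$!

    # Wait (up to about 10 s) until the handler is installed.
    tries=0
    until grep -q "ready" "$dir/log" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -gt 10 ] && break
        sleep 1
    done

    # Signal the PID the wrapper was started with; after exec it belongs
    # to the stand-in program.
    kill -TERM "$pid"
    status=0
    wait "$pid" || status=$?

    cat "$dir/log"
    rm -rf "$dir"
    return "$status"
}
```

Calling demo_exec_wrapper prints the captured log, which should end with the stand-in's "got SIGTERM" line. In a real start script the equivalent change would be to replace the background launch and wait with a final exec of the program.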

Yes, I apologize for the lack of detail.

The system is a Raspberry Pi 2.

  • ABC program is a large-ish C++ program.
  • The problem repo is git.balena-cloud.com:gh_rustyradian/radianlaser.git
  • Versions: balenaOS 2.43.0+rev1,
  • Supervisor version: 10.2.2

I can (and will) write a small C program for testing, and add it to that same repository.

  • Rusty Eddy

btw - My boss asked me to ask: How do we ask for support without exposing our private business details on a public forum?

Thank You!

Please let me know if I left anything out. I will also create a simple C program and continue to experiment.

As a point of reference, I do see in the Dockerfile documentation:

If you want to run your <command> without a shell then you must express the command as a JSON array and give the full path to the executable. This array form is the preferred format of CMD. Any additional parameters must be individually expressed as strings in the array:

FROM ubuntu
CMD ["/usr/bin/wc","--help"]

Since I would like to receive the signal directly, without having to catch and relay it with a bash script, I believe I need the following form:

CMD ["/usr/local/bin/ABC"]

I had already tried this before opening this thread…

Hi,

Thanks for the additional information. Unfortunately, I can’t get access to that git repo you provided to view the code (and just in case you haven’t seen the warning on this page https://www.balena.io/docs/learn/deploy/deployment/#git-push “The balenaCloud git repository is not intended as a code hosting solution”). Perhaps you could add the sample script to a gist or similar and share the link?

As for sharing private details via the forum, you can do this via the DM function on the forum assuming you get the permission of the support agent first (this is just so they are aware and we don’t inadvertently lose track of any support messages).

Well, I am trying to get the signal directly to the binary C program, so in this specific case now, I do not have a script. Just calling the binary directly:

CMD ["./ABC"]

When I do this, I can see my binary C program running in the container with PID 1.
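Since the program runs as PID 1 here, one thing that can be checked from inside the container is whether that process actually has a SIGTERM handler installed; on Linux, PID 1 of a PID namespace only receives signals for which it has installed a handler. The following is a hedged, Linux-only sketch that reads the SigCgt mask from /proc; the helper name catches_sigterm is hypothetical.

```shell
#!/bin/sh
# Sketch (Linux only): report whether a process has a handler installed
# for SIGTERM by reading the SigCgt bitmask from /proc/<pid>/status.
# catches_sigterm is a hypothetical helper name.
# Returns 0 if SIGTERM is caught, 1 if not, 2 if the mask cannot be read.

catches_sigterm() {
    status_file="/proc/$1/status"
    [ -r "$status_file" ] || return 2

    mask=""
    while read -r key value; do
        if [ "$key" = "SigCgt:" ]; then
            mask=$value
        fi
    done < "$status_file"
    [ -n "$mask" ] || return 2

    # Keep only the last four hex digits; they cover signals 1 to 16.
    low=${mask#"${mask%????}"}

    # Signal N is bit N-1, so SIGTERM (15) is bit 14, i.e. 0x4000.
    [ $(( 0x$low & 0x4000 )) -ne 0 ]
}
```

Inside the container, something like catches_sigterm 1 && echo "PID 1 catches SIGTERM" would confirm the handler is in place. A result of 1 would suggest the signal is being dropped because no handler is registered; a result of 0 means the cause lies elsewhere.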

A different but related case:

In earlier attempts I was wrapping the above binary C program with a shell script, attempting to capture and forward the signal. This is what my start script looked like:

CMD ["./start.sh"]

start.sh:

...
_term() {
	echo "Caught SIGTERM signal!" >> /radianfs/log/ycs-startup.log
	kill -TERM "$child" 2>/dev/null
}

trap _term SIGTERM

echo "Starting up ABC." >> /radianfs/log/abc-startup.log
./ABC  &

child=$!
echo "waiting for child procid ${child}" >> /radianfs/log/abc-startup.log
wait "$child"

echo "Recieved Child has passed." >> /radianfs/log/abc-finished.log

Of course, in this case, the bash shell does not appear to get the signal…
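Separately from whether the signal arrives at all, one detail of the trap-and-wait pattern is worth sketching: in bash and POSIX shells, wait returns early with a status above 128 when a trapped signal arrives, so a script like this can reach its final log line while the child is still shutting down. The sketch below waits a second time after forwarding the signal. The helper name run_and_forward is hypothetical, and this does not by itself explain why the script never sees the SIGTERM.

```shell
#!/bin/sh
# Sketch: forward SIGTERM to a background child and keep waiting until
# the child has actually exited. run_and_forward is a hypothetical name.

run_and_forward() {
    term_seen=0
    child=""

    # Relay SIGTERM to the child and remember that it happened.
    trap 'term_seen=1; kill -TERM "$child" 2>/dev/null || true' TERM

    "$@" &
    child=$!

    # wait returns early (status > 128) if the trapped signal arrives.
    status=0
    wait "$child" || status=$?

    # If the first wait was interrupted, wait again to collect the
    # child's real exit status once it has finished shutting down.
    if [ "$term_seen" -eq 1 ]; then
        status=0
        wait "$child" || status=$?
    fi

    trap - TERM
    return "$status"
}
```

A start script could then end with something like run_and_forward ./ABC followed by the final log line, so that the log line is only written after ABC has really exited.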

Different Code Different Repo

I actually did read the bit about “balenaCloud” not being intended as a code hosting solution. Well, I’m not using it as a code hosting solution; I have my own. What is not explained in the documentation is whether that is persistent or not, nor whether you have access, so I do not know how y’all have it set up.

So, sorry. I will go ahead, create a new repo, build, load and so on new code. But even then, I am afraid that I am not actually debugging MY particular issue.

Hi,

Regarding persistence, it says in our docs that “we cannot guarantee the persistence of data in balenaCloud git remotes”. It is also a deprecated method, replaced by balena push.

How are you stopping the containers to test if SIGTERM is raised?

In addition, regarding the bash script, have you tried the workaround posted by Shaun above, namely calling exec ABC from the bash script to get it to pass on the signals correctly?

Yep, I read that. As I stated before, I never assumed I could use it for persistent storage. That still did not inform me that you all do not have access to it, nor that it necessarily goes away immediately (which I ASSume is what happens, now that you have put this energy into informing me of what the documentation says).

Just so we are clear, I have read almost every one of your public docs, including the one you have quoted. That did NOT address my original statement.

Just so we are clear: I UNDERSTAND THAT BALENA EMPLOYEES DO NOT HAVE ACCESS TO THE BALENA git repos.

GOT IT!

Anyway, Moving on:

I am stopping the container by pushing a button connected to a GPIO pin. I am using dt_overlays to program the RPI kernel to begin the shutdown process, which includes sending SIGTERM signals to every container.

Nope, I have not tried the EXEC method; rather, I tried two different methods, including the one I posted using the shell script.

Additionally, the Docker documentation seems to indicate that I want to do CMD ["ABC"] to get the program to spawn without the shell, which is what I would really like to see happen at this point.

I will try this again with the EXEC.

One more thing I would like to add:

This software works as I describe it should when we run it on a generic Raspberry Pi.

I only have this problem running on balena with the containers.

To note: I wrote a C program that is set up to catch the SIGTERM and print to standard out when it receives the signal.

I can share that when I get into work in a bit.

However, my application is not updating at all now… hence I can not get the current image loaded and running at all.

Hey there!

“However, my application is not updating at all now… hence I can not get the current image loaded and running at all.”

Are you still having this problem? You can enable support access and let us know the device UUID and we can help debug what’s going on!

That would be awesome! I have set my device in Admin support mode and will DM you with the uuid…

Thanks!

Hey there! We investigated the device and it looks like the reason it wasn’t updating was that the balena supervisor was not running for some reason. Before I proceed to investigate further, can you confirm that I can upgrade the supervisor from 10.2.2 to 10.7.0, just so I debug on the latest version, and any potentially relevant bug fixes are there?

Thanks for looking into this! Go ahead and perform whatever changes you need. This rig is strictly for testing, and this issue (and the SIGTERM issue that led to it) are our most important issues right now…

thanks

By the way, this is pretty scary for us; it would be bad for this to happen to us on production systems…

We are deep into trying to figure out what’s going on, and I totally relate to your concerns! From the looks of it, the device is consistently rebooting every couple of minutes. Could it be that its source of power is not stable and the device is caught in a reboot loop that is preventing it from doing much?

I do not think so. The RPI is being powered by the same source it always has been…

I do believe the updates stopped after I introduced the container ‘sig’. That container was added simply to test container signal handling.

I had a silly bug in my first version, which is the version that is still loaded on the device. I have no idea how that would have affected the hostOS, but the problem I am seeing correlates with the addition of that container.

Of course this could be a total coincidence. I would try to supply more power to the RPI, but I don’t have anything that would provide more power. I believe my current power supply is 5 V @ 2 A.

Having said that, I have seen power be the source of some really weird problems, so I never rule it out…

We are still investigating, but the problem seems clear: I can very reliably see the device hard rebooting every ~5 minutes (usually less) for no clear reason. Let us know if you start supplying more power to it, just so we can see if we see any difference.