I have a multi-container project with a physical power button that uses the gpio-shutdown device tree overlay (dtoverlay=gpio-shutdown), which lets the default GPIO3 pin signal power-on/power-off to the Raspberry Pi.
This configuration worked well for some time: it would shut off the Raspberry Pi and send the proper SIGTERM to the containers.
However, I recently refreshed my projects with newer base images, migrating from Node 10 to Node 12. I'm unsure whether this or a change in balenaOS is the cause.
My containers no longer receive a SIGTERM when the shutdown is initiated from the button. If I shut down or reboot via the supervisor, a proper SIGTERM is sent; only the GPIO-initiated shutdown fails to deliver it.
The Raspberry Pi itself does still shut down and power on with GPIO3.
Hi, can you try the latest OS version available? We recently made some changes related to making this work with the gpio-shutdown overlay. What hardware are you using?
I upgraded my Dev and Pre-Production devices to 2.51.1+rev1.
The same issue seems to be there. Restart and Reboot from the supervisor work as expected and send SIGTERM. Shutdown via GPIO3 triggers nothing in the containers before the Raspberry Pi 3 powers off.
The overlay is configured correctly and has been for 2 years now. As I indicated in the original post, the Raspberry Pi has no problem shutting down via the default "gpio-shutdown" overlay and an external button; it powers off and on as expected. The issue is that the behavior differs from a supervisor "shutdown". During a supervisor "shutdown" everything happens properly: my app gets the signal, gracefully shuts down all of its services, and then terminates with process.exit().
When shutting down via gpio-shutdown, the supervisor does not go through the same sequence, so our app gets killed without warning. Terminal logging also seems to halt as soon as the gpio-shutdown begins, which makes this even harder to debug, whereas during a supervisor shutdown we see all the logs in the terminal.
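For anyone following along, the graceful sequence described here (stop processors, stop service providers, save the database, then process.exit()) can be sketched in TypeScript for Node roughly as below. This is a minimal sketch under stated assumptions, not the app's actual code: `ShutdownStep`, `runShutdown`, and `installShutdownHandler` are hypothetical names, and each service is assumed to expose an async stop function.

```typescript
// Hypothetical shape of one shutdown step, e.g. "Stopping Processors".
type ShutdownStep = { name: string; stop: () => Promise<void> };

// Run every step in order and log progress. A failing step is logged and
// skipped so the remaining steps (e.g. the database save) still run.
async function runShutdown(
  steps: ShutdownStep[],
  log: (line: string) => void,
): Promise<void> {
  log("Starting graceful shutdown");
  for (const step of steps) {
    log(step.name);
    try {
      await step.stop();
    } catch (err) {
      log(`${step.name} failed: ${String(err)}`);
    }
  }
  log("-Shutdown-");
}

// Wire the sequence to SIGTERM (what the container receives when it is
// stopped) and exit explicitly once every step has finished.
function installShutdownHandler(steps: ShutdownStep[]): void {
  let shuttingDown = false;
  process.on("SIGTERM", () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    void runShutdown(steps, (line) => console.log(line)).then(() =>
      process.exit(0),
    );
  });
}
```

Usage would be a single `installShutdownHandler([...])` call at startup with one step per processor or provider. Logging and continuing after a failed step, instead of aborting, is a deliberate choice here so that a single misbehaving service cannot skip the final save.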
The only thing I can think of is the container upgrade. We used to use balenalib/raspberrypi3-alpine-node:10.16.3 and recently moved to Node 12 with balenalib/raspberrypi3-alpine-node:12-run.
Almost everything else was the same, but the project sat idle for 4 months. When we came back online we upgraded to Node 12 and the latest balena-cli, and today we also upgraded to the latest OS.
I can try going back to Node 10, but it's odd that it would have anything to do with it unless the base image was refactored. Actually, as I remember, when we came back online a week ago the builds said our Docker tag was invalid, which is what prompted the upgrade. So maybe the containers are built a little differently now?
I think I figured it out. The upgrade to 2.51.1+rev1 did work, but the lack of logging during shutdown, combined with a new issue I just found, made it look like it didn't.
So I found two things with the new OS:
1. Logging to the portal stops as soon as the GPIO shutdown starts, which makes it really hard to determine what is going on.
2. In my code I had to remove a supervisor API call made during shutdown. It seems that in this new version the API is unavailable or behaves irregularly during a GPIO shutdown, whereas it is available during a supervisor shutdown.
With the supervisor API call commented out, everything seems to shut down properly now, at least from what I can tell on my hardware OLED screen, since I get no logs back in the balena web portal.
This was the API call that I bypassed during shutdown.
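If that supervisor API call is still needed, one option (an assumption on my part, not something confirmed by the supervisor team) is to make it best-effort by racing it against a timer, so an unavailable API cannot stall the rest of the shutdown. `withTimeout` below is a hypothetical helper; the supervisor request itself is not shown and would be whatever promise the app already creates.

```typescript
// Resolve with the promise's value, or with `fallback` if the promise
// rejects or has not settled within `ms` milliseconds. Intended for
// best-effort calls that must never block a shutdown sequence.
function withTimeout<T>(promise: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    promise.then(
      (value) => {
        clearTimeout(timer); // settled in time: cancel the fallback timer
        resolve(value);
      },
      () => {
        clearTimeout(timer); // API error: treat it like a timeout
        resolve(fallback);
      },
    );
  });
}
```

For example, `await withTimeout(supervisorRequest, 2000, undefined)`, where `supervisorRequest` stands in for the original call. The timeout should stay well below the stop grace period so the remaining shutdown steps still fit.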
I increased the stop grace period, but still had no luck with the logging.
Typically I would see the following logging during a SIGTERM:
22.06.20 19:04:59 (-0400) base Starting graceful shutdown
22.06.20 19:04:59 (-0400) base
22.06.20 19:04:59 (-0400) base Stopping Processors
22.06.20 19:04:59 (-0400) base • NODE_USB - Node Usb Processor
22.06.20 19:04:59 (-0400) base • STATUS - Status Message Processor
22.06.20 19:04:59 (-0400) base • IO-MIDI - IO MIDI Processor
22.06.20 19:04:59 (-0400) base Stopping Service Providers
22.06.20 19:04:59 (-0400) base • CUE - CueLight
22.06.20 19:04:59 (-0400) base
22.06.20 19:04:59 (-0400) base * Simultaneous Message Send (waiting 557ms)
22.06.20 19:04:59 (-0400) base
22.06.20 19:04:59 (-0400) base ~ Control Handler Detached: 2 - BT-TEST
22.06.20 19:04:59 (-0400) base • RFM69 - CloudCue Network
22.06.20 19:04:59 (-0400) base Save Database
22.06.20 19:04:59 (-0400) base Websocket Close Code=1000 Reason=
22.06.20 19:05:00 (-0400) base Waiting 1s for Db Save
22.06.20 19:05:00 (-0400) base -Shutdown-
When doing a GPIO shutdown, there is no logging at all from any container, including the host OS logs, once the power button is pushed. The SIGTERM is being sent and our shutdown code is running; there is just no console logging.
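Since nothing reaches the dashboard once the button is pressed, one way to confirm how far the shutdown code gets is to also write each step to a file on a persistent volume and read it after the device boots again. A minimal sketch, assuming Node's built-in `fs` module and a hypothetical path such as `/data/shutdown.log` on a volume mounted into the container; `persistLog` and `makeShutdownLogger` are illustrative names. Each line is written and fsync'ed synchronously, so it does not sit in a buffer if the container is stopped right afterwards.

```typescript
import * as fs from "node:fs";

// Append one timestamped line and fsync it before returning, so the entry
// is on disk even if the process is killed immediately afterwards.
function persistLog(path: string, message: string): void {
  const fd = fs.openSync(path, "a");
  try {
    fs.writeSync(fd, `${new Date().toISOString()} ${message}\n`);
    fs.fsyncSync(fd);
  } finally {
    fs.closeSync(fd);
  }
}

// Log to the console (normal dashboard logs) and to the file. `path` is an
// assumption, e.g. a hypothetical /data/shutdown.log on a persistent volume.
function makeShutdownLogger(path: string): (message: string) => void {
  return (message) => {
    console.log(message);
    persistLog(path, message);
  };
}
```

Passing the returned logger to the shutdown code means that, after a GPIO shutdown and power-on, the file shows the last step that actually ran, independently of the dashboard.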
Hi, I have talked with the supervisor team, and you may indeed have identified a regression in how logging is flushed during a GPIO shutdown. Are you able to provide a minimal reproduction of the issue so that we can test it locally? It could be a bare-bones app that demonstrates the issue or a description of how to reproduce it easily.
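A bare-bones reproduction along these lines might work: a container that idles until SIGTERM, then prints a numbered heartbeat line at a fixed interval for a few seconds before exiting. After a supervisor shutdown every line should appear in the dashboard; after a GPIO shutdown, the missing lines would show where logging stops. This is a hypothetical sketch: the function names, tick count, interval, and the `RUN_SHUTDOWN_REPRO` environment variable used to start it are all arbitrary, and the total heartbeat time must stay below the container's stop grace period.

```typescript
// Print `count` numbered lines, one every `intervalMs`, then call `done`.
function shutdownHeartbeat(
  count: number,
  intervalMs: number,
  write: (line: string) => void,
  done: () => void,
): void {
  let tick = 0;
  const timer = setInterval(() => {
    tick += 1;
    write(`shutdown heartbeat ${tick}/${count}`);
    if (tick >= count) {
      clearInterval(timer);
      done();
    }
  }, intervalMs);
}

// Idle until SIGTERM, then emit 10 heartbeats 500 ms apart (5 s total)
// and exit.
function main(): void {
  console.log("repro started, waiting for SIGTERM");
  const keepAlive = setInterval(() => {}, 60000); // keep the process running
  process.on("SIGTERM", () => {
    console.log("SIGTERM received");
    shutdownHeartbeat(10, 500, (line) => console.log(line), () => {
      clearInterval(keepAlive);
      console.log("exiting");
      process.exit(0);
    });
  });
}

// Hypothetical switch so the file can also be imported without starting.
if (process.env.RUN_SHUTDOWN_REPRO === "1") main();
```

Running the same container through a supervisor shutdown and then a GPIO shutdown, and comparing which heartbeat lines reach the dashboard, would give a side-by-side comparison of the two paths.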