When using the gpio-shutdown DT overlay for GPIO-pin-based shutdown, containers don't receive SIGTERM

I have a multi-container project with a physical power-off button that takes advantage of the
gpio-shutdown device tree overlay, which allows the default GPIO3 pin to signal power-on/power-off to the Raspberry Pi.

This configuration worked well for some time: it would shut down the Raspberry Pi and send the proper SIGTERM to the containers.

However, I recently refreshed my project with newer base images, migrating from Node 10 to Node 12. I'm unsure whether the cause is this migration or a change to balenaOS.

My containers are not receiving a SIGTERM when a GPIO-initiated shutdown occurs. If I shut down or reboot via the supervisor, a proper SIGTERM is sent. It is only with a GPIO-initiated shutdown that the SIGTERM never arrives.

The Raspberry Pi itself does shut down and power on via GPIO3, however.

Any help would be appreciated.

Hi there, a couple of questions:

  • Have you recently changed the balenaOS version? If so, from what version to what version?
  • How are you detecting that the containers aren’t receiving the SIGTERM?
  • Could you provide a Dockerfile.template + docker-compose?

Hopefully this extra info will give us some clues as to what’s going on, thanks!

  • Have you recently changed the balenaOS version? If so, from what version to what version?

balenaOS 2.38.0+rev1, no change in OS.

  • How are you detecting that the containers aren’t receiving the SIGTERM?

Via a process event in Node; this code correctly catches the SIGTERM when a shutdown or restart is initiated via the supervisor:

process.on('SIGTERM', async () => {
  // graceful shutdown runs here, ending in process.exit()
});
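Expanded, the handler follows the usual pattern of awaiting the async cleanup and then exiting. A rough sketch of the shape (the stop/save helpers here are simplified stand-ins for our actual services, not real API):

process.on('SIGTERM', async () => {
  console.log('Starting graceful shutdown');
  await stopProcessors();        // stand-in: stop the USB/MIDI/status processors
  await stopServiceProviders();  // stand-in: stop CueLight, RFM69, etc.
  await saveDatabase();          // stand-in: persist state before exiting
  process.exit(0);
});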
  • Could you provide a Dockerfile.template + docker-compose?

The base container is the one that listens for SIGTERM.

docker-compose.yaml

version: '2.1'

volumes:
  client-data:
    # external: true

services:
  base:
    build: ./base
    image: cloudcue-client-base
    container_name: cloudcue-base
    restart: always
    privileged: true
    volumes:
      - 'client-data:/var/lib/cloudcue'
    expose:
      - '80'
    ports:
      - '8088:80'
    labels:
      # io.balena.features.dbus: '1'
      io.balena.features.supervisor-api: '1'

  ui:
    build: ./base-ui
    image: cloudcue-client-base-ui
    container_name: cloudcue-ui
    restart: always
    environment:
      - API_HOST=base
      - API_PORT=80
    depends_on:
      - base
    expose:
      - '80'
    ports:
      - '8080:80'

  wpe:
    build: ./tools/wpe
    image: cloudcue-client-wpe
    container_name: cloudcue-wpe
    restart: always
    privileged: true
    environment:
      - WPE_URL=http://ui:80
    depends_on:
      - ui

Dockerfile

# *** Builder Container *** ---------------------------------------------------
FROM balenalib/raspberrypi3-alpine-node:12-build as build
# FROM node:10.15-alpine as build

# RUN apk --no-cache add --virtual native-deps \
#     make g++ gcc python linux-headers udev libgcc libstdc++ wxgtk wxgtk-dev

WORKDIR /usr/src/
ADD ./BOSSA-1.7.0.tar.gz .
RUN make -C BOSSA-1.7.0 bin/bossac && cp BOSSA-1.7.0/bin/* /usr/local/bin/

WORKDIR /usr/src/app
COPY package.json ./
RUN npm set progress=false && npm config set depth 0

# install npm production dependencies
RUN npm install --only=production && npm cache verify

# copy production node_modules aside
RUN cp -R node_modules prod_node_modules

# install npm development dependencies,
# making sure to clean up the artifacts it creates in order to reduce the image size
RUN npm install --development && npm cache verify && rm -rf /tmp/*

# build app for production
COPY . ./
ENV NODE_ENV=production
RUN npm run build

# *** Production Container *** ------------------------------------------------
# FROM node:10.15-alpine
# FROM balenalib/%%BALENA_MACHINE_NAME%%-alpine
FROM balenalib/raspberrypi3-alpine-node:12-run as release

RUN apk --no-cache add alsa-lib

WORKDIR /usr/app
COPY package.json ./

# copy pre-compiled production node_modules
COPY --from=build /usr/src/app/prod_node_modules ./node_modules
# COPY --from=build /usr/src/app/node_modules/epoll node_modules/epoll
# COPY --from=build /usr/src/app/node_modules/@serialport node_modules/@serialport

COPY --from=build /usr/src/app/config config
COPY --from=build /usr/src/app/dist/src dist/src
COPY --from=build /usr/src/app/firmware firmware
COPY --from=build /usr/local/bin/bossac firmware/_arm/bossac
RUN chmod -R 755 /usr/app/firmware/_arm

# COPY udev_pause.sh .
# RUN chmod 755 udev_pause.sh
# COPY udev.rules /etc/udev/rules.d/udev.rules

# setup environment
ENV UDEV=1
ENV NODE_ENV=production

EXPOSE 80
CMD ["node", "dist/src/index.js"]
# CMD npm start

Hi, can you try using the latest OS available? We recently made some changes related to making this work with the gpio-shutdown overlay. What hardware are you using?

I believe this problem was fixed with https://github.com/balena-os/meta-balena/pull/1847 which got deployed in OS release 2.48.0 or newer.

I will try the latest OS, thanks.

I upgraded my Dev and Pre-Production to 2.51.1+rev1.

The same issue seems to still be there. Restart and reboot from the supervisor work as expected, sending SIGTERM. A shutdown via GPIO3 triggers nothing in the containers before the Raspberry Pi 3 powers off.

How are you configuring the overlay?

Fleet-wide DT overlay:
"pi3-miniuart-bt","gpio-shutdown"

Hi, you also need to configure the overlay. See this forum post for an example: Adding overlays to the host OS

Please make sure you change it to the GPIOs you are using.
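For reference, gpio-shutdown accepts parameters for the pin and polarity. In config.txt form the defaults are equivalent to the line below (a sketch based on the upstream overlay README; adjust gpio_pin if you have wired a different pin):

dtoverlay=gpio-shutdown,gpio_pin=3,active_low=1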

The overlay is configured correctly and has been for 2 years now. As I indicated in the original issue, the Raspberry Pi has no problem shutting down via the default gpio-shutdown overlay and an external button; it powers off and on as expected. The issue is that the behavior is not the same as a supervisor shutdown. During a supervisor shutdown everything happens properly: my app gets the signal, gracefully shuts down all of its services, and then terminates with a process.exit().

When shutting down via gpio-shutdown, the supervisor does not go through the same sequence, and our app gets blindly killed. It also seems that terminal logging halts as soon as the gpio-shutdown commences, which makes things even harder to debug, whereas during a supervisor shutdown we see all the logs in the terminal.

So what triggered this bad behavior? Just you switching the base images?

The only thing I can think of is the upgrade of the containers. We used to use

balenalib/raspberrypi3-alpine-node:10.16.3

then recently moved to 12:

balenalib/raspberrypi3-alpine-node:12-run

Most everything was the same, but after sitting idle for 4 months, when we came back online we upgraded to Node 12 and the latest balena-cli, and then today to the latest OS.

I can try going back to Node 10, but it's odd that it would have anything to do with this unless the base image was refactored. Actually, as I remember, when we came back online a week ago the builds said that our Docker tag was invalid, which is what prompted the upgrade, so maybe the containers are built a little differently now?

I would be interested if you could switch back to the older 10.16.3 base image and check if the issue is still there. Can you give that a go?

I think I figured it out. The upgrade to 2.51.1+rev1 did work, but because of the lack of logging during shutdown, and a new issue that I just found, it appeared as if it didn't work.

So, two things I found with the new OS:

  1. Logging to the portal STOPS as soon as the GPIO shutdown is initiated, which makes it really hard to determine what is going on.
  2. In my code I had to remove a supervisor API call during shutdown; in this new version the API seems to be unavailable, or acting irregularly, during a GPIO shutdown, whereas it is available during a supervisor shutdown.

I commented out the supervisor API call and it seems everything is shutting down now, from what I can tell via my hardware OLED screen, since I get no logs back to the balena web portal.

This was the API call that I bypassed during shutdown:

const url = `${process.env.BALENA_SUPERVISOR_ADDRESS}/v1/device?apikey=${process.env.BALENA_SUPERVISOR_API_KEY}`;
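If the call can't simply be dropped, a possible workaround is to time-box it so an unresponsive supervisor cannot stall the SIGTERM handler. A sketch (getDeviceState is an illustrative name, not part of our app; url is the constant above):

const http = require('http');

// Time-box the supervisor query so an unavailable API can't stall shutdown.
function getDeviceState(timeoutMs = 2000) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    });
    req.setTimeout(timeoutMs, () => req.destroy()); // give up if the supervisor is gone
    req.on('error', () => resolve(null));           // treat failure as "skip the call"
  });
}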

Can we open a ticket on the console logging, as it might throw others for a loop?

Can you also try experimenting with the stop grace period as shown here https://www.balena.io/docs/reference/base-images/base-images/#how-the-images-work-at-runtime and increase it?
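In the compose file above that would look something like this (a sketch; the 30s value is arbitrary, and Docker's default is 10s):

services:
  base:
    stop_grace_period: 30s  # time allowed between SIGTERM and SIGKILL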

I increased the stop grace period and no luck on the logging.

Typically I would see the logging below during a SIGTERM:
22.06.20 19:04:59 (-0400) base Starting graceful shutdown
22.06.20 19:04:59 (-0400) base
22.06.20 19:04:59 (-0400) base Stopping Processors
22.06.20 19:04:59 (-0400) base • NODE_USB - Node Usb Processor
22.06.20 19:04:59 (-0400) base • STATUS - Status Message Processor
22.06.20 19:04:59 (-0400) base • IO-MIDI - IO MIDI Processor
22.06.20 19:04:59 (-0400) base Stopping Service Providers
22.06.20 19:04:59 (-0400) base • CUE - CueLight
22.06.20 19:04:59 (-0400) base
22.06.20 19:04:59 (-0400) base * Simultaneous Message Send (waiting 557ms)
22.06.20 19:04:59 (-0400) base
22.06.20 19:04:59 (-0400) base ~ Control Handler Detached: 2 - BT-TEST
22.06.20 19:04:59 (-0400) base • RFM69 - CloudCue Network
22.06.20 19:04:59 (-0400) base Save Database
22.06.20 19:04:59 (-0400) base Websocket Close Code=1000 Reason=
22.06.20 19:05:00 (-0400) base Waiting 1s for Db Save
22.06.20 19:05:00 (-0400) base -Shutdown-

When doing a GPIO shutdown there is zero logging from any container, including the host OS logs, once the power button is pushed. The SIGTERM is being sent and our shutdown code is running; there is just no console logging.

Hi, I have talked with the supervisor team, and you may indeed have identified a regression in how logging is flushed during a GPIO shutdown. Are you able to provide a minimal reproduction of the issue so that we can test it locally? It could be in the form of a bare-bones app that demonstrates the issue, or a description of how to reproduce it easily.

Reproduction is simple.

  1. Apply the DT overlay: "pi3-miniuart-bt","gpio-shutdown"
  2. Deploy the test project (a sketch of its entrypoint follows this list):
    https://drive.google.com/file/d/1HWNw1Hpd7OnRXT-TVd1rHzccRrlOJOCT/view?usp=sharing
  3. Issue a shutdown or restart via the balena portal (you will see correct logging of the app shutting down on SIGTERM).
  4. Issue a shutdown via GPIO (ground GPIO3); this will start the Pi shutdown, and no logging will be sent.
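For anyone reproducing without the download, a bare-bones entrypoint along these lines demonstrates the same behavior (a sketch, not the linked project's exact code):

// Minimal repro: log on SIGTERM, then exit shortly after.
process.on('SIGTERM', () => {
  console.log('SIGTERM received, starting graceful shutdown');
  setTimeout(() => {
    console.log('-Shutdown-');
    process.exit(0);
  }, 1000);
});

setInterval(() => console.log('still running'), 10000); // keep the process alive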

Thanks for that, I will forward it to the supervisor maintainer.