Postgres Database Persistent Storage on NVMe SSD instead of SD Card

I am utterly lost on how to use my NVMe SSD for my Postgres data, i.e. /var/lib/postgresql/data, in my single-container application on my Jetson Xavier NX SD card devkit. I have BalenaOS and the containers running from the SD card, but I would like the database to be persisted and stored on the NVMe. I have the following Dockerfile:

# FROM balenalib/jetson-xavier-nx-devkit-ubuntu
# FROM balenalib/jetson-xavier-nx-devkit-alpine
FROM timescale/timescaledb:latest-pg13

ENV POSTGRES_PASSWORD "password123"
ENV TIMESCALEDB_TELEMETRY "off"

This successfully builds an image with Postgres/Timescale, but the data will be stored at /var/lib/postgresql/data on the SD card or eMMC, depending on the Jetson Xavier NX devkit and device provisioning. I SSH-ed into the running container and manually mounted the NVMe SSD, which I had previously formatted as ext4 and labeled nvme. At least the NVMe is recognized by the host, and I can manually access it from a container.

$ balena push <UUID>.local
$ balena ssh <UUID>.local main
bash-5.1# blkid | grep 'LABEL="nvme"'
/dev/nvme0n1p1: LABEL="nvme" UUID="ed03fc8a-ce3f-4a2c-a939-8c12b520e271" TYPE="ext4"
bash-5.1# mkdir -p /mnt/nvme
bash-5.1# mount -t ext4 -o rw /dev/nvme0n1p1 /mnt/nvme
bash-5.1# ls /mnt/nvme
lost+found

The TimescaleDB source image is based on Alpine, and I have done the above with the balenalib/jetson-xavier-nx-devkit-alpine and balenalib/jetson-xavier-nx-devkit-ubuntu base images, too. I recognize that the Balena base images do some additional work for udev and device/hardware access, but regardless of the base/parent image used, I am still having problems.

Side note: I was initially confused by the Mounting external storage media documentation, because it is written for the non-Alpine base images despite the example project using an Alpine base image. There is no -L option for the mount command in the Alpine base images, so I used the blkid | grep 'LABEL="nvme"' command to get the device name instead of the recommended label-based mount command.

With my naive understanding, I converted the manual mount commands from the SSH session into the Dockerfile:

# DO NOT FOLLOW THIS DOCKERFILE. IT WILL NOT WORK.
# FROM balenalib/jetson-xavier-nx-devkit-ubuntu
# FROM balenalib/jetson-xavier-nx-devkit-alpine
FROM timescale/timescaledb:latest-pg13

RUN mkdir -p /mnt/nvme
RUN mount -t ext4 -o rw /dev/nvme0n1p1 /mnt/nvme
RUN mkdir -p /mnt/nvme/postgres

ENV PGDATA "/mnt/nvme/postgres/data"
ENV POSTGRES_PASSWORD "password123"
ENV TIMESCALEDB_TELEMETRY "off"

This does not work, regardless of the base image used: I get a mount: permission denied (are you root?) error. I believe the error is caused by the device, /dev/nvme0n1p1, not being available during the build; a Dockerfile is a recipe for building an image, not a “startup” script.
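
To illustrate the difference for anyone else confused by this (a generic sketch, not specific to this project):

# RUN executes once, on the builder; only its filesystem changes are
# kept in the resulting image layer. Mounts made during RUN are not
# part of the image.
RUN echo "executed at image build time" > /built.txt

# ENTRYPOINT/CMD execute every time a container starts from the image.
CMD ["sh", "-c", "echo 'executed at container start'"]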

So then, there must be some way to mount an “external” drive just once, when the container is started, not dynamically while the container is running. I don’t think I need udev for this, as the documentation seems to indicate that udev is only needed for dynamically mounting and unmounting USB drives and/or additional SD cards. Plus, dynamic udev detection would only begin after the container has started, and I think I need the NVMe mounted before Postgres starts. Taking inspiration from the example storage project, I created the following script:

#!/usr/bin/env bash
mkdir -p /mnt/nvme
mount -t ext4 -o rw /dev/nvme0n1p1 /mnt/nvme
mkdir -p /mnt/nvme/postgres

I could not figure out how to run this script at boot of the container without overriding the entrypoint of the Postgres/Timescale parent image. After a lot of reading, experimenting, and cursing, I thought I had found a solution: a custom fstab file within the container that would automatically mount the NVMe to the /mnt/nvme mount point on boot, without having to execute the above script. I used the following fstab file and modified Dockerfile:

# fstab
LABEL=nvme      /mnt/nvme       ext4    defaults  0 0

# DO NOT FOLLOW THIS DOCKERFILE. IT WILL NOT WORK.
# FROM balenalib/jetson-xavier-nx-devkit-ubuntu
# FROM balenalib/jetson-xavier-nx-devkit-alpine
FROM timescale/timescaledb:latest-pg13

RUN mkdir -p /mnt/nvme
COPY ./fstab /etc/fstab

ENV PGDATA "/mnt/nvme/postgres/data"
ENV POSTGRES_PASSWORD "password123"
ENV TIMESCALEDB_TELEMETRY "off"

This also did not work. If I SSH into the host and manually mount the NVMe at /tmp/nvme, the contents of /tmp/nvme are just a lost+found folder, with no postgres/data folder. However, if I SSH into the container, the /mnt/nvme/postgres/data location does exist and contains Postgres-related files and folders; but if I rebuild and restart the container, any data within the database is lost. I suspect the fstab file is not actually mounting the NVMe at all, and /mnt/nvme is just an ordinary directory in the container’s filesystem, still on the SD card.
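
A quick way to check from a shell in the container whether /mnt/nvme is a real mount point or just a directory is to look at /proc/mounts; in this broken state it would print something like:

bash-5.1# grep /mnt/nvme /proc/mounts || echo "not a mount point"
not a mount point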

I tried playing around with a docker-compose.yml file and named volumes, as this appears to be the appropriate way to persist data. All of the StackOverflow Q&As and Docker-related documentation I can find through various Internet searches suggest mounting the device first, e.g. mount -L nvme /mnt/nvme, and then using volumes within a docker-compose file. I cannot figure out how to have the NVMe mounted automatically by BalenaOS, or how to execute the NVMe mount from the host. I tried various versions of the following docker-compose file without success:

# DOES NOT WORK. DO NOT USE.
version: "2"
services:
  database:
    build: .
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
    driver: local
    driver_opts:
      device: /dev/nvme0n1p1
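
For reference, the Docker local volume driver also expects type and o options when binding a device, so a fully-specified version would look like the sketch below. This still did not solve my problem, and I do not know whether the balena supervisor passes driver_opts through at all:

# Sketch only: fully-specified local driver options (unverified on BalenaOS)
volumes:
  pgdata:
    driver: local
    driver_opts:
      type: ext4
      device: /dev/nvme0n1p1
      o: defaults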

I recognize that named volumes are stored in /var/lib/docker/volumes, so the named volume would be on the SD card or eMMC, but I thought there would be some way to place this specific volume on the NVMe. I found some StackOverflow Q&As indicating that a symbolic link could be created, but that assumes CLI access and running containers with docker on the host system, not with BalenaOS. It also feels like a hack.

I feel like this should be relatively straightforward, but I am new to all of this (Docker, Balena, and containers), so there must be something I am missing. I am hoping the Balena community can help me out.

Hey @ts-cfield, I know it can be tricky to get storage mounted in a container, but you are definitely on the right track!

I could not figure out how to run this script at boot of the container without overriding the entrypoint of the Postgres/Timescale parent image.

The best way, in my opinion, is that you DO want to override the existing entrypoint with a small script that does some setup and then calls exec docker-entrypoint.sh.

To find out the entrypoint of the parent image, you can do something like this:

docker pull timescale/timescaledb:latest-pg13
docker inspect timescale/timescaledb:latest-pg13 --format '{{.Config.Entrypoint}}'
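
For the postgres-derived images, the inspect command should print something like:

[docker-entrypoint.sh]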

A good example of this in practice is my Nextcloud project where I mount any partitions under /dev/sd?? before calling the original entrypoint command.
https://github.com/klutchell/balena-nextcloud/blob/main/app/custom-init.sh

I would not recommend proceeding with fstab or named volumes in order to write the data to an external disk; the mount script is the way to go in this case.

I hope this helps, let us know how it works out!

@klutchell, Thank you for the information and guidance. I feel I have learned a massive amount about all of this since I originally posted, and your post has been very helpful.

I have the following (untested) init.sh script to mount the NVMe drive at boot of the container/device:

#!/usr/bin/env bash

# UNTESTED!

mkdir -p /mnt/nvme
mount -t ext4 -o rw -L nvme /mnt/nvme
mkdir -p /mnt/nvme/postgres/data

exec docker-entrypoint.sh

with the following (untested) Dockerfile:

# UNTESTED!
FROM timescale/timescaledb:latest-pg13

ENV PGDATA=/mnt/nvme/postgres/data
ENV POSTGRES_PASSWORD=password123
ENV TIMESCALEDB_TELEMETRY=off

COPY init.sh /

RUN chmod +x /init.sh

ENTRYPOINT ["/init.sh"]

While I haven’t had a chance to test this yet, I believe I am on the right track based on the example in @klutchell’s Nextcloud project. I will report back once I get a chance to test it.

I would like to note that this means only this container has access to the NVMe; access from multiple containers is currently not supported. However, based on the discussion in How to add persistent storage for Balena OS on Raspberry PI, it might be possible to split the NVMe into separate partitions and mount each partition within its own container, as sketched below. Once I get a single container working, I am going to try this as well.
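
For example, here is a rough and untested sketch of splitting the drive into two labeled partitions from the host; the device name, sizes, and labels are all assumptions:

# Untested sketch: split the NVMe into two labeled ext4 partitions
parted --script /dev/nvme0n1 \
    mklabel gpt \
    mkpart primary ext4 0% 50% \
    mkpart primary ext4 50% 100%
mkfs.ext4 -L pgdata  /dev/nvme0n1p1
mkfs.ext4 -L appdata /dev/nvme0n1p2
# each container would then mount its own partition by label in its entrypoint script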

Just in case someone else comes across this post, here are some additional thoughts and details:

  1. I would like to clarify that Named Volumes are not needed for persistent storage when using an external drive. If the container or the device is restarted, the data will still be on the external drive and available to the container. Based on my reading of both the Balena and Docker documentation, this point is not made very clear: external device storage is separate from Named Volumes.
  2. I had another idea: move the /var/lib/docker/volumes folder to the NVMe, so that all Named Volumes would be stored on the larger, more stable (non-SD card) “external” drive. Moving the /var/lib/docker/volumes folder appears to be very involved and possibly harmful to the OS, and using symbolic links can also cause errors. While this would make persisting storage on an external drive relatively straightforward, it is basically a non-starter.

Sorry, but this is not working. As soon as I add an ENTRYPOINT ["/custom-init.sh"] line to my Dockerfile, I get an infinite loop of the service restarting, or a “Device state not settled, retrying in 1000 ms” message, no matter the contents of the custom-init.sh file. I stripped the custom-init.sh script down to the following, i.e. removed the mounting code for the time being:

#!/usr/bin/env bash
set -Eeo pipefail # Added this recently, appears to have no effect

# Started here
#exec docker-entrypoint.sh

# I thought the docker-entrypoint.sh script could not be found, which was causing the infinite loop 
# for some reason. This results in a "File not found" error.
#exec /docker-entrypoint.sh

# I read somewhere this was good to do:
#exec docker-entrypoint.sh "$@"

I also tried the Postgres base image instead of the TimescaleDB image, but the same problem occurs with both. In other words, I have tried FROM postgres:13 and FROM timescale/timescaledb:latest-pg13 without success.

Current Dockerfile:

# FROM postgres:13
FROM timescale/timescaledb:latest-pg13

ENV POSTGRES_PASSWORD=password123

# ENV PGDATA=/mnt/nvme/postgres/data
ENV TIMESCALEDB_TELEMETRY=off

COPY custom-init.sh /

RUN chmod +x /custom-init.sh

ENTRYPOINT ["custom-init.sh"]

CMD ["postgres"]

I have tried with and without the CMD ["postgres"] line at the end, also without success. However, if I comment out and/or remove the ENTRYPOINT ["custom-init.sh"] line, then the database service starts and runs as expected.

Not sure where to go from here. If I cannot run a custom entrypoint script, then I cannot mount the “external” NVMe drive.

I think I may have this resolved. There are several points that I appear to have missed in my early attempts.

  1. The image must be privileged and have UDEV enabled. The privileged: true key/value pair must be included in the docker-compose.yml file if attempting to mount the external drive within a multi-container project and not using one of Balena’s base images.
  2. The PostgreSQL/TimescaleDB Docker images use a non-root user, postgres. I was getting permission denied errors when attempting to mount the partition within the Postgres-based container. This can be solved within the entrypoint bash script with the su -c "mount -t ext4 -o rw /dev/nvme0n1p1 /mnt/nvme" root command.
  3. The last line in the Dockerfile must be CMD ["postgres"]; not including this line will cause the infinite restarting of the service/container.

Here are the relevant portions of the docker-compose.yml file, the complete Dockerfile, and the entrypoint script that appear to work for mounting a partition of an NVMe SSD within a Postgres-based image:

# docker-compose.yml
...
  database:
    build: ./database
    restart: always
    privileged: true
    ports:
      - "5432:5432"
    environment:
      PGDATA: /mnt/nvme/postgresql/data
...
# Dockerfile
FROM timescale/timescaledb:latest-pg13

ENV TIMESCALEDB_TELEMETRY=off
ENV POSTGRES_PASSWORD=password123
ENV UDEV=1
ENV PGDATA=/mnt/nvme/postgresql/data

COPY entrypoint.sh /entrypoint.sh

RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

CMD ["postgres"]

# entrypoint.sh
#!/usr/bin/env bash

su - -c "mkdir -p /mnt/nvme" root
device=$(blkid | grep "LABEL=\"nvme\"" | cut -d : -f 1)
echo "Mounting device = ${device}"
su - -c "mount -t ext4 -o rw ${device} /mnt/nvme" root

exec docker-entrypoint.sh "$@"

The default PostgreSQL/TimescaleDB Docker images are based on Alpine, so the blkid and mount commands work slightly differently: there is no -L option for mount and no options at all for blkid. Thus, I added the blkid | grep | cut line to the bash script to find the device with the nvme label.

The docker-entrypoint.sh script is copied to /usr/local/bin as part of the parent image build, so it does not need to be copied into the child image, nor does its mode need to be changed.
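
This can be confirmed from a shell in the running container, which should show something like:

bash-5.1# which docker-entrypoint.sh
/usr/local/bin/docker-entrypoint.sh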

Hey @ts-cfield, I think you need to provide the absolute path to your custom init script unless you add it to the PATH.

COPY custom-init.sh /
RUN chmod +x /custom-init.sh
# use the absolute path here (or copy the file to /bin or /sbin, etc)
ENTRYPOINT ["/custom-init.sh"]

Then you can call the original entrypoint as it is written in the upstream image; in this case it appears to be in the PATH already.

#!/usr/bin/env bash

# must match the existing entrypoint of the original image
# in this case docker-entrypoint.sh should already be in PATH
exec docker-entrypoint.sh

Let us know how it goes!

@klutchell Thank you for the help and advice. I used an absolute path for the entrypoint in the Dockerfile. The PostgreSQL/TimescaleDB base image copies its entrypoint script to /usr/local/bin, which makes it accessible from anywhere on the PATH, so the exec docker-entrypoint.sh "$@" line in the custom-init.sh file does not need the absolute path. In fact, it will fail if an absolute path to the PostgreSQL base image entrypoint script, docker-entrypoint.sh, is used.

I tested the custom initialization, or entrypoint, script above, and it worked!

I did have to use an absolute path in the Dockerfile for the ENTRYPOINT ["/entrypoint.sh"] line and include the CMD ["postgres"] line. I chose to use an ENTRYPOINT to add some flexibility and to future-proof things a little. From some additional research, the ENTRYPOINT/CMD combination is a “best practice”, or at least a common enough convention.
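
For anyone else reading: Docker appends the CMD array to the ENTRYPOINT array as arguments, which is why forwarding "$@" matters. Roughly:

# With ENTRYPOINT ["/entrypoint.sh"] and CMD ["postgres"], starting the container runs:
#   /entrypoint.sh postgres
# Inside /entrypoint.sh, "$@" is therefore "postgres", so
#   exec docker-entrypoint.sh "$@"
# becomes:
#   exec docker-entrypoint.sh postgres
# Overriding the command at run time replaces only the CMD portion and
# flows through the same way.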

I was using the Alpine variant of the PostgreSQL/TimescaleDB Docker image, so there are some differences in the bash script for mounting the NVMe drive. Also, the PostgreSQL/TimescaleDB image switches to the postgres user, so the bash/entrypoint script needs to execute some things as root instead of postgres.

Overall, I think I got this to work. Thank you!


Glad it all worked out, and thank you for sharing your experiences and solutions here!