Here’s a hack that lets me mount and share external storage devices across containers without having to modify the BalenaOS image and without requiring the containers that use the storage to run in privileged mode (only the container that mounts the device needs it). Posting here both to share it with the community and to ask for feedback in case there are hidden gotchas I haven’t found.
Let’s say there are three containers involved. The first container is disk-manager, and it’s the thing that detects and mounts the storage device. The other two are service1 and service2, and they use the storage device.
We’ll assume the storage device shows up as /dev/sda1 when it’s connected.
disk-manager
The service definition in docker-compose.yml will need some specific settings:
privileged: true
environment:
  DBUS_SYSTEM_BUS_ADDRESS: 'unix:path=/host/run/dbus/system_bus_socket'
labels:
  io.balena.features.dbus: 1
Make sure you can send D-Bus commands from disk-manager. There are D-Bus libraries for various languages but for purposes of this example, let’s assume we’re using a Debian or Ubuntu base image and we want to do everything from the command line. Run apt install systemd to get the various systemd command-line tools.
The service should wait for the storage device to appear. Take your pick of techniques here: you could use udev (in which case add UDEV: 1 to the environment stanza), or mount /dev as a devtmpfs filesystem and watch for devices, or explicitly expose the specific device in a devices clause in the service configuration.
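As a rough illustration of the "watch for devices" option, here is a minimal polling sketch. The function name wait_for_device and the 60-second default timeout are my own assumptions, not anything Balena provides, and a udev-based approach would look different.

```shell
# Poll until a block device node appears, or give up after a timeout.
# wait_for_device is a hypothetical helper name, not part of any Balena tooling.
wait_for_device() {
    device="$1"          # device node to wait for, e.g. /dev/sda1
    timeout="${2:-60}"   # seconds to wait before giving up (assumed default)
    elapsed=0
    while [ ! -b "$device" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out waiting for $device" 1>&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

Calling wait_for_device /dev/sda1 120 would block for up to two minutes and return nonzero if the device never shows up, so the caller can decide whether to retry or exit.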
When the device is detected, do any necessary application-specific setup. You will want to make at least one subdirectory on the storage device for services to use. For purposes of the example, we’ll assume there is a single services subdirectory that is shared by service1 and service2. This will come into play later.
mount /dev/sda1 /mnt
# -p so this doesn't fail when the directory already exists from an earlier run
mkdir -p /mnt/services
Once a device is attached and properly initialized, the hack begins.
The crux of it: We’re going to ask the host’s systemd to mount the storage device in a location that can be bind-mounted into containers. Balena doesn’t allow arbitrary bind mounts of host OS directories, but it does support mounting a few specific directories: devices, processes, and journald logfiles. Devices and processes are special directories we can’t modify, but the journald log directories are just ordinary directories.
We’ll be using a command called systemd-run to ask the host OS to run commands for us. That command tries to detect whether systemd is running, and will fail by default when you run it in a container. But its detection is pretty easy to fool: just create a directory /run/systemd/system inside the disk-manager container.
Now we create the mount point and mount the storage device in the host OS.
systemd-run --wait mkdir /run/log/journal/demo-storage
systemd-run --wait mount /dev/sda1 /run/log/journal/demo-storage
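Putting the last two steps together, a sketch of how disk-manager might wrap them is below. The helper names fake_systemd_marker and mount_on_host and the SYSTEMD_RUN override variable are my own inventions for illustration. The sketch also uses mkdir -p rather than plain mkdir so that a second attach doesn't fail on a leftover mount point; it does not check whether something is already mounted there.

```shell
# Sketch only: fake_systemd_marker and mount_on_host are hypothetical helper
# names. Set SYSTEMD_RUN=echo to print the commands instead of running them.
SYSTEMD_RUN="${SYSTEMD_RUN:-systemd-run}"

# Create the directory systemd-run looks for before it will talk to systemd.
fake_systemd_marker() {
    mkdir -p /run/systemd/system
}

# Ask the host's systemd to create the mount point and mount the device there.
mount_on_host() {
    device="$1"        # e.g. /dev/sda1
    mount_point="$2"   # e.g. /run/log/journal/demo-storage
    # mkdir -p succeeds even if the directory survives from an earlier attach.
    $SYSTEMD_RUN --wait mkdir -p "$mount_point" &&
        $SYSTEMD_RUN --wait mount "$device" "$mount_point"
}
```

In the container you would call fake_systemd_marker once at startup, then mount_on_host /dev/sda1 /run/log/journal/demo-storage when the device appears. With --wait, systemd-run should pass the command's exit status back, so the mount is only attempted if the mkdir succeeded.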
service1 and service2
Run both of these with
labels:
  io.balena.features.journal-logs: 1
In their startup logic, check for the existence of the subdirectory the disk manager created, which should appear underneath /run/log/journal/demo-storage. DO NOT CREATE THE SUBDIRECTORY HERE, JUST CHECK IF IT EXISTS. Exit if it doesn’t exist. For example, in shell, using the services subdirectory mentioned above:
if [ ! -d /run/log/journal/demo-storage/services ]; then
    echo "Storage directory not found" 1>&2
    # Sleep a bit to cut down on log spew
    sleep 10
    exit 1
fi
And you’re good to go. You can access /run/log/journal/demo-storage/services from both containers and it will go to the external storage device.
Note that you need to make the services exit and get restarted if the data subdirectory doesn’t exist. If you just sleep in a loop waiting for the subdirectory to appear, it never will because the storage device wasn’t mounted on the host OS at the time the container was started.
But wait, won’t log rotation delete all my files?
The system automatically trims journald logfiles periodically, but it won’t touch your external storage for two reasons:
- The journal system requires that logs be in a subdirectory whose filename is a valid machine ID. A machine ID is a fixed-length hexadecimal value. The name of the mount point (demo-storage in the example) won’t be detected as a log directory.
- Even if, for some reason, you name the mount point with a hexadecimal filename of the right length, journald’s log trimming code doesn’t recursively walk directory trees. It only considers journal files directly underneath /run/log/journal/<machine-id>. So it will never visit any files in the deeper subdirectory where the services are actually writing their data.
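If you want a guard against accidentally picking a mount point name that looks like a machine ID, something like the following could work. It assumes the usual machine-id format of 32 lowercase hexadecimal characters and is only an approximation of the naming rule, not journald’s actual check. The name is_machine_id is made up for this example.

```shell
# Return 0 if the argument looks like a machine ID (32 lowercase hex chars).
# is_machine_id is a hypothetical helper; it approximates the naming rule
# and is not taken from journald's source.
is_machine_id() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;   # contains a character that is not lowercase hex
    esac
    [ "${#1}" -eq 32 ]             # must be exactly 32 characters long
}
```

The disk manager could refuse to use a mount point name for which is_machine_id succeeds. A name like demo-storage fails the check because of the non-hex characters.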
Refinements
The walkthrough above was stripped down to make it easier to follow. In a production setup, you might consider cleaning it up a tad, for example:
- Don’t hardwire /run/log/journal in the services. Instead, pass in an environment variable to tell the services where to store their data. This is good practice in general, especially if you’re going to be running the containers outside of BalenaOS for testing. In that case you can run them with a regular Docker bind mount and point them to the mount location.
- Have the disk manager use the Balena supervisor API to explicitly start the other services when the device is available, so they don’t have to keep failing and getting restarted. This also gives you a little more control over what happens if the user removes the storage device.
- If you’re using the systemd-run command rather than a D-Bus client library, don’t install the entirety of systemd into your container, just the bits you actually need.
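For the first refinement, the startup check in the services might look roughly like this. STORAGE_DIR is a variable name I picked for the example, and the helper names are hypothetical too; the default simply falls back to the path used in the walkthrough.

```shell
# STORAGE_DIR is a hypothetical variable name; set it in the service's
# environment stanza, or leave it unset to use the walkthrough's path.
storage_dir() {
    echo "${STORAGE_DIR:-/run/log/journal/demo-storage/services}"
}

# Print the storage directory if it exists; otherwise complain and fail.
# Deliberately does NOT create the directory, for the reasons given earlier.
require_storage() {
    dir="$(storage_dir)"
    if [ ! -d "$dir" ]; then
        echo "Storage directory $dir not found" 1>&2
        return 1
    fi
    echo "$dir"
}
```

A start script could then do something like DATA_DIR="$(require_storage)" and, on failure, sleep briefly and exit so the container gets restarted. Outside of BalenaOS you would point STORAGE_DIR at an ordinary Docker bind mount instead.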
Bonus: alternate approach
What I was doing before I figured out this hack was to run each container in privileged mode with UDEV=1. Each service’s start script would look for the storage device in /dev and mount it into the container’s filesystem before dropping privileges and launching the actual service.
This worked okay, but it means the containers need to include this device-mounting code, so they work differently on BalenaOS than they do in other environments where you’d use a normal Docker bind mount. And of course it also means that everything needs elevated privileges, which is not great security practice.