volumes_from required for PHP apps

Hi all,

I am deploying an app to BalenaOS that runs an NGINX container and a PHP container linked through a docker-compose file. Both NGINX and PHP need to read the same files in order to operate, so there are a few options:

  1. The best-practice approach: create a Docker volume that both containers can see, problem solved. But because it is a volume, the data persists on the host, so when I deploy an app update, any content removed from the new app will still be present on the device, sitting in the volume.
  2. Don’t link the PHP and NGINX containers, and instead bake a copy of the same data into both. But that takes more space and more bandwidth to retrieve the images.
  3. Some sort of data container that empties the existing volume, then moves the new data into it. This is a little messy, and it also entails duplicating the data, taking up twice the space. A mv might reduce the space issue, but it is still a messy solution.
  4. Rename the volumes on each app update, and enable the balena feature that disposes of unlinked volumes. But this again carries risks; the disposal of unlinked volumes is presumably disabled by default for a reason.
  5. Create a Dockerfile that installs PHP and NGINX in one container. Generally considered bad Docker practice.
  6. The approach used previously in Docker (at least in compose v2, before it was removed in v3) is the volumes_from feature. It lets the volume be declared in the Dockerfile, so that by design it is removed when the built container is removed. The compose file then mounts the resulting anonymous volume via volumes_from, which requires neither a volume name nor a volumes definition in the compose file. A sketch follows this list.
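
For reference, here is a minimal sketch of how option 6 looked under compose v2; the service names and paths are placeholders rather than my actual setup. The php image's Dockerfile would declare VOLUME /var/www/html, and then:

version: '2'

services:
    php:
        build: ./php

    nginx:
        image: nginx
        #mounts every volume declared by the php service;
        #no volume name or top-level volumes block is needed
        volumes_from:
            - php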

I see the volumes_from feature is currently not available in Balena (https://www.balena.io/docs/reference/supervisor/docker-compose/). I would like to add a +1 request for this feature, but also to ask whether anyone has found a way around this in the meantime. In short, has anyone found a way to share non-persistent content between containers?

Hi,

Just to clarify, how large are those files? Are they updated together with a single container, or the whole application?

Not entirely sure I understand the question. The files are individually small, 1 MB max, but there can be many of them. It’s serving a web-based system that can be accessed via phones.

They are copied into an official NGINX image via a Dockerfile.

I’ve pinged the engine’s maintainer internally, but it looks like you have mapped all the possibilities already. Will update you once I have more info.

Hi. Unfortunately, it looks like there isn’t much we can add apart from what you have already discovered.

Thanks for looking into it for me. It is an issue I have seen people experience with Docker more broadly outside of Balena, but overcome with volumes_from in v2 compose, so I thought it was worth finding out if I had missed something. I will post back if I come up with anything else, but for now will just leave it as a request for volumes_from in order to have non-persistent content shared between containers.

Thank you for the suggestion. I will pass that up to the maintainer.

For anyone interested or encountering a similar issue, here is what I have settled on for now: an independent container that carries the data I want in my volume inside its image, and updates the volume to match that data on start.
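
For clarity, this assumes a build context laid out something like the following (the content/ and scripts/ names are simply what the Dockerfile below references):

data/
├── Dockerfile
├── scripts/
│   └── start.sh
└── content/   (whatever you want mirrored into the volume)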

Dockerfile:

#Using the standard Alpine image, as this is a simple process and a small image is the goal.
#Don't forget: if you are already using another image in other containers, that image is a better base even if it is bigger than this one, as the device will only download it once.
   
FROM alpine:3.10

#Install rsync

RUN apk add --no-cache rsync

#Set the working directory
WORKDIR /data

#Copy the start script (provided below) into the image and make it executable
COPY scripts scripts

RUN chmod +x scripts/*

#Copy in the content you want in your volume

COPY content/ /data/imgdata/

#Run the start script when the container starts
CMD ["sh", "/data/scripts/start.sh"]


start.sh:

#!/bin/sh
#Plain sh rather than bash, since the Alpine base image doesn't include bash (and the CMD invokes sh anyway)

#Set a file as an indicator of whether the process has run already
FILE=/data/done

#check if the file exists yet

if [ ! -f "$FILE" ]; then

    #If the file doesn't exist, use rsync to align the volume with the content of /data/imgdata.
    #--delete removes anything in the volume that isn't in the image.
    #--no-whole-file forces rsync to use its delta-transfer feature to reduce SD card writes.
    #--inplace overrides rsync's default behaviour, which is to write a new copy of the file first and then replace the old one; that would mean more writing to the card (it seems both of these last two flags are required for delta transfer to work).

    rsync -a --delete --no-whole-file --inplace /data/imgdata/ /data/voldata

    #Create a file to indicate that the process completed.
    #If the power is pulled before the process completes, this file won't exist, and the script will know to run again.
    #The file lives in the non-persistent image layers, so it is removed when you next deploy an updated image, which triggers the volume sync again to ensure the volume mirrors your data.
    #It also saves rsync from running a full file check every time the container starts, which would be wasted work.

    touch "$FILE"
fi

#Loop to keep the container alive. This allows a restart: always policy to be used, ensuring the volume gets updated even after a power failure or an interrupted write.

while :; do sleep 2073600; done


Then drop it into your compose file:

version: '2.1'

services:
    data:
        build: ./data
        restart: always
        volumes:
            - 'data:/data/voldata'

volumes:
    data:
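
The other containers then mount the same named volume to consume the data. A minimal sketch, assuming hypothetical nginx and php services (the paths are placeholders); the volume is mounted read-only so only the data container ever writes to it, and the same path is used in both so any fastcgi script paths stay consistent:

services:
    nginx:
        build: ./nginx
        volumes:
            - 'data:/var/www/html:ro'
    php:
        build: ./php
        volumes:
            - 'data:/var/www/html:ro'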

In my initial post there was a concern about duplicating the space used. My understanding of the layers (although it needs some more thinking through and testing) is that the data exists in the image, and when it is mounted to a volume it gets copied there, which accounts for the improved write efficiency and the persistence across image rebuilds. In other words, when mounting a folder from an image you end up with two copies of the data. With this rsync method you still end up with two copies of the data, so you are no worse off than before.

This is experimental and hasn’t been thoroughly tested; please do improve/add/revise/advise, and use with caution.

If effective, this could actually be more useful than the volumes_from option: in the volumes_from scenario, each time the container is recreated it gets a brand new anonymous volume and the image content is copied into it, creating more writes to disk.

That is really useful! Thanks a lot for sharing!