Balena multiple services and docker "layer deduplication"?

Hello,
I am not very comfortable with advanced Docker topics, but I was wondering: can balena optimize disk usage when services in a docker-compose file share some layers?

As an explanation, see the example below.
Simplified docker-compose:

  talker0:
    image: ros:noetic-ros-core-focal
    command: stdbuf -o L rostopic pub /chatter std_msgs/String "hello" -r 1

  listener0:
    image: ros:noetic-ros-core-focal
    command: stdbuf -o L rostopic echo /chatter
(...)
  talkerN:
    image: ros:noetic-ros-core-focal
    command: stdbuf -o L rostopic pub /chatter std_msgs/String "hello" -r 1

  listenerN:
    image: ros:noetic-ros-core-focal
    command: stdbuf -o L rostopic echo /chatter

This leads to the following build log:

┌──────────────────┬─────────────┬────────────┐
│ Service          │ Image Size  │ Build Time │
├──────────────────┼─────────────┼────────────┤
│ talker           │ 738.35 MB   │ < 1 second │
├──────────────────┼─────────────┼────────────┤
│ ... *N           │ 738.35 MB   │ < 1 second │
├──────────────────┼─────────────┼────────────┤
│ listener         │ 738.35 MB   │ < 1 second │
└──────────────────┴─────────────┴────────────┘

Empirical tests lead me to conclude that I won't use N × 738 MB of disk space.
Can someone tell me the name of that mechanism, and any conditions I need to meet to get these disk space savings?

Hi there @luchko. By the nature of Docker, when pulling images (assuming you are not squashing them), the engine pulls shared layers once and only pulls the layers that differ. For instance, if you have 3 different services built from a node parent image, all of them share the node layers, and the extra data downloaded is only the files distinct to each service.

In the example you provided, you are correct: since there is only one image, the engine will pull the ros:noetic-ros-core-focal image only once and simply configure each service with its own command.
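To make the arithmetic concrete, here is a minimal Python sketch of why the per-service sizes in a build table overstate what ends up on disk. The layer digests, the split of 738.35 MB into two layers, and the `service-a`/`service-b` images are all invented for illustration; real digests and sizes come from the engine.

```python
# Hypothetical layer digests and sizes in MB (not taken from a real image).
LAYER_SIZES_MB = {
    "sha256:base": 700.0,
    "sha256:ros": 38.35,
    "sha256:app-a": 5.0,
    "sha256:app-b": 7.5,
}

# Each image is an ordered list of the layer digests it is made of.
IMAGES = {
    "talker": ["sha256:base", "sha256:ros"],
    "listener": ["sha256:base", "sha256:ros"],
    "service-a": ["sha256:base", "sha256:ros", "sha256:app-a"],
    "service-b": ["sha256:base", "sha256:ros", "sha256:app-b"],
}


def reported_size_mb(images, sizes):
    """Sum of per-image sizes, the way a per-service table adds up."""
    return sum(sizes[layer] for layers in images.values() for layer in layers)


def on_disk_size_mb(images, sizes):
    """Size when every distinct layer is stored only once."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(sizes[layer] for layer in unique_layers)


print(f"reported: {reported_size_mb(IMAGES, LAYER_SIZES_MB):.2f} MB")
print(f"on disk:  {on_disk_size_mb(IMAGES, LAYER_SIZES_MB):.2f} MB")
```

With these made-up numbers the table would add up to 2965.90 MB, while the distinct layers total only 750.85 MB.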

You can achieve the same when building from a local Dockerfile by using the build and image properties together. For instance:

service1:
   build: ./my-service
   image: my-service
   command: ./my-command hello
service2:
   build: ./my-service
   image: my-service
   command: ./my-command goodbye

In this case both services share the same image, which means the data will only be downloaded once.

Please let us know if this answers your question or if you need some more specific examples.

Thanks, that answers my question perfectly.

Would it be possible to add a column with something like balena system df -v at the end of balena push?
That would allow users to see "real disk space usage" instead of "image size".
Thanks!

Glad to help. Regarding your suggestion, I can certainly make a note of it, but I'm not entirely sure how feasible it is: the builder reports on what's been uploaded to the registry, and on the registry these images share disk space with other app images, with no strict separation of what belongs to a single release.

If you want more control, you can always do a balena build followed by a balena deploy, and query the engine state in between to get an idea of how much disk space will be used.
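One way to query the engine state between the two steps is to compare the layer lists of the built images. The sketch below assumes the images were built into a local Docker engine and that you saved the JSON printed by `docker image inspect <image> [<image> ...]`; it reads the `RootFS.Layers` field from that output. The image names and digests in the sample are made up, and the result only counts layers, it does not report sizes.

```python
import json


def layer_sharing(inspect_output: str) -> dict:
    """Summarise layer reuse from `docker image inspect` JSON output."""
    images = json.loads(inspect_output)
    # Every layer digest referenced by any image, duplicates included.
    references = [layer for image in images for layer in image["RootFS"]["Layers"]]
    return {
        "images": len(images),
        "layer_references": len(references),
        "distinct_layers": len(set(references)),
    }


# Synthetic sample in the shape of `docker image inspect` output.
# Image names and digests are hypothetical.
SAMPLE = json.dumps([
    {
        "RepoTags": ["my-service-a:latest"],
        "RootFS": {"Type": "layers", "Layers": ["sha256:aaa", "sha256:bbb", "sha256:ccc"]},
    },
    {
        "RepoTags": ["my-service-b:latest"],
        "RootFS": {"Type": "layers", "Layers": ["sha256:aaa", "sha256:bbb", "sha256:ddd"]},
    },
])

print(layer_sharing(SAMPLE))
# {'images': 2, 'layer_references': 6, 'distinct_layers': 4}
```

The larger the gap between `layer_references` and `distinct_layers`, the more the services share. For actual byte counts, the engine's own `system df -v` output mentioned above is the more direct source.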
