Is it possible to cache a "base image" for my project and use it during the build?

I know that balena can deploy delta updates of Docker images to my devices, but for at least one of my containers the build time is very long and the resulting container is large, so computing the delta image takes a long time.

I know that one option is to use local development and build on my local device. But is it possible to cache a “base image” for my project at the point where I know nothing is going to change, and use that as a starting point during the build?

Hi @jason10, I'm not sure I follow exactly. For building, the build system should use the previous build as the cache base, so it shouldn’t rebuild your service if nothing changed. Unfortunately I think it will still try to run the delta against the two versions, and we could probably improve that considerably. Is that the problem you are describing?

No, the problem is that there is a big section of my Dockerfile that installs CUDA, TensorFlow, Tomcat, Java, and other tools that don’t change, followed by the much shorter Node.js and .war file installation.

Perhaps I could break it into two containers: one that is the unchanging runtime, and a second that contains the Node.js and .war file and, when launched, copies the new code to a shared volume…
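If I went that route, the compose file might look roughly like this (service and volume names are invented for illustration, and this assumes both services can share a named volume):

```yaml
version: "2"
services:
  # Heavy, rarely-changing runtime: CUDA, Java, Tomcat, etc.
  runtime:
    build: ./runtime
    volumes:
      - app-code:/shared/app
  # Small, frequently-changing service: copies fresh code into the
  # shared volume on start, then hands off to the runtime container.
  code:
    build: ./code
    volumes:
      - app-code:/shared/app
volumes:
  app-code:
```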

Does that help? What I am seeing is a full rebuild of all containers, but accelerated delta updates. And for a 4GB image, the acceleration is significant, yet the process is still painfully long.


The builder should cache previous layers that haven’t changed during a build, so if you have a Dockerfile that has something like this:

FROM <base>
RUN <install CUDA, Java, etc..>
COPY <myApp> <appPath>
RUN <buildApp>
CMD [""]

Then as long as none of the dependencies in the RUN <install CUDA, Java, etc..> step have changed, that layer should be cached. If you’re not seeing that, it would be really useful for us to see the portions of the Dockerfile that aren’t caching (but should be).

Splitting this into a multistage build might also help you, depending on the circumstances.
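As a sketch of what that could look like (stage names and paths here are illustrative, following the placeholder style above), a multistage build keeps the heavy toolchain in a builder stage and copies only the build output into the final image:

```dockerfile
# Build stage: heavy toolchain, cached as long as these steps don't change
FROM <base> AS builder
RUN <install build deps, Java, etc..>
COPY <myApp> <appPath>
RUN <buildApp>

# Runtime stage: copy only the build output, keeping the final image small
FROM <base>
COPY --from=builder <appPath>/build <appPath>
CMD [""]
```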

Best regards, Heds

The caching and delta updates definitely appear to be working: if I make a change to a different container, then the delta download of the 4GB container is not required. However, the build time is the same: too long.

Maybe if I combine some of the RUN statements into a bash script, the overall container image size will decrease.
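Combining related RUN steps into a single one also lets the install and its cleanup happen in the same layer, which is usually where the size savings come from. A sketch, with an example package rather than my actual install list:

```dockerfile
# One layer: install and clean up in the same RUN, so the apt cache
# never gets committed into an intermediate layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-8-jdk && \
    rm -rf /var/lib/apt/lists/*
```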

I ran an experiment where I removed the CUDA and Java portions. The build time lowered and the container image size decreased.

What is the easiest way to share the dockerfile.template with you?

Hi, you can share it in a GitHub gist or attach it as a file on this discussion, if the portion you can/want to share doesn’t contain sensitive information.

Here you go

Just made an update to the dockerfile.template:

  • switched to a balena base image with Python 3.6.8
  • moved the Java and CUDA installation to a bash file
  • added TensorFlow installation
  • same URL as above

The build time is 25 minutes. Most of this I want to cache; only a small part of the repository changes compared to the installation of OpenJDK, CUDA, and TensorFlow.

Hey Jason,

The Balena Builder should cache layers that you don’t invalidate. If you don’t modify the commands at the top of the Dockerfile, then those should be cached out of the box. Is that not the case for you? If that’s a problem, then you might find ways to re-organize the Dockerfile so that steps that are unlikely to change get moved to the top, and eventually into base images of their own that extend the ones we provide.
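As a sketch of that shape (the Node.js steps are just an example of a frequently-changing tail): stable steps first, volatile steps last.

```dockerfile
FROM <base>
# Rarely changes: should stay cached build after build
RUN <install CUDA, OpenJDK, tensorflow>
# Changes occasionally: dependency manifest only
COPY package.json ./
RUN npm install
# Changes on almost every push: keep it last
COPY . ./
CMD ["npm", "start"]
```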

Hey Juan,

I have reorganized my commands, putting the most likely to change stuff at the end. During development, when I’m not sure if I need to change the Balena base image, I of course put the risky stuff at the top so I have to wait as little as possible for it to fail.

What I’m seeing is a full build of all containers every time. Truthfully, I haven’t timed the pushes to balena, but it certainly appears that every command is executed, and the longer commands take correspondingly longer.

For example, consider:

I know that this is for building a balena base image, but imagine that I need the JDK and Python 3.6.8 and CUDA in the same container. (I would love to refactor the container, but that’s not an option at this time.)

If I copy and paste those JDK container commands into my mega-Dockerfile.template, are you telling me that your builder should zip through the JDK build commands the second time I do a git push to the balena remote, if and only if:

  • None of the commands before the JDK commands has changed
  • The previous builds were successful

Is that correct?

Hi Jason,

Yes, that is correct: each step/layer in your Dockerfile should be cached as long as nothing before it has changed.
If you want to test it, you can run balena push <appname> consecutively and you should observe using cache log entries in the build logs before each cached step on the follow-up builds.

How can that work if the command is something like a curl command fetching data? Is it cached only if the command is exactly the same? Or only if the data fetched is exactly the same?

I don’t think I have ever seen using cache

If the command hasn’t changed, then it shouldn’t be run on subsequent builds unless a previous layer was invalidated. Docker’s best practices guide should give you a good idea of what should and shouldn’t be done in order to utilize layer caching correctly. Can you run balena push twice and see if, and up to which point, caching is used? As I said, you should see using cache with a green background amongst the build steps.
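On the curl question specifically: for RUN steps, plain Docker caching keys on the command text and the parent layer, not on what the command downloads, so a curl that fetches changing data will reuse a stale cached layer. A common workaround, assuming you want to force a refresh on demand (the ARG name and URL here are arbitrary examples):

```dockerfile
# Changing CACHEBUST (e.g. --build-arg CACHEBUST=$(date +%s)) invalidates
# this RUN step and everything after it; leave it alone to reuse the cache
ARG CACHEBUST=1
RUN curl -fsSL https://example.com/data.tar.gz -o /tmp/data.tar.gz
```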

Stevche, Juan, Lorenzo, Heds, Shaun, (@sradevski, @jviotti, @thundron, @hedss, @shaunmulligan)

Here are two logs from “git push balena …” where using cache appears in four of the five services but only partially for the fifth service, which requires 22 minutes to build.

The only change was adding

#a comment

to the end of Server/Dockerfile.template – for the service barnserv

First we start with the log from git push balena:

Then we do the commands at the top of this gist, and follow with the log from git push balena:

If you search for [barnserv] Using cache you’ll find that the barnserv service used the cache up to step 11/44 and then executed every step from there. Is this expected? Am I exceeding a limit in the Docker or balena builder?

Ah, I think I understand. The 10th command is COPY . ./, which copies the Dockerfile.template; since that file has changed, none of the following commands can use the cache.

Here is another discussion of a Dockerfile with the same problem.
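One fix, assuming the later build steps don’t need the Dockerfile itself, is to add it to .dockerignore, or to COPY narrower paths so that only real code changes invalidate the cache. Something like this (the file names are just an example from a Node.js layout):

```dockerfile
# Instead of COPY . ./, copy only what the next steps actually use,
# so editing Dockerfile.template doesn't bust the cache here
COPY package.json package-lock.json ./
RUN npm install
COPY src/ ./src/
```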

Hi Jason, that makes sense. Thanks for sharing the solution with us …