Preloading a container on GitLab CI/CD using dind + TLS fails with "Error: unable to verify the first certificate"

I’m trying to preload a container on GitLab CI/CD using docker-in-docker (dind) with TLS verification enabled. That means there is a Docker host running at tcp://docker:2376, and the Docker host and client certificates are mounted at /certs/server/cert.pem and /certs/client/cert.pem.

My script looks like this:

wget -q https://github.com/balena-io/balena-cli/releases/download/v15.0.3/balena-cli-v15.0.3-linux-x64-standalone.zip && unzip -q balena-cli-v15.0.3-linux-x64-standalone.zip
balena login -t $BALENA_CLI_KEY
balena os download $BALENA_DEVICE --version $BALENA_VERSION --output $BALENA_DEVICE.img
balena preload $BALENA_DEVICE.img --fleet $BALENA_APP --commit current --dockerHost docker --dockerPort 2376 --debug --cert /certs/server/cert.pem 

Download works fine, but the preload fails with:

$ balena preload $BALENA_DEVICE.img --fleet $BALENA_APP --commit current --dockerHost docker --dockerPort 2376 --debug --cert $DOCKER_TLS_CERTDIR/server/cert.pem
[debug] new argv=[/balena/balena,/snapshot/balena-cli/bin/balena,preload,jetson-tx2-nx-devkit.img,--fleet,staging-linux-armv8-l4t-t186,--commit,current,--dockerHost,docker,--dockerPort,2376,--cert,/certs/server/cert.pem] length=14
[debug] Event tracking error: Timeout awaiting 'response' for 0ms
You must provide a CA, certificate and key in order to use TLS
ExpectedError: You must provide a CA, certificate and key in order to use TLS
    at generateConnectOpts (/snapshot/balena-cli/build/utils/docker.js:177:19)

If I omit the --cert flag instead, I get:

$ balena preload $BALENA_DEVICE.img --fleet $BALENA_APP --commit current --dockerHost docker --dockerPort 2376 --debug
[debug] new argv=[/balena/balena,/snapshot/balena-cli/bin/balena,preload,jetson-tx2-nx-devkit.img,--fleet,staging-linux-armv8-l4t-t186,--commit,current,--dockerHost,docker,--dockerPort,2376] length=12
[debug] Event tracking error: Timeout awaiting 'response' for 0ms
Docker seems to be unavailable. Is it installed and running?
Error: unable to verify the first certificate
ExpectedError: Docker seems to be unavailable. Is it installed and running?
Error: unable to verify the first certificate
    at checkThatDockerIsReachable (/snapshot/balena-cli/build/utils/docker.js:197:15)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

I would love not to have to create a CA or disable TLS on dind just to preload containers. Any other ideas?

It turns out that even without the manual CA generation outlined above, Docker in dind creates a CA itself:

$ ls -alh /certs/client
total 28K
drwxrwxrwx 2 root root 4.0K Jan 31 10:08 .
drwxr-xr-x 3 root root 4.0K Jan 31 10:09 ..
-rw-r--r-- 1 root root 1.8K Jan 31 10:08 ca.pem
-rw-r--r-- 1 root root 1.8K Jan 31 10:08 cert.pem
-rw-r--r-- 1 root root 1.6K Jan 31 10:08 csr.pem
-rw-r--r-- 1 root root 3.2K Nov  7 14:01 key.pem
-rw-r--r-- 1 root root   44 Jan 31 10:08 openssl.cnf

With that I get further:

balena preload $BALENA_DEVICE.img --fleet $BALENA_APP --commit current --dockerHost docker --dockerPort 2376 --debug --ca /certs/client/ca.pem --cert /certs/client/cert.pem --key /certs/client/key.pem
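For anyone following along, here is a minimal sketch of a pre-flight check for the three files passed to --ca, --cert and --key. The function name has_client_tls_files is hypothetical (it is not part of the balena CLI or Docker); it only assumes the ca.pem, cert.pem and key.pem file names shown in the directory listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of balena CLI or Docker):
# succeeds only if CERT_DIR contains readable ca.pem, cert.pem and key.pem,
# i.e. the three files handed to --ca, --cert and --key.
has_client_tls_files() {
    cert_dir=$1
    for name in ca.pem cert.pem key.pem; do
        if [ ! -r "$cert_dir/$name" ]; then
            # Report the first missing file on stderr and fail.
            echo "missing or unreadable: $cert_dir/$name" >&2
            return 1
        fi
    done
    return 0
}
```

In a job this could run before the preload step, for example: has_client_tls_files "$DOCKER_TLS_CERTDIR/client" || exit 1. It only confirms the files are present, not that the daemon accepts them.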

Error below:

Building Docker preloader image. [========================] 100%
Waiting for Docker to start...
Docker started
Retrying (count=2) /sbin/losetup -f --show /dev/loop24 False
...
Retrying (count=10) /sbin/losetup -f --show /dev/loop24 False
Hint: If using a Virtual Machine, consider increasing the number of processors.
If using Docker Desktop for Windows or macOS, it may require restarting.
An error has occurred executing internal preload command 'get_image_info':
{"command":"get_image_info","parameters":{}}
Status code: 1
Error: Traceback (most recent call last):
  File "/usr/src/app/preload.py", line 978, in <module>
    result = method(**data.get("parameters", {}))
  File "/usr/src/app/preload.py", line 934, in get_image_info
    "free_space": free_space(),
  File "/usr/src/app/preload.py", line 799, in free_space
    return get_partition("resin-data").free_space()
  File "/usr/src/app/preload.py", line 205, in free_space
    with mount_context_manager(device) as mountpoint:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/src/app/preload.py", line 178, in mount_context_manager
    with losetup_context_manager(image, offset, size) as device:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/src/app/preload.py", line 154, in losetup_context_manager
    device = retry_call(
  File "/usr/lib/python3.8/site-packages/retry/api.py", line 101, in retry_call
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
  File "/usr/lib/python3.8/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/usr/src/app/preload.py", line 117, in wrapped
    return func(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/sh.py", line 1427, in __call__
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File "/usr/lib/python3.8/site-packages/sh.py", line 774, in __init__
    self.wait()
  File "/usr/lib/python3.8/site-packages/sh.py", line 792, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python3.8/site-packages/sh.py", line 815, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 
  RAN: /sbin/losetup -f --show /dev/loop24
  STDOUT:
  STDERR:
losetup: /dev/loop24: failed to set up loop device: No such file or directory
Error: An error has occurred executing internal preload command 'get_image_info':
{"command":"get_image_info","parameters":{}}
Status code: 1
Error: Traceback (most recent call last):
  File "/usr/src/app/preload.py", line 978, in <module>
    result = method(**data.get("parameters", {}))
  File "/usr/src/app/preload.py", line 934, in get_image_info
    "free_space": free_space(),
  File "/usr/src/app/preload.py", line 799, in free_space
    return get_partition("resin-data").free_space()
  File "/usr/src/app/preload.py", line 205, in free_space
    with mount_context_manager(device) as mountpoint:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/src/app/preload.py", line 178, in mount_context_manager
    with losetup_context_manager(image, offset, size) as device:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/src/app/preload.py", line 154, in losetup_context_manager
    device = retry_call(
  File "/usr/lib/python3.8/site-packages/retry/api.py", line 101, in retry_call
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
  File "/usr/lib/python3.8/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/usr/src/app/preload.py", line 117, in wrapped
    return func(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/sh.py", line 1427, in __call__
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File "/usr/lib/python3.8/site-packages/sh.py", line 774, in __init__
    self.wait()
  File "/usr/lib/python3.8/site-packages/sh.py", line 792, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python3.8/site-packages/sh.py", line 815, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 
  RAN: /sbin/losetup -f --show /dev/loop24
  STDOUT:
  STDERR:
losetup: /dev/loop24: failed to set up loop device: No such file or directory
    at PassThrough.<anonymous> (/snapshot/balena-cli/node_modules/balena-preload/build/preload.js:341:28)
    at Object.onceWrapper (events.js:421:26)
    at PassThrough.emit (events.js:314:20)
    at PassThrough.EventEmitter.emit (domain.js:483:12)
    at addChunk (_stream_readable.js:297:12)
    at readableAddChunk (_stream_readable.js:272:9)
    at PassThrough.Readable.push (_stream_readable.js:213:10)
    at PassThrough.Transform.push (_stream_transform.js:152:32)
    at PassThrough.afterTransform (_stream_transform.js:96:10)
    at PassThrough._transform (_stream_passthrough.js:46:3)
    at PassThrough.Transform._read (_stream_transform.js:191:10)
    at PassThrough.Transform._write (_stream_transform.js:179:12)
    at doWrite (_stream_writable.js:403:12)
    at writeOrBuffer (_stream_writable.js:387:5)
    at PassThrough.Writable.write (_stream_writable.js:318:11)
    at processData (/snapshot/balena-cli/node_modules/docker-modem/lib/modem.js:371:18)
    at HttpDuplex.processData (/snapshot/balena-cli/node_modules/docker-modem/lib/modem.js:365:9)
    at HttpDuplex.emit (events.js:314:20)
    at HttpDuplex.EventEmitter.emit (domain.js:483:12)
    at addChunk (/snapshot/balena-cli/node_modules/docker-modem/node_modules/readable-stream/lib/_stream_readable.js:298:12)
    at readableAddChunk (/snapshot/balena-cli/node_modules/docker-modem/node_modules/readable-stream/lib/_stream_readable.js:280:11)
    at HttpDuplex.push (/snapshot/balena-cli/node_modules/docker-modem/node_modules/readable-stream/lib/_stream_readable.js:241:10)
    at IncomingMessage.<anonymous> (/snapshot/balena-cli/node_modules/docker-modem/lib/http_duplex.js:26:15)
    at IncomingMessage.emit (events.js:314:20)
    at IncomingMessage.EventEmitter.emit (domain.js:483:12)
    at addChunk (_stream_readable.js:297:12)
    at readableAddChunk (_stream_readable.js:272:9)
    at IncomingMessage.Readable.push (_stream_readable.js:213:10)
From previous event:
    at Preloader._runCommand (/snapshot/balena-cli/node_modules/balena-preload/build/preload.js:308:16)
    at /snapshot/balena-cli/node_modules/balena-preload/build/preload.js:392:38
    at Preloader._runWithSpinner (/snapshot/balena-cli/node_modules/balena-preload/build/preload.js:283:26)
    at Preloader._getImageInfo (/snapshot/balena-cli/node_modules/balena-preload/build/preload.js:391:20)
    at /snapshot/balena-cli/node_modules/balena-preload/build/preload.js:696:22
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
For further help or support, visit:

@klutchell Any thoughts on this?

Hey @rapha, I suspect that this is your issue:

Retrying (count=2) /sbin/losetup -f --show /dev/loop24 False
...
Retrying (count=10) /sbin/losetup -f --show /dev/loop24 False
Hint: If using a Virtual Machine, consider increasing the number of processors.

Creating loopback devices from inside a container is something I’ve struggled with as well. Is your dind container already running as privileged? Does it work any better if you mount the path /dev:/dev with your container?
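A rough way to see what the container can actually see is to count the loop device nodes under /dev. This is only a diagnostic sketch: count_loop_devices is a hypothetical helper, and it assumes that the "No such file or directory" from losetup means the /dev/loopN node is not visible in the container, which may not be the whole story.

```shell
#!/bin/sh
# Hypothetical diagnostic helper: prints the number of loop device nodes
# (loop0, loop1, ...) visible under DEV_DIR, which defaults to /dev.
count_loop_devices() {
    dev_dir=${1:-/dev}
    count=0
    for node in "$dev_dir"/loop[0-9]*; do
        # When nothing matches, the glob stays literal, so check existence.
        if [ -e "$node" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Running count_loop_devices in the job container, and in the dind service if you can get a shell there, should show whether the two see the same device nodes.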

Have you tried increasing the resources of your CI runner, as per the hint?

We have a test container here that performs preload in a dind environment that might offer some examples:

It might also help if you share your full compose or CI/CD stack so we can see how the balena CLI binary is communicating with the docker daemon. They need to have the same absolute path to the image that is being preloaded as outlined here: GitHub - balena-io-experimental/balena-cli-docker: Example Docker image with the balena CLI and Docker-in-Docker

That causes other (alpine) containers to not run any more.

The complete CI/CD job looks like this:

dcd-build-preloaded-staging-t186/bab822c5bfeb9de44de0facdccef336c18578fe8: 
  variables: 
    DOCKER_TLS_CERTDIR: "/certs"
    GIT_SUBMODULE_STRATEGY: "recursive"
    DOCKER_HOST: "tcp://docker:2376"
    BALENA_APP: "staging-linux-armv8-l4t-t186"
    BALENA_DEVICE: "jetson-tx2-nx-devkit"
    BALENA_VERSION: "2.98.12.dev"
  image: "registry.gitlab.com/aivero/open-source/contrib/focal-balena/linux-x86_64:latest"
  tags: 
    - "x86_64"
    - "saas-linux-large-amd64"
  before_script: 
    - "cd $CI_PROJECT_DIR/deepcore-daemon-balena"
  script: 
    - "cd $CI_PROJECT_DIR/deepcore-daemon-balena"
    - "wget -q https://github.com/balena-io/balena-cli/releases/download/v15.0.3/balena-cli-v15.0.3-linux-x64-standalone.zip && unzip -q balena-cli-v15.0.3-linux-x64-standalone.zip"
    - "balena login -t $BALENA_CLI_KEY"
    - "balena os download $BALENA_DEVICE --version $BALENA_VERSION --output ./${BALENA_DEVICE}.img"
    - "balena preload $BALENA_DEVICE.img --fleet $BALENA_APP --commit current --dockerHost docker --dockerPort 2376 --debug --ca $DOCKER_TLS_CERTDIR/client/ca.pem --cert $DOCKER_TLS_CERTDIR/client/cert.pem --key $DOCKER_TLS_CERTDIR/client/key.pem"
    - "balena config generate --version $BALENA_VERSION --fleet $BALENA_APP --network ethernet --appUpdatePollInterval 10 --output ./${BALENA_DEVICE}.json --deviceType $BALENA_DEVICE --dev"
    - "balena config inject $BALENA_DEVICE.json --drive $BALENA_DEVICE.img"
  after_script: 
    - "cd $CI_PROJECT_DIR/deepcore-daemon-balena"
  services: 
    - 
      name: "docker:20-dind"

  interruptible: false

Does that mean that the image to preload needs to be at the same location inside the preload (dind) container as well as inside the container running balena cli?

Docker is run by the service, side by side with the balena CLI container.
GitLab claims that all services have the job directory mounted as a volume under /builds.

So I’ll try symlinking the image file to a location in the balena cli container to match that /builds location.

Yeah, the docker daemon is the one spinning up the balena-preload container, so the image in your CLI path needs to be available at the same absolute path on the docker host. You might be able to solve that with symlinks or bind mounts.
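As a sketch of how to check the "same absolute path" requirement before calling preload, the helper below tests whether a file's resolved location lies under a shared directory. The name is_under_shared_dir is hypothetical, and using /builds as the shared directory is an assumption taken from the GitLab behaviour described earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when FILE's directory, with symlinks resolved,
# lies inside SHARED_DIR (assumed to be mounted at the same absolute path in
# both the job container and the dind service, e.g. /builds).
is_under_shared_dir() {
    file=$1
    shared_dir=$2
    # Resolve both directories to physical absolute paths.
    abs_dir=$(cd "$(dirname "$file")" 2>/dev/null && pwd -P) || return 1
    abs_shared=$(cd "$shared_dir" 2>/dev/null && pwd -P) || return 1
    case "$abs_dir/" in
        # Strip a trailing slash so that SHARED_DIR=/ also works.
        "${abs_shared%/}"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_under_shared_dir "$PWD/$BALENA_DEVICE.img" /builds || echo "image is not on the shared volume". It does not prove the docker daemon sees the file, only that the path is inside the directory that is supposed to be shared.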

I found a few more examples of running balena preload in a container, but they all use docker-in-docker to avoid the complexities and issues of a docker host in a different namespace.
E.g. GitHub - balena-io-hardware/hod-preloader-sw: Generate balenaOS images preloaded with the specified app release and make those images available for download via http