Jetson Nano - L4T & CUDA/cuDNN best practices for multi-container apps

We are building a multi-container app on the Jetson Nano platform where multiple containers require the L4T drivers and/or the CUDA/cuDNN libraries to function. What are the best practices for accomplishing this? Given the size of these components (~300 MB for L4T and ~4 GB for CUDA/cuDNN), installing them in every container that needs them seems terribly wasteful at a minimum. Worse, it could cause conflicts in the case of L4T, since the package installs a variety of services that could clash with each other (e.g. what happens if two separate containers use nvpmodel to set different power modes?).

One solution I came up with is to have a single jetpack-l4t container which fully installs L4T during build and therefore “manages” the Jetson Nano hardware (setting power modes, installing kernel modules, firmware, etc.), and which has both io.balena.features.kernel-modules: '1' and io.balena.features.firmware: '1' set, so that it effectively serves the L4T modules and firmware to the other containers. The issue is that I have another container running xorg, which requires the display drivers and xorg configuration that are part of the L4T package (specifically in config.tbz2) – so I suppose I could do a “partial” install of L4T, with just the config file, in that container? And what about the drivers and utilities – are they necessary in all containers?
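For context, a minimal docker-compose.yml sketch of that jetpack-l4t service might look like the following (the service name, build context, and privileged flag are assumptions on my part; only the two io.balena.features labels come from the setup described above):

```yaml
version: '2'
services:
  jetpack-l4t:
    build: ./jetpack-l4t     # hypothetical build context with the L4T install
    privileged: true         # assumed necessary for nvpmodel, module loading, etc.
    labels:
      io.balena.features.kernel-modules: '1'   # serve kernel modules to the host
      io.balena.features.firmware: '1'         # serve firmware to the host
```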

Similarly, for containers that require CUDA/cuDNN, I am currently installing the libraries from the NVIDIA packages only in the jetpack-l4t container, and then copying the applicable folders – /usr/local/cuda-10.0, /usr/lib/aarch64-linux-gnu/, and a couple of others – to a shared volume. In the containers that require CUDA/cuDNN, I then create mirroring symlinks from the libraries stored in the shared volume to the target locations using cp -ans <shared volume location> <target location> in the startup scripts. It works – but I am wondering whether there is a better way to do this.
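For anyone following along, here is a self-contained sketch of what that cp -ans step does, using throwaway temp directories in place of the real shared volume and target paths (all paths and file names below are stand-ins, not the actual CUDA layout):

```shell
#!/bin/sh
set -e

# Stand-ins for the shared volume and the target location (e.g. /usr/local).
SHARED=$(mktemp -d)
TARGET=$(mktemp -d)

# Pretend the jetpack-l4t container has copied CUDA into the shared volume.
mkdir -p "$SHARED/cuda-10.0/lib64"
touch "$SHARED/cuda-10.0/lib64/libcudart.so"

# -a: recurse and preserve attributes, -n: never overwrite existing files,
# -s: create symlinks instead of copying file contents.
cp -ans "$SHARED/cuda-10.0" "$TARGET/"

# The directory tree is recreated under $TARGET, but the files are symlinks
# back into the shared volume, so the multi-GB payload is stored only once.
ls -l "$TARGET/cuda-10.0/lib64/libcudart.so"
```

The -n flag makes the startup script idempotent across restarts: re-running it will not clobber links that already exist.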

In general, I’m feeling that I might be missing something in my approach to all of this, and would be interested to hear others’ experiences and/or best practices for building successful multi-container apps on the Jetson Nano.

Hi @drcnyc,

I’ve been thinking about this. My suggestion would be to build a base image which contains all the libraries and drivers you require, and then build your service images from it. That way the large layers are only downloaded once and shared between the container services. You could do something similar with a multi-stage build and the Docker build cache, but since I assume the libraries and drivers also take a while to build, a shared base image is probably the cleaner option.

Hope that helps,

Thanks for the reply. Just so I understand: when you refer to building a base image and basing others off of it, how would I do this? Right now I am using docker-compose to build each image individually. Would using a base image still allow us to deploy via the cloud, or would it require us to preload the images on devices? We are looking for a cloud-deployable solution.

Also, if I were to build the CUDA/cuDNN libraries into a base image, it would be >4 GB in size, and so would each image that starts from it. My current solution only stores the CUDA/cuDNN libraries once, shared across all containers that need them, so it seems it would be more space efficient?

Lastly, if I were to build the L4T package into the base image, I would still have the problem of figuring out which of the containers is “in control” of the hardware – e.g., referring back to the nvpmodel power configuration in my initial message, how would I manage this?

James’ suggestion is to make a new image with CUDA etc., push it to Docker Hub, and then use it in the FROM line of whichever services you want to use CUDA with. Because of the way Docker images work, everything contained in the base image will only be downloaded once on the device.
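In Dockerfile terms, the pattern is roughly as follows (the image names are hypothetical, and the base image shown is only a sketch of where the CUDA/cuDNN installation would go):

```dockerfile
# Dockerfile.base — built once and pushed, e.g. to Docker Hub as
# yourorg/jetson-nano-cuda-base, then reused by every service.
FROM balenalib/jetson-nano-ubuntu:bionic
# ... install the CUDA/cuDNN packages here ...
```

Each service image then starts from that pushed base:

```dockerfile
FROM yourorg/jetson-nano-cuda-base:latest
COPY . /app
CMD ["/app/start.sh"]
```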

Got it. So it will save download time – but will it also save space on the SD card/eMMC? The issue is twofold, download and space, since each copy of CUDA/cuDNN is >4 GB.

Also, is there a guide or examples available anywhere on building custom images based off of balena base images?

Here is an example of building an image from a balena base image: . If you were, for example, to base multiple images off of this 4.5 GB resinplayground/jetson-nano-cuda-cudnn-opencv:v0.2-slim image, it would only be downloaded and stored once on the SD card/eMMC, with the other containers sharing its layers. (See for more details.)
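As a sketch, a service Dockerfile based on that image would start like this (everything after the FROM line is a placeholder; only the image name comes from the post above):

```dockerfile
FROM resinplayground/jetson-nano-cuda-cudnn-opencv:v0.2-slim
# Everything below this line is unique to the service; the 4.5 GB of base
# layers above it are shared with every other image using the same FROM.
COPY . /app
CMD ["python3", "/app/main.py"]
```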