Jetson: Support Nvidia Docker Images

Hey @nmaas87, thanks very much for your quick reply and help.

In my case, I need GPU-enabled PyTorch. As you know, installing GPU-enabled PyTorch is not straightforward, so I wanted to use the NVIDIA PyTorch container in my Dockerfile. However, I couldn't get it working on my NVIDIA Jetson Nano running balenaOS (v2.82.11+rev11). In essence, I am quite a bit confused about how to containerize an app that requires GPU-enabled PyTorch and CUDA so that I can deploy it to a balenaOS device via balenaCloud. This is my Dockerfile:

FROM nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3

ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive
ENV OPENBLAS_CORETYPE ARMV8
ENV UDEV 1

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y gcc apt-utils \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

ENV PROGRAM_DIR=/app

RUN mkdir $PROGRAM_DIR
WORKDIR $PROGRAM_DIR

COPY requirements.txt /tmp

RUN pip3 install --upgrade pip
RUN pip3 install -r /tmp/requirements.txt

COPY . $PROGRAM_DIR

RUN python3 -m pip list

CMD python3 $PROGRAM_DIR/example.py

If you don’t mind, I have several questions below:

  1. Can we use the container images provided by NVIDIA as base images in a Dockerfile on balenaOS? I wanted to start my Dockerfile from the image below, provided by NVIDIA, since it already has all the libraries I need, such as GPU-enabled PyTorch and the CUDA toolkit.
FROM nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3

However, even though the image builds successfully, I get the following error at runtime on balenaCloud:

OSError: libcurand.so.10: cannot open shared object file: No such file or directory

I have been researching the error for a week. I came across the NVIDIA forum post below, where the same error was reported. As far as I can tell, the error means the container cannot find shared libraries that it expects the host side to provide, i.e. the link between the NVIDIA container and balenaOS is not fully configured on the device, if I am not wrong. I gather I am supposed to set my Docker daemon's default runtime to nvidia; however, there is no daemon.json file on my device running balenaOS.

https://forums.developer.nvidia.com/t/docker-build-on-jetson-xavier/185118
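
For reference, a quick check from a shell inside the running container confirms the library is simply absent (hypothetical commands, but they should work in any Debian/Ubuntu-based image):

# does the dynamic loader know about libcurand at all?
ldconfig -p | grep libcurand
# is the file anywhere under the usual CUDA install paths?
find /usr/local/cuda* -name 'libcurand*' 2>/dev/null

If both come back empty, the library was never put into the image at all. If I understand correctly, the l4t images expect the CUDA libraries (libcurand included) to be bind-mounted in from the host by the nvidia runtime when the container starts, which never happens under balenaEngine.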

I guess nvidia-container-runtime is required to use the NVIDIA container, and daemon.json should be configured to point at it. Therefore, I tried to install nvidia-container-runtime, since my device has Docker version 19.03.23 and L4T version R32.5.0. However, balenaOS does not allow me to install this tool. I also researched why that is. According to https://nvidia.github.io/nvidia-container-runtime/, which lists the supported distributions of the NVIDIA Container Runtime, balenaOS is not among them; if I am not wrong, balenaOS is a Yocto-based host OS, and I couldn't see it in that list. Overall, I couldn't get any further with the NVIDIA containers.
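
For reference, on a stock JetPack/L4T installation the /etc/docker/daemon.json that registers the runtime looks roughly like this (reproduced from the NVIDIA docs as far as I remember it, so treat it as a sketch):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

On balenaOS there is no such file, and I have not found a supported way to add one.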

Can you please guide me if I am missing anything? Can I somehow use NVIDIA containers in a Dockerfile for balenaOS?

  2. If I cannot use NVIDIA containers, am I supposed to use only balenalib containers, such as the one below from the tutorial you suggested? If so, what would you suggest for building my app, which requires GPU-enabled PyTorch, in a Dockerfile for my device running balenaOS?
balenalib/jetson-nano-ubuntu:bionic

I would be so grateful if you could help me with this.

Hi there,
sadly I cannot help you much there, as I am a volunteer and just checking in during the lunch break of my real job. However, I want to put emphasis on the shared repo again.

Regarding 1.) balenaEngine does not possess the nvidia-container-runtime you're looking for and cannot load any external modules. As far as I know, one of the reasons balenaEngine is so sleek and fast compared to the "original" Docker Engine is that it no longer has a plugin system. However, it should not be necessary to use this runtime, as it is probably just bind-mounting A LOT of folders/files in the background, as mentioned earlier. So I would not try with the NVIDIA containers, which probably need something specific that you could only get running with a lot of legwork.
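
If you are curious, on a stock L4T system you can see exactly what the runtime would mount: the lists live in CSV files under a path like the one below (quoting from memory, please double-check on a JetPack device):

ls /etc/nvidia-container-runtime/host-files-for-container.d/
# each CSV line names a host library, device node or directory that the
# runtime bind-mounts into the container at start: CUDA, cuDNN, /dev/nv*, etc.

That is the "A LOT of folders/files" I mean, and none of it happens under balenaEngine.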

2.) Coming to this point, as I said earlier, this repo GitHub - balena-io-playground/jetson-nano-sample-new: Jetson Nano sample using new packages contains a fully working example of building and running a balena-compatible container with a) CUDA and b) OpenCV support. There is also a chance that PyTorch is installed in one of them. Both the docker-compose file and the Dockerfiles are completely open. In your case, I would just run the example, see if what you need is there, and work from that for your project. Probably more efficient than trying to hammer the NVIDIA containers until they fit.
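
In a nutshell, the Dockerfiles in that repo do something along these lines (a condensed sketch from memory; the package names and the release in the repo URLs have to match the L4T version of your host, r32.5 in your case):

FROM balenalib/jetson-nano-ubuntu:bionic

# pull CUDA straight from NVIDIA's apt feeds instead of relying on the runtime;
# the Nano is a t210 device, adjust r32.5 to your L4T release
RUN echo "deb https://repo.download.nvidia.com/jetson/common r32.5 main" > /etc/apt/sources.list.d/nvidia.list \
    && echo "deb https://repo.download.nvidia.com/jetson/t210 r32.5 main" >> /etc/apt/sources.list.d/nvidia.list \
    && apt-key adv --fetch-keys https://repo.download.nvidia.com/jetson/jetson-ota-public.asc \
    && apt-get update \
    && apt-get install -y --no-install-recommends cuda-toolkit-10-2 \
    && rm -rf /var/lib/apt/lists/*

ENV UDEV=1

On top of that you would install NVIDIA's aarch64 PyTorch wheel for JetPack 4.5 with pip3 (it is linked from their "PyTorch for Jetson" forum thread) rather than the stock one from PyPI.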

I think with that, you should arrive at a functional prototype within a short amount of time :slight_smile:

Cheers

Catching up on this thread: thanks @nmaas87 for the assistance here; your notes and explanation are exactly what I would recommend to @aktaseren as well. The nvidia runtime is not available on balenaOS, so everything you need for your application is going to have to be installed in your container. Our base images and that example repo are the best starting point. The CUDA and/or OpenCV container example might indeed pull in PyTorch, I am not entirely sure … but you could certainly take that template and expand upon it to get it in there. Thanks!
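
Once you have a container up, a quick sanity check from inside it will tell you whether PyTorch can actually see the GPU (check_gpu.py is a hypothetical name, run it with python3 in the container):

# check_gpu.py: print the PyTorch version and whether CUDA is usable
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))

If "CUDA available" comes back False, the CUDA userspace libraries are not reachable from inside the container and the image needs another look.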

Guys @nmaas87 @dtischler, thanks a lot for the brainstorming. Your suggestions actually helped me a lot. Following the guide repo you suggested, I also came across balenaHub, which is full of completed projects. Some of them are similar to what I am doing with NVIDIA devices.

For example, the ROS2 pose estimation example project was built using CUDA + PyTorch + OpenCV. It is a little complex, but I tried it and it works very well.

Thanks a lot again.

I forgot about that repo, excellent, glad it helped. That is a rather full-featured example, so you might need to trim it down a bit for your specific use case, but either way it is an excellent starting point, yep!

@dtischler

What do I have to do to get the nvidia runtime working? Any ideas on how to solve this issue?

Cheers.

As already mentioned, the nvidia runtime is a proprietary software extension/plugin that loads into the Docker Engine. It cannot be loaded into balenaEngine.
For your projects, I would suggest working without it, using the example projects already given in this thread.

Thanks!

@nmaas87
It seems you have misunderstood.

We can use the GPU in balena services, but the intent is to install Docker inside a balena service and start a container with GPU support.

We were able to get this running on AMD boards.

Have you ever tried this on your Jetson board?

  • Prepare a balena service.
  • Install Docker inside it.
  • Run a container with the nvidia runtime enabled.
  • Check whether you can detect the GPU without any issue (see the sketch after this list).
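
Roughly the shape we have in mind, in balena terms (a hypothetical sketch, not a tested setup; the image and flags are assumptions on our side):

# docker-compose.yml for the balena fleet
version: '2'
services:
  dind:
    image: docker:dind        # or a custom image with Docker plus nvidia-container-runtime
    privileged: true          # the inner Docker daemon needs this, and it exposes /dev/nv*
    environment:
      - UDEV=1

Inside that service we would register the nvidia runtime in the inner daemon's daemon.json, then check GPU detection with something like docker run --runtime nvidia nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3 python3 -c "import torch; print(torch.cuda.is_available())".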

Cheers!

Hi @scarlyon - yes, I definitely did not get this use case - and as a matter of fact, it's the first time I have gotten such a question :slight_smile: .

I have not tried it and do not know whether Docker Engine will even run inside a balenaEngine container on an ARM64 board. I would suspect that even if this worked, there might be problems getting GPU support out of the nested container setup, as the nvidia runtime probably binds things differently than balenaEngine does.

Also, all containers enabling CUDA or similar features tend to be big due to all the necessary dependencies. Adding Docker on top of all this and nesting the real containers in a container + engine will increase the hardware demands, slow down update and build cycles, and nearly render you blind (log-wise), as all the features in balenaCloud or openBalena for monitoring your containers are separated from the container by another layer of abstraction. I am guessing this will also degrade the performance and serviceability of the overall system, hence I would strongly advise against it. I can understand that one would want to reuse existing images as far as possible, but in this case getting it to work might end up costing you more time and headache down the line than doing it the proper way from the start.

But I am just an Ambassador and this is my private opinion, maybe the balena guys like @dtischler think differently about my points :slight_smile:

Thanks for your reply, mate.

But we have a specific use case where the GPU has to be used in a container that is running inside a balena service. I absolutely agree with your opinion, but anyway, we have to go this way… :slight_smile:

Cheers,
Shane.

@dtischler @jakogut @alanb128

Just checking if there is any update on this issue?

Sorry for the delayed reply on this one @scarlyon – but unfortunately no news to report. :expressionless: