Installing CUDA on Jetson TX2

Yes, you can even install in a VM if you want.

Following up on CUDA install topic, has anyone managed to compile jetson-inference on resin.io TX2?

Just found this Dockerfile, which seems to have everything needed to install JetPack incl. CUDA:

Looks like a good way to deploy everything that’s needed.

Awesome!!! My goal is to get PyTorch and pocketOS working on this; I think the pieces are coming together.

Unfortunately the NVIDIA download links are broken… Does anyone have an updated working version?

Updated URLs to 3.2.1:

#FROM aarch64/ubuntu
FROM arm64v8/ubuntu:xenial-20180123

#AUTHOR bmwshop@gmail.com
MAINTAINER nuculur@gmail.com

# This is the base container for the Jetson TX2 board with drivers (with cuda)

# base URL for NVIDIA libs

ARG URL=https://developer.download.nvidia.com/devzone/devcenter/mobile/jetpack_l4t/3.2.1/m8u2ki/JetPackL4T_321_b23

# Update packages, install some useful packages

RUN apt-get update && apt-get install -y apt-utils bzip2 curl sudo unp && apt-get clean && rm -rf /var/cache/apt
WORKDIR /tmp

# Install drivers first

RUN curl -sL http://developer.nvidia.com/embedded/dlc/l4t-jetson-tx2-driver-package-28-2 | tar xvfj -
RUN chown root /etc/passwd /etc/sudoers /usr/lib/sudo/sudoers.so /etc/sudoers.d/README
RUN /tmp/Linux_for_Tegra/apply_binaries.sh -r / && rm -fr /tmp/*

# Pull the rest of the jetpack libs for cuda/cudnn and install

RUN curl $URL/cuda-repo-l4t-9-0-local_9.0.252-1_arm64.deb -so cuda-repo-l4t_arm64.deb
RUN curl $URL/libcudnn7_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn_arm64.deb
RUN curl $URL/libcudnn7-dev_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn-dev_arm64.deb

# Install libs: L4T, CUDA, cuDNN

RUN dpkg -i /tmp/cuda-repo-l4t_arm64.deb
RUN apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
RUN apt-get update && apt-get install -y cuda-toolkit-9.0
RUN dpkg -i /tmp/libcudnn_arm64.deb
RUN dpkg -i /tmp/libcudnn-dev_arm64.deb
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/tegra

# Re-link libs in /usr/lib/aarch64-linux-gnu/tegra

RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so
RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1
RUN ln -sf /usr/lib/aarch64-linux-gnu/tegra/libGL.so /usr/lib/aarch64-linux-gnu/libGL.so

# D.R. – need to do this for some strange reason (for jetson tx2)

RUN ln -s /usr/lib/aarch64-linux-gnu/libcuda.so /usr/lib/aarch64-linux-gnu/libcuda.so.1

# Clean up (don’t remove cuda libs… used by child containers)

RUN apt-get -y autoremove && apt-get -y autoclean
RUN rm -rf /var/cache/apt
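
For anyone running plain Docker on a TX2 rather than resin.io, here is a minimal sketch of how one might build and smoke-test this image; the tag name is a placeholder, and --privileged is assumed so the container can reach the Tegra GPU device nodes:

# hypothetical build and smoke test on the TX2 itself; "tx2-cuda-base" is just a placeholder tag
docker build -t tx2-cuda-base .
docker run --rm -it --privileged tx2-cuda-base /usr/local/cuda/bin/nvcc --version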

Following various suggestions, I have attempted to create a balena Dockerfile.template that installs CUDA 9.0 (as above) on a Jetson TX2 hosted on a dev board. I should have CTI Orbitty carriers in a few days.

The installation appears to go smoothly, though some of the post-installation tests from the NVIDIA installation guide for Linux fail. The CUDA sample programs build but fail to run.

Dockerfile.template

FROM resin/%%RESIN_MACHINE_NAME%%-debian

# This is the base container for the Jetson TX2 board with drivers (with cuda)
# base URL for NVIDIA libs
ARG URL=https://developer.download.nvidia.com/devzone/devcenter/mobile/jetpack_l4t/3.2.1/m8u2ki/JetPackL4T_321_b23

# Update packages, install some useful packages
ARG DEBIAN_FRONTEND=noninteractive

# TODO: pciutils may be unnecessary once everything is working
RUN apt-get update && apt-get install -y --no-install-recommends \
    apt-utils \
    bzip2 \
    curl \
    pciutils \
    sudo \
    unp

# ca-certificates ensures that we can download from the nvidia site
# libssl and openssl come along for the ride
RUN apt-get install -y --reinstall --no-install-recommends \
  ca-certificates \
  libssl1.0.0 \
  openssl

WORKDIR /tmp

# Install drivers first
RUN curl --silent --verbose --location http://developer.nvidia.com/embedded/dlc/l4t-jetson-tx2-driver-package-28-2 | tar xvfj -
RUN chown root /etc/passwd /etc/sudoers /usr/lib/sudo/sudoers.so /etc/sudoers.d/README
RUN /tmp/Linux_for_Tegra/apply_binaries.sh -r / && rm -fr /tmp/*

# Pull the rest of the jetpack libs for cuda/cudnn and install
RUN curl $URL/cuda-repo-l4t-9-0-local_9.0.252-1_arm64.deb -so cuda-repo-l4t_arm64.deb
RUN curl $URL/libcudnn7_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn_arm64.deb
RUN curl $URL/libcudnn7-dev_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn-dev_arm64.deb

# Install libs: L4T, CUDA, cuDNN
RUN dpkg -i /tmp/cuda-repo-l4t_arm64.deb
RUN apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
RUN apt-get update && apt-get install -y cuda-toolkit-9.0
RUN dpkg -i /tmp/libcudnn_arm64.deb
RUN dpkg -i /tmp/libcudnn-dev_arm64.deb
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/tegra

# Re-link libs in /usr/lib/tegra
RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so
RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1
RUN ln -sf /usr/lib/aarch64-linux-gnu/tegra/libGL.so /usr/lib/aarch64-linux-gnu/libGL.so

# D.R. – need to do this for some strange reason (for jetson tx2)
RUN ln -s /usr/lib/aarch64-linux-gnu/libcuda.so /usr/lib/aarch64-linux-gnu/libcuda.so.1

WORKDIR /usr/src/app
COPY . ./
RUN echo $PATH
ENV PATH="${PATH}:/usr/local/cuda/bin"
RUN nvcc hello-world.cu -L /usr/local/cuda/lib -lcudart -o hello-world
CMD ["/usr/src/app/runServer.bash"]

runServer.bash:

#!/bin/bash
echo "PATH = ${PATH}"
cat /usr/local/cuda/version.txt
nvcc --version
/usr/src/app/hello-world
while true; do sleep 60; done
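
For completeness, a sketch of deploying this the usual resin.io/balena way; the application name and git remote below are placeholders, and the exact workflow depends on your CLI version:

# hypothetical deployment; "myTx2App" and the "resin" remote are placeholders
balena push myTx2App
# or, with the older git-based workflow:
git push resin master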

Here is what runServer.bash outputs:

CUDA Version 9.0.252
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Sun_Nov_19_03:16:56_CST_2017
Cuda compilation tools, release 9.0, V9.0.252
Hello Error 38 at line 44

The error from hello-world is a failure of the first cudaMalloc to allocate 16 bytes. :frowning:
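
For reference, runtime error 38 in CUDA 9.0 appears to be cudaErrorNoDevice ("no CUDA-capable device is detected"), which on a Jetson usually means the container cannot see the Tegra GPU device nodes. A quick check from a shell in the container, as a sketch (device node names as on L4T 28.2; they may differ on other releases):

# these device nodes should exist and be accessible inside the container
ls -l /dev/nvhost-ctrl /dev/nvhost-ctrl-gpu /dev/nvhost-gpu /dev/nvmap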

I was also able to extract the following versions using jetsonhacks/jetsonVersion:

/etc/nv_tegra_release:
# R28 (release), REVISION: 2.0, GCID: 10136452, BOARD: t186ref, EABI: aarch64, DATE: Fri Dec  1 14:20:33 UTC 2017
JETSON_L4T=28.2.0
JETSON_CUDA=9.0.252

Using the Nvidia CUDA installation instructions for Linux I tried to verify the configuration:

$ export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

$ echo $LD_LIBRARY_PATH
/usr/local/cuda-9.0/lib64:/usr/lib/aarch64-linux-gnu/tegra

$ systemctl status nvidia-persistenced
Failed to get D-Bus connection: Unknown error -1

$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ make
$ ../../bin/aarch64/linux/release/deviceQuery
../../bin/aarch64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

The systemctl error might be the best clue, but I don’t know; the deviceQuery error may also be relevant…
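
For reference, cudaGetDeviceCount error 35 appears to be cudaErrorInsufficientDriver. A quick sanity check that the Tegra driver libraries installed by apply_binaries.sh actually landed in the container and are on the library path, as a sketch:

# the L4T userspace driver libs should be present, and the tegra dir should be on LD_LIBRARY_PATH
ls -l /usr/lib/aarch64-linux-gnu/tegra/libcuda.so*
cat /etc/nv_tegra_release
echo $LD_LIBRARY_PATH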

Suggestions?

Here is the device: https://dashboard.balena-cloud.com/devices/ee83ba4d64e46b42d6803293b9b54771/summary

I have granted support access to it.


Perhaps a bit more insight from nvpmodel? This is supposed to control the power mode (max-q, max-n, etc.).

$ nvpmodel -q --verbose
NVPM ERROR: null input file!
NVPM ERROR: Failed to parse /etc/nvpmodel.conf
$ ls -l /etc/nvpmodel.conf
ls: cannot access /etc/nvpmodel.conf: No such file or directory

Figured it out! The systemctl error led me to add this to the Dockerfile.template:

# switch on systemd init system in container -- requires "privileged: true" in docker-compose.yml
ENV INITSYSTEM on
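
With systemd running as the container’s init, the earlier D-Bus error from systemctl should go away; a quick way to re-check from a shell in the container (just a sketch):

# systemd should now answer over D-Bus instead of "Failed to get D-Bus connection"
systemctl is-system-running
systemctl status nvidia-persistenced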

Output of deviceQuery:

../../bin/aarch64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X2"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    6.2
  Total amount of global memory:                 7847 MBytes (8227979264 bytes)
  ( 2) Multiprocessors, (128) CUDA Cores/MP:     256 CUDA Cores
  GPU Max Clock rate:                            1301 MHz (1.30 GHz)
  Memory Clock rate:                             1600 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

Awesome stuff!
Would you mind sharing a full working solution (the template file and anything else that is needed)?
Marking the post as the solution could help others find it as well :slight_smile: Thanks!

@cyplo
How do you want that shared?
How do we mark it as a solution?

Heya @jason10!
If the whole thing is not too long, just pasting it into a post here would be great.
If it’s longer, pointing to a source in some way, in a post, would be great. It could be a git repo, or a gist on GitHub, with a link to that.

To mark a post as the solution to the original post, click at the bottom of the post, near the reply button: there should be a checkmark icon, either right there or hidden behind the ... three-dot button. (Here’s more info: https://meta.discourse.org/t/discourse-solved-accepted-answer-plugin/30155)

Thanks !

Did you manage to get your example up and running? Could you share it? I’m looking to try to implement a solution to this in the next couple of days.

Specifically, I’m looking to run CUDA on a Jetson TX2 running YOLO (which is a darknet-based computer vision tool). Interested to hear your thoughts or insights.

Almost; still working on getting Java 8, darknet, and CUDA all working together.

I’ve created a public repository at https://github.com/eiodiagnostics/Balena-Jetson-tx2-experiments

If you don’t need Java 8, then the Dockerfile.template above should work.

Hi @jason10, so the solution was adding ENV INITSYSTEM on to have systemd available/running in your container, right?

If so, just a heads up: you are using a resin/...-debian base image, and those are being replaced with our new balenalib/... base images. Just mentioning it because, if that was your solution above, it will need some adaptation for the new base images, which do not come with systemd installed (thus the INITSYSTEM env var does not make any difference there by default).

See our blog post announcing the base images and the documentation detailing them. In particular, see the Installing Your Own Initsystem section, which also links to example projects showing the files you need to add to your project to make this work.

If you wonder why this change happened, given that it initially adds a bit more work on your side, it is because of our experience with init systems in containers, as mentioned in the docs:

Since the release of multicontainer on the balenaCloud platform, we now recommend the use of multiple containers and no longer recommend the use of an initsystem, particularly systemd, in the container as it tends to cause a myriad of issues, undefined behaviour and requires the container to run fully privileged.

If you have any issues setting things up, don’t hesitate to ask, though!

Also, if we have missed anything regarding your issue and solution, please let us know!

So is this part of the documentation no longer correct?

Actually, I don’t think you even have to install onto a Jetson: just install the NVIDIA SDK Manager (developer.nvidia.com), select your device, have it download everything, and look for something like sdkml3_jetpack_l4t_42.json in your downloads directory. Then you have to pull the URLs out of that.
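
A rough sketch of pulling the package URLs out of that manifest; the filename and exact JSON layout vary between JetPack releases, so treat this as a starting point:

# list the unique .deb download URLs referenced by the SDK Manager manifest
grep -oE 'https?://[^"]+\.deb' ~/Downloads/sdkml3_jetpack_l4t_42.json | sort -u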

Download the .deb file for the CUDA Toolkit for L4T either using a web browser on the device, or download it on your PC and then copy the file to your device using a USB flash stick or across the network.
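
As a sketch of that copy-then-install route, assuming the JetPack 3.2.1 package names from the Dockerfile above and a placeholder hostname/user for the Jetson:

# copy the CUDA repo .deb to the Jetson and install it there (host/user are hypothetical)
scp cuda-repo-l4t-9-0-local_9.0.252-1_arm64.deb nvidia@jetson-tx2.local:/tmp/
ssh nvidia@jetson-tx2.local "sudo dpkg -i /tmp/cuda-repo-l4t-9-0-local_9.0.252-1_arm64.deb \
  && sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub \
  && sudo apt-get update && sudo apt-get install -y cuda-toolkit-9.0"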

@jason10 I’m attempting to reproduce your work on a Jetson TX2 with a CTI Astro carrier board and running balenaOS 2.88.4+rev15.

I’m able to use deviceQuery and retrieve information from the GPU, but I still can’t execute the Hello World program or start the nvidia persistence daemon.

I’d like to confirm that the approach you took three years ago is still the best approach. Also, from what I currently understand, the nvidia persistence daemon is only a performance optimization, not a functionality requirement. Is that correct?

Lastly, of the 48 versions of the persistence daemon I found at pkgs.org, which did you use three years ago?

Yes, it worked when I created it. It was important that we validated those operations before we continued with the Jetson TX2 and CTI Orbitty.

Unfortunately the project was halted about a year later and I left the company then.

It may be that you need to match your CTI Astro carrier board support package with the L4T version and update the Dockerfile.template to match.

Sorry I don’t have any other answers; it has been a long time.