Yes, you can even install in a VM if you want.
Following up on the CUDA install topic: has anyone managed to compile jetson-inference on a resin.io TX2?
Just found this Dockerfile, which seems to have everything needed to install JetPack, including CUDA:
Looks like a good way to deploy everything that’s needed
Awesome!!! My goal is to get pytorch and pocketOS working on this; I think the pieces are coming together.
Unfortunately the Nvidia download links are broken… Does anyone have an updated working version?
updated urls to 3.2.1
# FROM aarch64/ubuntu
FROM arm64v8/ubuntu:xenial-20180123
# AUTHOR bmwshop@gmail.com
MAINTAINER nuculur@gmail.com
# This is the base container for the Jetson TX2 board with drivers (with cuda)
# base URL for NVIDIA libs
ARG URL=https://developer.download.nvidia.com/devzone/devcenter/mobile/jetpack_l4t/3.2.1/m8u2ki/JetPackL4T_321_b23
# Update packages, install some useful packages
RUN apt-get update && apt-get install -y apt-utils bzip2 curl sudo unp && apt-get clean && rm -rf /var/cache/apt
WORKDIR /tmp
# Install drivers first
RUN curl -sL http://developer.nvidia.com/embedded/dlc/l4t-jetson-tx2-driver-package-28-2 | tar xvfj -
RUN chown root /etc/passwd /etc/sudoers /usr/lib/sudo/sudoers.so /etc/sudoers.d/README
RUN /tmp/Linux_for_Tegra/apply_binaries.sh -r / && rm -fr /tmp/*
# Pull the rest of the jetpack libs for cuda/cudnn and install
RUN curl $URL/cuda-repo-l4t-9-0-local_9.0.252-1_arm64.deb -so cuda-repo-l4t_arm64.deb
RUN curl $URL/libcudnn7_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn_arm64.deb
RUN curl $URL/libcudnn7-dev_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn-dev_arm64.deb
# Install libs: L4T, CUDA, cuDNN
RUN dpkg -i /tmp/cuda-repo-l4t_arm64.deb
RUN apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
RUN apt-get update && apt-get install -y cuda-toolkit-9.0
RUN dpkg -i /tmp/libcudnn_arm64.deb
RUN dpkg -i /tmp/libcudnn-dev_arm64.deb
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/tegra
# Re-link libs in /usr/lib/aarch64-linux-gnu/tegra
RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so
RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1
RUN ln -sf /usr/lib/aarch64-linux-gnu/tegra/libGL.so /usr/lib/aarch64-linux-gnu/libGL.so
# D.R. – need to do this for some strange reason (for jetson tx2)
RUN ln -s /usr/lib/aarch64-linux-gnu/libcuda.so /usr/lib/aarch64-linux-gnu/libcuda.so.1
# Clean up (don’t remove cuda libs… used by child containers)
RUN apt-get -y autoremove && apt-get -y autoclean
RUN rm -rf /var/cache/apt
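A quick way to try the image (hedged: the tag name is made up, and --privileged is the blunt way to expose the Tegra device nodes to the container):
# Build on an arm64 host (e.g. the TX2 itself) and open a shell in the result.
docker build -t tx2-cuda-base .
docker run --rm -it --privileged tx2-cuda-base /bin/bash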
Following various suggestions I have attempted to create a balena Dockerfile.template that installs CUDA-9.0 (as above) on a Jetson TX2 hosted on a dev board. I should have CTI Orbitty carriers in a few days.
The installation appears to go smoothly, though some of the post-installation tests from the Nvidia installation guide for Linux are failing: the CUDA sample programs build but fail to run.
Dockerfile.template
FROM resin/%%RESIN_MACHINE_NAME%%-debian
# This is the base container for the Jetson TX2 board with drivers (with cuda)
# base URL for NVIDIA libs
ARG URL=https://developer.download.nvidia.com/devzone/devcenter/mobile/jetpack_l4t/3.2.1/m8u2ki/JetPackL4T_321_b23
# Update packages, install some useful packages
ARG DEBIAN_FRONTEND=noninteractive
# TODO: pciutils may be unnecessary once everything is working
RUN apt-get update && apt-get install -y --no-install-recommends \
apt-utils \
bzip2 \
curl \
pciutils \
sudo \
unp
# ca-certificates ensures that we can download from the nvidia site
# libssl and openssl come along for the ride
RUN apt-get install -y --reinstall --no-install-recommends \
ca-certificates \
libssl1.0.0 \
openssl
WORKDIR /tmp
# Install drivers first
RUN curl --silent --verbose --location http://developer.nvidia.com/embedded/dlc/l4t-jetson-tx2-driver-package-28-2 | tar xvfj -
RUN chown root /etc/passwd /etc/sudoers /usr/lib/sudo/sudoers.so /etc/sudoers.d/README
RUN /tmp/Linux_for_Tegra/apply_binaries.sh -r / && rm -fr /tmp/*
# Pull the rest of the jetpack libs for cuda/cudnn and install
RUN curl $URL/cuda-repo-l4t-9-0-local_9.0.252-1_arm64.deb -so cuda-repo-l4t_arm64.deb
RUN curl $URL/libcudnn7_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn_arm64.deb
RUN curl $URL/libcudnn7-dev_7.0.5.15-1+cuda9.0_arm64.deb -so /tmp/libcudnn-dev_arm64.deb
# Install libs: L4T, CUDA, cuDNN
RUN dpkg -i /tmp/cuda-repo-l4t_arm64.deb
RUN apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
RUN apt-get update && apt-get install -y cuda-toolkit-9.0
RUN dpkg -i /tmp/libcudnn_arm64.deb
RUN dpkg -i /tmp/libcudnn-dev_arm64.deb
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/tegra
# Re-link libs in /usr/lib/tegra
RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so
RUN ln -s /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.28.2.0 /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1
RUN ln -sf /usr/lib/aarch64-linux-gnu/tegra/libGL.so /usr/lib/aarch64-linux-gnu/libGL.so
# D.R. – need to do this for some strange reason (for jetson tx2)
RUN ln -s /usr/lib/aarch64-linux-gnu/libcuda.so /usr/lib/aarch64-linux-gnu/libcuda.so.1
WORKDIR /usr/src/app
COPY . ./
RUN echo $PATH
ENV PATH="${PATH}:/usr/local/cuda/bin"
RUN nvcc hello-world.cu -L /usr/local/cuda/lib -lcudart -o hello-world
CMD ["/usr/src/app/runServer.bash"]
runServer.bash:
#!/bin/bash
echo "PATH = ${PATH}"
cat /usr/local/cuda/version.txt
nvcc --version
/usr/src/app/hello-world
while true; do sleep 60; done
Here is what the runServer.bash outputs:
CUDA Version 9.0.252
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Sun_Nov_19_03:16:56_CST_2017
Cuda compilation tools, release 9.0, V9.0.252
Hello Error 38 at line 44
The error from hello-world is a failure of the first cudaMalloc to allocate 16 bytes.
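For what it's worth, runtime error 38 is cudaErrorNoDevice ("no CUDA-capable device is detected"), which inside a container usually points at the Tegra GPU device nodes not being visible rather than anything wrong with the toolkit. A hedged check (node names can vary between L4T releases):
# If these /dev entries are missing inside the container, the CUDA runtime
# cannot see the GPU; the container needs privileged access or the nodes
# mapped in explicitly.
ls -l /dev/nvhost-ctrl /dev/nvhost-ctrl-gpu /dev/nvhost-gpu /dev/nvmap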
I was also able to extract the following versions using the jetsonhacks/jetsonVersion tool:
/etc/nv_tegra_release:
# R28 (release), REVISION: 2.0, GCID: 10136452, BOARD: t186ref, EABI: aarch64, DATE: Fri Dec 1 14:20:33 UTC 2017
JETSON_L4T=28.2.0
JETSON_CUDA=9.0.252
Using the Nvidia CUDA installation instructions for Linux I tried to verify the configuration:
$ export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
$ echo $LD_LIBRARY_PATH
/usr/local/cuda-9.0/lib64:/usr/lib/aarch64-linux-gnu/tegra
$ systemctl status nvidia-persistenced
Failed to get D-Bus connection: Unknown error -1
$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ make
$ ../../bin/aarch64/linux/release/deviceQuery
../../bin/aarch64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
The systemctl error might be the best clue, but I don’t know.
Also the deviceQuery error…
Suggestions?
Here is the device: https://dashboard.balena-cloud.com/devices/ee83ba4d64e46b42d6803293b9b54771/summary
I have granted support access to it.
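One more data point worth checking: return code 35 from cudaGetDeviceCount is cudaErrorInsufficientDriver, which on Tegra often means the runtime resolved the wrong (or no) libcuda rather than a genuinely old driver. A couple of hedged sanity checks:
# Is a libcuda known to the dynamic linker, and do the symlinks end up at
# the Tegra driver library installed by apply_binaries.sh?
ldconfig -p | grep libcuda
ls -l /usr/lib/aarch64-linux-gnu/libcuda.so* /usr/lib/aarch64-linux-gnu/tegra/libcuda*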
Perhaps a bit more insight from nvpmodel? It is supposed to control the power mode (MAX-Q, MAX-N, etc.):
$ nvpmodel -q --verbose
NVPM ERROR: null input file!
NVPM ERROR: Failed to parse /etc/nvpmodel.conf
$ ls -l /etc/nvpmodel.conf
ls: cannot access /etc/nvpmodel.conf: No such file or directory
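nvpmodel reads /etc/nvpmodel.conf, which ships with a stock L4T image but evidently was not installed into this container. A hedged note: on some L4T releases /etc/nvpmodel.conf is a symlink to a board-specific file such as /etc/nvpmodel/nvpmodel_t186.conf, and once a matching config is copied in from a flashed TX2 the query should work:
# With the config file in place, this should print the active power model.
nvpmodel -q --verbose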
Figured it out! The systemctl error led me to add the following to the Dockerfile.template:
# switch on systemd init system in container -- requires "privileged: true" in docker-compose.yml
ENV INITSYSTEM on
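With that in place (and the container running privileged), a quick way to confirm the init system actually came up:
# PID 1 should now be systemd rather than the application entrypoint...
ps -p 1 -o comm=
# ...and the persistence daemon should be reachable over D-Bus again.
systemctl status nvidia-persistenced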
Output of deviceQuery:
../../bin/aarch64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X2"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 6.2
Total amount of global memory: 7847 MBytes (8227979264 bytes)
( 2) Multiprocessors, (128) CUDA Cores/MP: 256 CUDA Cores
GPU Max Clock rate: 1301 MHz (1.30 GHz)
Memory Clock rate: 1600 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
Awesome stuff!
Would you mind sharing a full working solution (the template file and anything else that is needed)?
Marking the post as the solution could help others find it as well. Thanks!
@cyplo
How do you want that shared?
How do we mark it as a solution?
Heya @jason10!
If the whole thing is not too long, just pasting it into a post here would be great.
If it’s longer, pointing to a source in a post would be great: a git repo, or a gist on GitHub, and a link to that.
To mark a post as a solution to the original post, click at the bottom of the post, near the reply button; there should be a checkmark icon, either visible directly or hidden behind the ... three-dot button. (Here’s more info: https://meta.discourse.org/t/discourse-solved-accepted-answer-plugin/30155)
Thanks!
Did you manage to get your example up and running? Could you share it? I’m looking to try to implement a solution to this in the next couple of days.
Specifically, I’m looking to run CUDA on a Jetson TX2 running YOLO (a darknet-based computer vision tool). Interested to hear your thoughts or insights.
Almost; still working on getting Java 8, darknet, and CUDA all working together.
I’ve created a public repository at https://github.com/eiodiagnostics/Balena-Jetson-tx2-experiments
If you don’t need Java 8, then the dockerfile.template above should work.
Hi @jason10, so the solution was adding ENV INITSYSTEM on to have systemd available/running in your container, right?
If so, just a heads up: you are using a resin/...-debian base image, and those are being replaced with our new balenalib/... base images.
Just mentioning it because, if that was your solution above, it will need some adaptation to the new base images, which do not come with systemd installed (thus the INITSYSTEM env var does not make any difference there by default).
See our blog post announcing the base images and the documentation detailing them. In particular, see the Installing Your Own Initsystem section, which links to example projects with the files you need to add to your project to make it work.
If you wonder why this change happened, given that it initially adds a bit more work on your side: it is because of our experience with init systems in containers, as mentioned in the docs:
Since the release of multicontainer on the balenaCloud platform, we now recommend the use of multiple containers and no longer recommend the use of an initsystem, particularly systemd, in the container as it tends to cause a myriad of issues, undefined behaviour and requires the container to run fully privileged.
If you have any issues setting things up, don’t hesitate to ask, though!
Also, if we have missed anything regarding your issue and solution, please let us know!
So is this part of the documentation no longer correct?
Actually, I don’t think you even have to install onto a Jetson: just install the Nvidia SDK Manager (developer.nvidia.com), select your device, have it download everything, and look for something like sdkml3_jetpack_l4t_42.json in your downloads directory. Then pull the URLs out of that.
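If it helps, a rough way to pull those URLs out of the manifest (assuming it landed in ~/Downloads; the JSON layout varies between JetPack releases, so this just greps for anything URL-shaped):
grep -o 'https\?://[^"]*' ~/Downloads/sdkml3_jetpack_l4t_42.json | sort -u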
Download the .deb file for the CUDA Toolkit for L4T either using a web browser on the device, or download on your PC then copy the file to your device using a USB flash stick or across the network.
@jason10 I’m attempting to reproduce your work on a Jetson TX2 with a CTI Astro carrier board and running balenaOS 2.88.4+rev15.
I’m able to use deviceQuery and retrieve information from the GPU, but I still can’t execute the Hello World program or start the nvidia persistence daemon.
I’d like to confirm that the approach you took three years ago is still the best approach. Also, from what I currently understand, the nvidia persistence daemon is only a performance optimization, not a functional requirement. Is that correct?
Lastly, of the 48 versions of the persistence daemon I found at pkgs.org, which did you use three years ago?
Yes, it worked when I created that. It was important that we validated those operations before we continued with the Jetson TX2 and CTI Orbitty.
Unfortunately the project was halted about a year later and I left the company then.
It may be that you need to match your CTI Astro carrier board support package with the L4T version and update the Dockerfile.template to match.
Sorry, I don’t have any other answers; it has been a long time.