Install CuDNN on Jetson TX2

I have followed this guide to install CUDA on my Jetson TX2,
and this guide to install TensorFlow 2.3.1 with GPU enabled.

But TensorFlow doesn’t use the GPU because it cannot find the cuDNN library files:

root@balena:/usr/app# python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-06-16 07:55:19.787140: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
>>> tf.test.is_built_with_cuda()
True
>>> tf.test.gpu_device_name()
2021-06-16 07:55:36.860499: W tensorflow/core/platform/profile_utils/cpu_utils.cc:108] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-06-16 07:55:36.863235: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x360ca980 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-16 07:55:36.863416: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-06-16 07:55:36.888703: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-16 07:55:37.081789: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-16 07:55:37.082712: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3610b7b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-06-16 07:55:37.082862: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2021-06-16 07:55:37.083921: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-16 07:55:37.084469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1742] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.66GiB deviceMemoryBandwidth: 38.74GiB/s
2021-06-16 07:55:37.085077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-16 07:55:37.098847: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-06-16 07:55:37.125443: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-16 07:55:37.128341: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-16 07:55:37.140119: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-16 07:55:37.146259: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-06-16 07:55:37.147146: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-06-16 07:55:37.147222: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1779] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-16 07:55:37.147408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1283] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-16 07:55:37.147462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1289]      0 
2021-06-16 07:55:37.147498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1302] 0:   N 
''

How can I install CuDNN 8.0 on the CUDA service?
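For anyone hitting the same symptom, the missing library can be picked out of a long TensorFlow log automatically. The following is a minimal sketch, assuming the warning format shown in the log above; the helper name `missing_libraries` is hypothetical and not part of TensorFlow.

```python
import re

# Matches the dso_loader warning format seen in the log above.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the names of libraries TensorFlow reported it could not load."""
    return sorted(set(MISSING_LIB.findall(log_text)))


# One warning line copied from the log above (timestamp omitted).
sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: "
    "cannot open shared object file: No such file or directory"
)

print(missing_libraries(sample))  # prints ['libcudnn.so.8']
```

Here the only library reported missing is `libcudnn.so.8`; every other CUDA library in the log was opened successfully.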

Submitted an issue here

Cheers!

Hello, if you’re using a balena base image in your Dockerfile, you should be able to add: RUN apt-get install -y libcudnn8 nvidia-cudnn8. If that does not work, please post your Dockerfile so we can take a look.
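After installing the packages, one quick way to confirm the fix without starting TensorFlow is to ask the dynamic loader directly. This is a minimal sketch using Python's standard `ctypes` module; the helper name `can_load` is hypothetical, and the library names are the ones TensorFlow tried to open in the log above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False


# Library names taken from the TensorFlow log in this thread.
for name in ("libcudart.so.10.2", "libcudnn.so.8"):
    print(name, "loadable" if can_load(name) else "NOT found")
```

If `libcudnn.so.8` is still reported as not found after the install, the package may have placed it outside the loader's search path, which is worth checking before debugging TensorFlow itself.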

Yeah, it works:

>>> tf.test.gpu_device_name()
2021-06-17 03:28:08.178927: W tensorflow/core/platform/profile_utils/cpu_utils.cc:108] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-06-17 03:28:08.181116: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12a0f740 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-17 03:28:08.181383: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-06-17 03:28:08.219136: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-17 03:28:08.370055: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 03:28:08.371122: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12da3590 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-06-17 03:28:08.371250: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2021-06-17 03:28:08.372054: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 03:28:08.372345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1742] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.66GiB deviceMemoryBandwidth: 38.74GiB/s
2021-06-17 03:28:08.372531: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-17 03:28:08.382578: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-06-17 03:28:08.396214: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-17 03:28:08.413786: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-17 03:28:08.424710: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-17 03:28:08.433412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-06-17 03:28:08.434862: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-17 03:28:08.435363: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 03:28:08.436060: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 03:28:08.436224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1884] Adding visible gpu devices: 0
2021-06-17 03:28:08.445436: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2

2021-06-17 03:28:11.674224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1283] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-17 03:28:11.674349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1289]      0 
2021-06-17 03:28:11.674385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1302] 0:   N 
2021-06-17 03:28:11.681245: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 03:28:11.681632: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 03:28:11.682036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1428] Created TensorFlow device (/device:GPU:0 with 100 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
'/device:GPU:0'
>>> 

Thanks!

@alanb128

By the way, I cannot launch any nvidia-docker image in this container:

root@balena:/usr/app# docker run --runtime nvidia --network host -it nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf2.3-py3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
ERRO[0000] error waiting for container: context canceled 

Here is detailed information:

root@balena:/usr/app# head -n 1 /etc/nv_tegra_release
# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t210ref, EABI: aarch64, DATE: Fri Oct 16 19:44:43 UTC 2020
root@balena:/usr/app# tegrastats 
RAM 2032/7846MB (lfb 16x4MB) SWAP 1/3923MB (cached 0MB) CPU [6%@498,off,off,4%@498,7%@499,3%@499] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 APE 150 PLL@32C MCPU@32C PMIC@100C Tboard@28C GPU@31.5C BCPU@32C thermal@32.1C Tdiode@29C VDD_SYS_GPU 152/152 VDD_SYS_SOC 381/381 VDD_4V0_WIFI 19/19 VDD_IN 1600/1600 VDD_SYS_CPU 152/152 VDD_SYS_DDR 133/133
RAM 2032/7846MB (lfb 16x4MB) SWAP 1/3923MB (cached 0MB) CPU [1%@499,off,off,0%@500,4%@499,0%@499] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 APE 150 PLL@32C MCPU@32C PMIC@100C Tboard@28C GPU@31C BCPU@32C thermal@32.1C Tdiode@29C VDD_SYS_GPU 152/152 VDD_SYS_SOC 381/381 VDD_4V0_WIFI 19/19 VDD_IN 1562/1581 VDD_SYS_CPU 152/152 VDD_SYS_DDR 114/123
root@balena:/usr/app# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 14
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.140-l4t-r32.4
 Operating System: Ubuntu 18.04.5 LTS (containerized)
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 7.662GiB
 Name: balena
 ID: C726:TCOB:HCPX:323M:WKGK:5EBR:RVPQ:XDKV:3VMX:IYXI:C4R4:4O25
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

I installed Docker and nvidia-docker2 with the following commands:

# Install docker.
RUN apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg && \
    echo "deb [arch=arm64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
        $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null && \
    apt-get update && apt-get install -y docker-ce

# Install Nvidia Container Toolkit
RUN distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
RUN apt-get update && apt-get install -y nvidia-docker2

Maybe something is wrong with my installation?

Cheers,
Shane.

Hi Shane, balenaOS does not currently support the Nvidia container runtime/toolkit, so those base images that require it will not be able to access the GPU. The guide you linked at the start of this thread has some workarounds, but let me know if you have any additional questions about it.
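A note for readers following along: the `docker info` output earlier in the thread lists `nvidia` under Runtimes even though `docker run --runtime nvidia` fails, so a registered runtime is not by itself evidence that it works on balenaOS. The sketch below only shows how to read those two fields from `docker info` text; the helper name `parse_runtimes` is hypothetical, and the sample lines are copied from the output above.

```python
def parse_runtimes(docker_info_text):
    """Extract the registered runtimes and the default runtime from `docker info` text."""
    runtimes, default = [], None
    for line in docker_info_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Runtimes":
            runtimes = value.split()
        elif key == "Default Runtime":
            default = value.strip()
    return runtimes, default


# Two lines copied from the `docker info` output in this thread.
sample = """\
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: runc
"""

runtimes, default = parse_runtimes(sample)
print("nvidia registered:", "nvidia" in runtimes)
print("default runtime:", default)
```

In this thread the runtime is registered and `runc` is the default, yet the container still fails at the `nvidia-container-cli` hook, which is consistent with the runtime not being supported on the host OS.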

Hi, @alanb128

Is there any hope of installing the Nvidia container runtime/toolkit and configuring it correctly?

Hi Shane, there is ongoing work on a number of fronts to make this possible but I can’t provide an ETA at this time. In the meantime you’ll need to use base images that don’t require the runtime and install the necessary software via your Dockerfile.

Hi, @alanb128

Yeah, I used the base images and installed the software for CUDA and cuDNN.

So, is it impossible to enable the Nvidia runtime at the moment?

Hi Shane, that’s correct. At the moment you can’t run Nvidia base images that require the runtime on balenaOS.

Hi, @alanb128

What kind of base images would be better to get started on Jetson TX2?

Your balena base images? e.g. balenalib/%%BALENA_MACHINE_NAME%%-ubuntu:latest?

And what packages should be installed?

Cheers,
Shane.

Hey @scarlyon … I think Alan was referring to what he said earlier:

Hello, if you’re using a balena base image in your Dockerfile, you should be able to add: RUN apt-get install -y libcudnn8 nvidia-cudnn8.

You can use one of the balena images, such as the Ubuntu one you referenced, and then install what you need on it.

@toochevere @alanb128

Yes, I have installed libcudnn8, nvidia-cudnn8, cuda-toolkit-10-2, but still no luck…

Please check this message…

Hi Shane, just to clarify, if you use a balena base image and then install CUDA etc in your container, it should access the GPU. It seemed like you were able to accomplish this in Install CuDNN on Jetson TX2 - #5 by scarlyon where you mentioned “Yeah, it works” but please let me know if that’s not the case. If you did get it working there, there is no need for the runtime. In this thread the “runtime” refers to Nvidia’s container runtime and all associated elements (like the gpu flag) that lets their particular base images access the GPU. We don’t currently support that method of accessing the GPU, but if you are accessing the GPU using our base images, there’s no need for the runtime. The end result should be the same. I hope this helps clear up any confusion. I know it’s been a long-running thread, but do let us know if you’re still having trouble - we’ll do what we can to sort it out.

Hi, @alanb128

Thanks for your reply!

Yes, all is good inside the service. I can use GPU without any issue.

Our intention is to run Docker containers with GPU support.

We were able to get Docker working, but the --gpus=all and --runtime nvidia parameters don’t work, even though nvidia-container-toolkit installed successfully.

Any ideas on how to achieve this?

Cheers,
Shane.

@alanb128 @toochevere

I have just created another ticket with detailed explanation - Jetson: Support Nvidia Docker Images

Thanks!