By the way, I cannot launch any nvidia-docker image on this container -
root@balena:/usr/app# docker run --runtime nvidia --network host -it nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf2.3-py3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
ERRO[0000] error waiting for container: context canceled
Here is detailed information:
root@balena:/usr/app# head -n 1 /etc/nv_tegra_release
# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t210ref, EABI: aarch64, DATE: Fri Oct 16 19:44:43 UTC 2020
root@balena:/usr/app# tegrastats
RAM 2032/7846MB (lfb 16x4MB) SWAP 1/3923MB (cached 0MB) CPU [6%@498,off,off,4%@498,7%@499,3%@499] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 APE 150 PLL@32C MCPU@32C PMIC@100C Tboard@28C GPU@31.5C BCPU@32C thermal@32.1C Tdiode@29C VDD_SYS_GPU 152/152 VDD_SYS_SOC 381/381 VDD_4V0_WIFI 19/19 VDD_IN 1600/1600 VDD_SYS_CPU 152/152 VDD_SYS_DDR 133/133
RAM 2032/7846MB (lfb 16x4MB) SWAP 1/3923MB (cached 0MB) CPU [1%@499,off,off,0%@500,4%@499,0%@499] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 APE 150 PLL@32C MCPU@32C PMIC@100C Tboard@28C GPU@31C BCPU@32C thermal@32.1C Tdiode@29C VDD_SYS_GPU 152/152 VDD_SYS_SOC 381/381 VDD_4V0_WIFI 19/19 VDD_IN 1562/1581 VDD_SYS_CPU 152/152 VDD_SYS_DDR 114/123
root@balena:/usr/app# docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 14
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.140-l4t-r32.4
Operating System: Ubuntu 18.04.5 LTS (containerized)
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 7.662GiB
Name: balena
ID: C726:TCOB:HCPX:323M:WKGK:5EBR:RVPQ:XDKV:3VMX:IYXI:C4R4:4O25
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Installed docker and nvidia-docker2 by following commands:
# Install docker.
RUN apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release && \
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg && \
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null && \
apt-get update && apt-get install -y docker-ce
# Install Nvidia Container Toolkit
RUN distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
RUN apt-get update && apt-get install -y nvidia-docker2
Maybe something is wrong with my installation?
Cheers,
Shane.