Here is the detailed issue we would like to solve.
We were able to install Docker and the NVIDIA Container Toolkit inside a service that is running on a Jetson device.
Here is our Dockerfile:
FROM balenalib/jetson-tx2-ubuntu:bionic
ENV DEBIAN_FRONTEND=noninteractive
# Install CUDA and some utilities
RUN apt-get update && apt-get install -y lbzip2 xorg cuda-toolkit-10-2 wget tar python3 libegl1 python3-gi
# Download and install BSP binaries for L4T 32.4.4
RUN apt-get update && apt-get install -y lbzip2 python3 libegl1 && \
wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.4/r32_Release_v4.4-GMC3/T210/Tegra210_Linux_R32.4.4_aarch64.tbz2 && \
tar xf Tegra210_Linux_R32.4.4_aarch64.tbz2 && \
cd Linux_for_Tegra && \
sed -i 's/config.tbz2\"/config.tbz2\" --exclude=etc\/hosts --exclude=etc\/hostname/g' apply_binaries.sh && \
sed -i 's/install --owner=root --group=root \"${QEMU_BIN}\" \"${L4T_ROOTFS_DIR}\/usr\/bin\/\"/#install --owner=root --group=root \"${QEMU_BIN}\" \"${L4T_ROOTFS_DIR}\/usr\/bin\/\"/g' nv_tegra/nv-apply-debs.sh && \
sed -i 's/LC_ALL=C chroot . mount -t proc none \/proc/ /g' nv_tegra/nv-apply-debs.sh && \
sed -i 's/umount ${L4T_ROOTFS_DIR}\/proc/ /g' nv_tegra/nv-apply-debs.sh && \
sed -i 's/chroot . \// /g' nv_tegra/nv-apply-debs.sh && \
./apply_binaries.sh -r / --target-overlay && cd .. && \
rm -rf Tegra210_Linux_R32.4.4_aarch64.tbz2 && \
rm -rf Linux_for_Tegra && \
echo "/usr/lib/aarch64-linux-gnu/tegra" > /etc/ld.so.conf.d/nvidia-tegra.conf && ldconfig
# Install cuDNN
RUN apt-get install -y libcudnn8 nvidia-cudnn8
RUN apt-get update && apt-get install -y apt-utils dialog aufs-tools gcc libc-dev iptables conntrack unzip
# Install docker.
RUN apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release && \
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg && \
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null && \
apt-get update && apt-get install -y docker-ce
# Install Nvidia Container Toolkit
RUN distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
RUN apt-get update && apt-get install -y nvidia-docker2
# Enable udevd so that plugged dynamic hardware devices show up in our container.
ENV UDEV=1
ENV INITSYSTEM on
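For reference, the nvidia-docker2 package registers the runtime with the inner Docker daemon via /etc/docker/daemon.json; the stock file it ships looks roughly like this (a sketch of the expected contents for context, not verified output from our device):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```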
And here is our docker-compose.yml file to support Docker inside our service (named sm):
version: '2.1'
volumes:
  docker-data:
services:
  sm:
    build: ./sm
    restart: always
    privileged: true
    network_mode: host
    environment:
      - DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host/run/dbus/system_bus_socket
    volumes:
      - 'docker-data:/var/lib/docker'
    labels:
      io.balena.features.supervisor-api: '1'
      io.balena.features.balena-api: '1'
      io.balena.features.kernel-modules: '1'
      io.balena.features.firmware: '1'
      io.balena.features.dbus: '1'
    cap_add:
      - SYS_RAWIO
    ports:
      - "80:80"
    devices:
      - "/dev:/dev"
- We can run any Docker container inside this sm service.
- Everything is installed in the sm service correctly, as shown below:
root@balena:/usr/app# head -n 1 /etc/nv_tegra_release
# R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t210ref, EABI: aarch64, DATE: Fri Oct 16 19:44:43 UTC 2020
root@balena:/usr/app# tegrastats
RAM 2032/7846MB (lfb 16x4MB) SWAP 1/3923MB (cached 0MB) CPU [6%@498,off,off,4%@498,7%@499,3%@499] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 APE 150 PLL@32C MCPU@32C PMIC@100C Tboard@28C GPU@31.5C BCPU@32C thermal@32.1C Tdiode@29C VDD_SYS_GPU 152/152 VDD_SYS_SOC 381/381 VDD_4V0_WIFI 19/19 VDD_IN 1600/1600 VDD_SYS_CPU 152/152 VDD_SYS_DDR 133/133
RAM 2032/7846MB (lfb 16x4MB) SWAP 1/3923MB (cached 0MB) CPU [1%@499,off,off,0%@500,4%@499,0%@499] EMC_FREQ 0%@204 GR3D_FREQ 0%@114 APE 150 PLL@32C MCPU@32C PMIC@100C Tboard@28C GPU@31C BCPU@32C thermal@32.1C Tdiode@29C VDD_SYS_GPU 152/152 VDD_SYS_SOC 381/381 VDD_4V0_WIFI 19/19 VDD_IN 1562/1581 VDD_SYS_CPU 152/152 VDD_SYS_DDR 114/123
root@balena:/usr/app# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 14
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.140-l4t-r32.4
 Operating System: Ubuntu 18.04.5 LTS (containerized)
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 7.662GiB
 Name: balena
 ID: C726:TCOB:HCPX:323M:WKGK:5EBR:RVPQ:XDKV:3VMX:IYXI:C4R4:4O25
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
But we cannot use the --runtime nvidia parameter to launch NVIDIA base images:
root@balena:/usr/app# docker run --runtime nvidia --network host -it nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf2.3-py3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
ERRO[0000] error waiting for container: context canceled
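For additional context, here are diagnostic commands one could run inside the sm service to narrow down where the runtime fails (a debugging sketch: the nvidia-container-cli flags come from the libnvidia-container tooling, and the CSV manifest path is an assumption about how the toolkit is set up on Jetson/L4T):

```shell
# Check that the runtime binaries resolve on PATH
which nvidia-container-runtime nvidia-container-cli

# Probe the driver directly; this should reproduce the
# "initialization error: driver error" outside of `docker run`
nvidia-container-cli -k -d /dev/tty info

# On Jetson, the toolkit mounts driver files listed in CSV manifests;
# confirm the manifests are present
ls /etc/nvidia-container-runtime/host-files-for-container.d/
```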
Any ideas on how to support GPU-based Docker containers?
Cheers,
Shane.