Here is my Dockerfile.template, which installs the NVIDIA driver inside a container:
FROM balenalib/%%BALENA_MACHINE_NAME%%-ubuntu:latest
ARG RESINOS_VERSION=2.58.6%2Brev1.prod
ARG YOCTO_VERSION=5.2.10
ARG NVIDIA_DRIVER_VERSION=465.31
ENV YOCTO_KERNEL=${YOCTO_VERSION}-yocto-standard
ENV NVIDIA_DRIVER_RUN=NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run
ENV DEBIAN_FRONTEND=noninteractive
# Install Nvidia Driver
RUN apt-get update && apt-get install -y wget gcc build-essential apt-utils dialog aufs-tools libc-dev iptables conntrack unzip libglu1-mesa-dev
RUN wget -nv https://files.balena-staging.com/images/%%BALENA_MACHINE_NAME%%/${RESINOS_VERSION}/kernel_modules_headers.tar.gz && \
    tar -xzvf kernel_modules_headers.tar.gz && \
    mkdir -p /lib/modules/${YOCTO_KERNEL} && \
    cp -r kernel_modules_headers /lib/modules/${YOCTO_KERNEL}/build && \
    ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2 && \
    wget -nv http://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_DRIVER_VERSION}/${NVIDIA_DRIVER_RUN} && \
    chmod +x ./${NVIDIA_DRIVER_RUN} && \
    mkdir -p /nvidia && \
    mkdir -p /nvidia/driver && \
    ./${NVIDIA_DRIVER_RUN} \
        --kernel-install-path=/nvidia/driver \
        --ui=none \
        --no-drm \
        --no-x-check \
        --install-compat32-libs \
        --no-nouveau-check \
        --no-nvidia-modprobe \
        --no-rpms \
        --no-backup \
        --no-check-for-alternate-installs \
        --no-libglx-indirect \
        --no-install-libglvnd \
        --x-prefix=/tmp/null \
        --x-module-path=/tmp/null \
        --x-library-path=/tmp/null \
        --x-sysconfig-path=/tmp/null \
        --kernel-name=${YOCTO_KERNEL} && \
    rm -rf /tmp/* ${NVIDIA_DRIVER_RUN} kernel_modules_headers.tar.gz
CMD ["bash", "/usr/app/start.sh"]
The content of start.sh:
insmod /nvidia/driver/nvidia.ko || true
insmod /nvidia/driver/nvidia-modeset.ko || true
insmod /nvidia/driver/nvidia-uvm.ko || true
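Because each insmod line ends in || true, a failed module load is silently ignored, so nothing in the container log says why nvidia-smi later fails. The following is a minimal sketch, assuming bash and the /nvidia/driver path from the Dockerfile, that reports a loaded nouveau module and any insmod failure. The function names is_module_loaded and load_nvidia_modules are hypothetical, and this is a diagnostic aid, not a confirmed fix:

```shell
#!/bin/bash
# Sketch only: function names are illustrative, not part of balena or NVIDIA tooling.

# Return 0 if the named module appears in /proc/modules (or in the file given
# as the second argument, which makes the check easy to try on sample data).
is_module_loaded() {
    local name="$1"
    local modules_file="${2:-/proc/modules}"
    # The module name is the first whitespace-separated field on each line.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Load the three modules (nvidia first, since the other two depend on it)
# and print a message for each failure instead of hiding it.
load_nvidia_modules() {
    local dir="${1:-/nvidia/driver}"
    local mod
    if is_module_loaded nouveau; then
        echo "warning: nouveau is loaded and may be holding the GPU" >&2
    fi
    for mod in nvidia nvidia-modeset nvidia-uvm; do
        if ! insmod "$dir/$mod.ko"; then
            echo "insmod $mod.ko failed; dmesg may show the reason" >&2
        fi
    done
}
```

Calling load_nvidia_modules at the end of start.sh would stand in for the three insmod lines. Errors are printed rather than hidden, but the script still continues, so the container starts either way.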
I am using the genericx86-64 2.58.6 development image and building locally with the command sudo balena push <IP> --noparent-check.
This works well on my first PC, which has a GTX 1660 installed:
root@balena:/usr/app# nvidia-smi
Thu Jun 3 07:58:27 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 25% 43C P0 20W / 120W | 0MiB / 5941MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@balena:/usr/app#
Unfortunately, it does not work on my other PCs, which have a GTX 970 and a GTX 1060 installed:
root@balena:/usr/app# lshw -C display
*-display
description: VGA compatible controller
product: GM204 [GeForce GTX 970]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nouveau latency=0
resources: irq:164 memory:a2000000-a2ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:3000(size=128) memory:c0000-dffff
root@balena:/usr/app# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
root@balena:/usr/app# lshw -C display
*-display
description: VGA compatible controller
product: GP106 [GeForce GTX 1060 6GB]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nouveau latency=0
resources: irq:138 memory:de000000-deffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:e000(size=128) memory:c0000-dffff
root@balena:/usr/app# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
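In both failing outputs, lshw reports configuration: driver=nouveau for the card at 0000:01:00.0, which suggests the open-source nouveau driver has claimed the GPU. As a quick cross-check that does not need lshw, the bound driver can be read from sysfs. This is a minimal sketch, assuming bash and the standard /sys/bus/pci/devices layout; the function name pci_driver_of is hypothetical:

```shell
#!/bin/bash
# Sketch only: pci_driver_of is an illustrative name.

# Print the kernel driver bound to a PCI device, or "none" if no driver is bound.
# The second argument overrides the sysfs root, which is useful for dry runs.
pci_driver_of() {
    local dev="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local link="$root/$dev/driver"
    if [ -L "$link" ]; then
        # The driver entry is a symlink whose last path component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Running pci_driver_of 0000:01:00.0 on a working and on a failing machine would show whether the two differ in which driver owns the card.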
IMPORTANT NOTE: the Intel NUC images work well on ALL of these PCs; only the genericx86-64 image has this issue.
Any ideas?
Cheers!