Hello,
We just found out about BalenaOS and wanted to test it on our client devices (coming from NixOS). Our hardware is:
CPU : Intel Core i7-14700K
GPU : NVIDIA GeForce RTX 3080 Ti
I saw this first blog post about working with hardware drivers, but it seems to be out of date since BalenaOS 3.0. I then found this second blog post about building out-of-tree Linux kernel modules and tried to make it work with the alexgg/nvidia branch.
The only change I made in the docker-compose.yml file was setting OS_VERSION to the current one (5.3.27+rev1). After a minor issue (a missing bc package), I ran into a more complex one:
[Build] ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.
[Build] Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
[Build]
[Build] [load]
[Build] ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
From there, we tried a few things:
- changing the NVIDIA driver version
- connecting via SSH to run rmmod nvidiafb && rmmod nouveau, and only then performing the balena push
- checking GCC versions (running gcc --version in the Dockerfile and cat /proc/version on the OS both output 11.4.0)
- disabling secure boot
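For reference, the GCC comparison from the list above can be sketched roughly as follows. This is only an illustration of the check, not the exact commands we ran: the helper names gcc_version and same_compiler are made up for the example, and the sample strings are illustrative stand-ins for the real output of cat /proc/version and gcc --version.

```python
import re


def gcc_version(text):
    """Return the first x.y.z version that follows 'gcc' in text, or None."""
    match = re.search(r"gcc.*?(\d+\.\d+\.\d+)", text, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


def same_compiler(proc_version, gcc_output):
    """True when both strings report the same GCC version."""
    kernel = gcc_version(proc_version)
    builder = gcc_version(gcc_output)
    return kernel is not None and kernel == builder


# Illustrative strings only (assumed shapes, not copied from our devices).
proc_version = "Linux version 6.1.0 (x86_64-poky-linux-gcc (GCC) 11.4.0, GNU ld 2.38) #1 SMP"
gcc_output = "gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0"

print(same_compiler(proc_version, gcc_output))
```

In our case both sides reported 11.4.0, so a compiler mismatch does not look like the cause.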
Adding --skip-module-load to ./${nvidia_installer} --silent --kernel-source-path "${headers_dir}" lets us get through the installer, but it ends with this error loop:
[Logs] [2024-07-09T15:45:28.722Z] Restarting service 'load sha256:c98fdcc73543b58a2e05fce6efa9dcfe435af71cf32b66c9b77807ca0d388fa8'
[Logs] [2024-07-09T15:45:28.688Z] [load] OS Version is 5.3.27+rev1
[Logs] [2024-07-09T15:45:28.688Z] [load] Loading module from /opt/lib/modules/5.3.27+rev1/nvidia-drm.ko
[Logs] [2024-07-09T15:45:28.705Z] [load] insmod: can't insert '/opt/lib/modules/5.3.27+rev1/nvidia-drm.ko': unknown symbol in module, or unknown parameter
[Logs] [2024-07-09T15:45:28.707Z] [load] Loading module from /opt/lib/modules/5.3.27+rev1/nvidia-modeset.ko
[Logs] [2024-07-09T15:45:28.729Z] [load] insmod: can't insert '/opt/lib/modules/5.3.27+rev1/nvidia-modeset.ko': unknown symbol in module, or unknown parameter
[Logs] [2024-07-09T15:45:28.733Z] [load] Loading module from /opt/lib/modules/5.3.27+rev1/nvidia-peermem.ko
[Logs] [2024-07-09T15:45:28.773Z] [load] insmod: can't insert '/opt/lib/modules/5.3.27+rev1/nvidia-peermem.ko': Invalid argument
[Logs] [2024-07-09T15:45:28.777Z] [load] Loading module from /opt/lib/modules/5.3.27+rev1/nvidia-uvm.ko
[Logs] [2024-07-09T15:45:28.800Z] [load] insmod: can't insert '/opt/lib/modules/5.3.27+rev1/nvidia-uvm.ko': unknown symbol in module, or unknown parameter
[Logs] [2024-07-09T15:45:28.804Z] [load] Loading module from /opt/lib/modules/5.3.27+rev1/nvidia.ko
[Logs] [2024-07-09T15:45:29.047Z] Service exited 'load sha256:c98fdcc73543b58a2e05fce6efa9dcfe435af71cf32b66c9b77807ca0d388fa8'
[Logs] [2024-07-09T15:45:28.962Z] [load] insmod: can't insert '/opt/lib/modules/5.3.27+rev1/nvidia.ko': No such device
Could you kindly help us troubleshoot the issue?
Thanks in advance!