NVIDIA drivers on Intel NUC

Hi,

You haven’t added an apt-get update step to the Dockerfile, so the apt cache is empty.

Also, I don’t think the exact package name is nvidia-driver:

root@b3789a28b8f9:/# apt-cache search nvidia-driver
nvidia-304 - NVIDIA legacy binary driver - version 304.135
nvidia-304-updates - Transitional package for nvidia-304
nvidia-340 - NVIDIA binary driver - version 340.107
nvidia-361 - Transitional package for nvidia-367
nvidia-384 - NVIDIA binary driver - version 384.130
root@b3789a28b8f9:/# 

I think you’ll need to install a specific driver version, e.g. nvidia-384.
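Putting both points together, the relevant Dockerfile lines might look like the sketch below (the 384 version is taken from the apt-cache output above; adjust to whatever driver packages your base image actually offers):

```Dockerfile
# Refresh the package index first, then install a versioned driver package.
RUN apt-get update && \
    apt-get install -y nvidia-384
```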

Yes, that works!
But sadly, nvidia-smi doesn’t detect my board…

root@balena:/# lshw -C display
  *-display UNCLAIMED       
       description: VGA compatible controller
       product: GM204 [GeForce GTX 970]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list
       configuration: latency=0
       resources: memory:a2000000-a2ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:3000(size=128) memory:c0000-dffff
root@balena:/# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Any idea what’s wrong?
Here is my full Dockerfile.template file:

FROM balenalib/%%BALENA_MACHINE_NAME%%-ubuntu-python:3.6-bionic-build

ENV RESINOS_VERSION=2.29.0%2Brev1.prod

ENV YOCTO_VERSION=4.12.12
RUN wget https://files.resin.io/images/intel-nuc/${RESINOS_VERSION}/kernel_modules_headers.tar.gz
RUN tar -xf kernel_modules_headers.tar.gz && rm -rf kernel_modules_headers.tar.gz
RUN mkdir -p /lib/modules/${YOCTO_VERSION}-yocto-standard
RUN mv ./kernel_modules_headers /lib/modules/${YOCTO_VERSION}-yocto-standard/build
RUN ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2
RUN apt-get update && apt-cache search nvidia-driver
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y nvidia-driver-390

# Enable udevd so that plugged dynamic hardware devices show up in our container.
ENV UDEV=1

ENV INITSYSTEM on

CMD [ "sleep", "infinity"]

@scarlyon, I don’t have a solution but I can ask some questions that may help us help you:

  • What is the output of the following command on the host OS prompt of your NUC device?
    uname -a && cat /etc/issue && lsmod

  • What is the build output for the Dockerfile? (E.g. output of the "balena push" or "git push" command.) Could there have been any errors in building the drivers?

If the output is too large to paste in the body of a message, it could be attached as a zip file or posted as a gist.github.com page. Just some suggestions.

root@balena:~# uname -a && cat /etc/issue && lsmod
Linux balena 5.2.10-yocto-standard #1 SMP PREEMPT Fri Oct 4 11:58:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
balenaOS 2.44.0 \n \l

Module                  Size  Used by
ip6table_filter        16384  0
ip6_tables             28672  1 ip6table_filter
xt_MASQUERADE          16384  3
nf_conntrack_netlink    32768  0
nfnetlink              16384  2 nf_conntrack_netlink
xfrm_user              36864  1
br_netfilter           24576  0
xt_owner               16384  0
cfg80211              598016  0
snd_hda_codec_hdmi     49152  1
wmi_bmof               16384  0
x86_pkg_temp_thermal    16384  0
coretemp               16384  0
snd_hda_codec_realtek    90112  1
snd_hda_codec_generic    65536  1 snd_hda_codec_realtek
iTCO_wdt               16384  0
watchdog               20480  1 iTCO_wdt
snd_hda_intel          32768  0
snd_hda_codec          94208  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
efivars                20480  0
snd_hda_core           65536  5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
snd_pcm                81920  4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core
snd_timer              28672  1 snd_pcm
wmi                    20480  1 wmi_bmof
video                  36864  0
backlight              16384  1 video
nls_cp437              20480  1
vfat                   24576  1
fat                    69632  1 vfat
sch_fq_codel           20480  2

Installation log: https://pastebin.com/F4uxGR4X

Here is my docker-compose.yml file:

version: '2'
volumes:
    resin-data:
services:
  gpu:
    build: ./gpu
    volumes:
      - 'resin-data:/data'
#      - 'nvidia_driver_390:/usr/local/nvidia:ro'
    restart: always
    privileged: true
    network_mode: host
    labels:
      io.balena.features.supervisor-api: '1'
    cap_add:
      - SYS_RAWIO
    devices:
      - "/dev:/dev"
      - /dev/nvidia0
      - /dev/nvidia1
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools

Thanks! @pdcastro

Hi Shane. One thing I noticed from your output here is that you are running the container on balenaOS 2.44, but building the kernel module for balenaOS 2.29. If I recall correctly, there were some major kernel bumps (from 4.18 to 5.2) between those two OS versions, so it might be worth making sure your Dockerfile targets the 2.44 version.
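In Dockerfile terms, that would mean bumping both version variables so they match the running OS. A sketch with the 2.44 values (the exact kernel version should always be checked against uname -r on the device):

```Dockerfile
# Match the OS release and kernel version reported by the device.
ENV RESINOS_VERSION=2.44.0%2Brev1.prod
ENV YOCTO_VERSION=5.2.10
```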


@shaunmulligan
I used 2.29 because I couldn’t find the link to the Linux kernel headers file.

Could you give me the full URL of that file?

This seems to be broken for 2.44:
https://files.resin.io/images/intel-nuc/${RESINOS_VERSION}/kernel_modules_headers.tar.gz

I think it should just be:

https://files.resin.io/images/intel-nuc/2.44.0%2Brev1.prod/kernel_modules_headers.tar.gz

That seems to work for me, at least from my browser. I haven’t tested it with the OOT kernel modules script, though.

Odd, your link wasn’t working yesterday, but it works now… :slight_smile:

But unfortunately, nvidia-smi is still not working… :confused:
@shaunmulligan

Hey Shane,
Can you also confirm you’re setting the right YOCTO_VERSION env var? From the output shared above, it should be set to 5.2.10. Could you share the relevant section of your Dockerfile?
Thanks!
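One way to double-check that value is to derive it from the kernel release string itself. A small sketch using the sample value from the uname -a output earlier in the thread (on the device you would feed in "$(uname -r)" instead of the literal):

```shell
#!/bin/sh
# Strip the "-yocto-standard" suffix from a kernel release string to get
# the version that YOCTO_VERSION should be set to.
kernel_release="5.2.10-yocto-standard"   # sample; on-device: kernel_release="$(uname -r)"
yocto_version="${kernel_release%-yocto-standard}"
echo "$yocto_version"
```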

@mikesimos
Here is the current configuration:

FROM balenalib/%%BALENA_MACHINE_NAME%%-ubuntu-python:3.6-bionic-build

ENV RESINOS_VERSION=2.44.0%2Brev1.prod

ENV YOCTO_VERSION=5.2.10
RUN wget https://files.resin.io/images/intel-nuc/${RESINOS_VERSION}/kernel_modules_headers.tar.gz
RUN tar -xf kernel_modules_headers.tar.gz && rm -rf kernel_modules_headers.tar.gz
RUN mkdir -p /lib/modules/${YOCTO_VERSION}-yocto-standard
RUN mv ./kernel_modules_headers /lib/modules/${YOCTO_VERSION}-yocto-standard/build
RUN ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2
RUN apt-get update && apt-cache search nvidia-driver
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y nvidia-driver-390

# Enable udevd so that plugged dynamic hardware devices show up in our container.
ENV UDEV=1

ENV INITSYSTEM on

CMD [ "sleep", "infinity"]

Cheers.

Hi there,

I’ve noticed that there’s a failure whilst building modules with that Dockerfile:

[main]     DKMS: install completed.
[main]     Building initial module for 5.2.10-yocto-standard
[main]     Error! Bad return status for module build on kernel: 5.2.10-yocto-standard (x86_64)
[main]     Consult /var/lib/dkms/nvidia/390.116/build/make.log for more information.

Looking at the build logs, there’s a problem with redefinition of a function in one of the headers. It looks like 390 was actually pretty broken, and I tracked the issue down to this: https://devtalk.nvidia.com/default/topic/1054883/linux/nvidia-390-116-cannot-build-modules-on-fedara-30-kernel-5-1-5-300-fc30-x86_64/

Can you try with RUN apt-get install -y nvidia-driver-435 instead of RUN apt-get install -y nvidia-driver-390 as this seems to build cleanly, and let us know how it goes?
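For reference, that change in Dockerfile form (only the driver package line differs from the file shared earlier):

```Dockerfile
# nvidia-driver-390 fails to compile against kernel 5.2; 435 builds cleanly.
RUN apt-get install -y nvidia-driver-435
```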

Best regards,

Heds


As an aside, could you let us know the exact hardware this is running on please?

Thanks,

Heds

Yeah, 435 works!!!

Thanks for your help!

Really appreciated.

We are glad to hear that it worked for you.
Let us know if you need any further assistance.

@scarlyon, Shane, what is the exact hardware you are using? We were using a Zotac mini PC with an Nvidia GPU, but we didn’t try this build path.

@thgreasi @headss @shaunmulligan @pdcastro @zubairlk

I cannot install the NVIDIA driver on the latest balenaOS now…

Has something changed?