Enable GPU on Container

We have been using the approach from the thread "NVIDIA drivers on Intel NUC" to install NVIDIA drivers inside our containers.

But the current host OS version (balenaOS 2.50.1+rev1) doesn’t seem to support nvidia-driver-435 any more.

FROM balenalib/%%BALENA_MACHINE_NAME%%-ubuntu-python:3.6-bionic-build

ENV RESINOS_VERSION=2.50.1%2Brev1.prod
ENV DEBIAN_FRONTEND=noninteractive
ENV YOCTO_VERSION=5.2.10

RUN wget https://files.resin.io/images/intel-nuc/${RESINOS_VERSION}/kernel_modules_headers.tar.gz
RUN tar -xf kernel_modules_headers.tar.gz && rm -rf kernel_modules_headers.tar.gz
RUN mkdir -p /lib/modules/${YOCTO_VERSION}-yocto-standard
RUN mv ./kernel_modules_headers /lib/modules/${YOCTO_VERSION}-yocto-standard/build
RUN ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2
RUN apt-get update && apt-get install -y apt-transport-https
RUN apt-get install -y nvidia-driver-435 libboost-all-dev

ENV UDEV=1
ENV INITSYSTEM on

CMD [ "sleep", "infinity"]

I can’t find the /lib/modules/${YOCTO_VERSION}-yocto-standard/build directory, even though I moved the Linux headers there in my Dockerfile.
I even tried to mount it, since it was read-only, and to copy the header files manually, but it says it is out of space!
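For anyone adapting the Dockerfile above to another OS release, here is a rough sketch of how the two version-dependent strings fit together. The function names are made up for illustration and are not balena tooling. The URL layout is copied from the wget line above and is only assumed to hold for other device types and versions. The "+" in the OS version is percent-encoded as %2B, and the modules directory has to match the kernel release that `uname -r` prints on the device.

```shell
# Hypothetical helpers, not part of any balena tool.

# Percent-encode "+" in a balenaOS version so it can be used in a URL path.
encode_version() {
    printf '%s\n' "$1" | sed 's/+/%2B/g'
}

# $1 = device type slug (e.g. intel-nuc), $2 = OS version (e.g. 2.50.1+rev1.prod)
# URL layout taken from the Dockerfile in this thread; assumed, not verified,
# for other device types or versions.
headers_url() {
    printf 'https://files.resin.io/images/%s/%s/kernel_modules_headers.tar.gz\n' \
        "$1" "$(encode_version "$2")"
}

# $1 = kernel release exactly as printed by `uname -r` on the device
modules_build_dir() {
    printf '/lib/modules/%s/build\n' "$1"
}

headers_url intel-nuc '2.50.1+rev1.prod'
modules_build_dir '5.2.10-yocto-standard'
```

Comparing the output of modules_build_dir "$(uname -r)" inside the running container with the path used at build time is a quick way to spot a kernel version mismatch.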

Found these interesting links:


How can I upgrade my balena-supervisor from 11.4.10 to the version that supports io.balena.features.gpu? And maybe I have to update the hostOS from balenaOS 2.50.1+rev1 to the latest?

Cheers.

Hey there Shane,

Thanks for reporting this. We have been able to reproduce it and will look into this.

Cheers

As for the gpu label, can you please try the latest OS for your device type?

Hey, @rahul-thakoor
You mean Ubuntu 20.04 (Focal)?

No, I meant the latest balenaOS version.

@rahul-thakoor

Seems 2.50.1 is the latest version for Intel NUC?
I am using a normal amd64 PC… Should I try with Microsoft Surface Go (NEW) image, which is 2.54.2?

Hi Shane, just reading through this thread. My colleagues above reproduced the failure as:

[main]     Reading state information...
[main]     E: Unable to locate package nvidia-driver-435

You are basing your image on Ubuntu bionic, as in:

FROM balenalib/%%BALENA_MACHINE_NAME%%-ubuntu-python:3.6-bionic-build

But I don’t see nvidia-driver-435 in the default bionic feed according to packages.ubuntu.com. It does appear in both focal and the bionic-updates feed.

However you mention this was working for you before, so I am confused.
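One way to check this from inside the build container is to see whether the bionic-updates suite is actually enabled in the apt sources. The helper below is only a sketch with a made-up name. It looks at a single sources.list-style file, so files under /etc/apt/sources.list.d/ would need to be checked separately.

```shell
# Hypothetical helper: succeeds if an uncommented "deb" line in the file
# names the given suite.
# $1 = path to a sources.list-style file, $2 = suite (e.g. bionic-updates)
suite_enabled() {
    grep -Eq "^[[:space:]]*deb[[:space:]].*[[:space:]]$2([[:space:]]|\$)" "$1"
}

# Example use inside the container:
#   suite_enabled /etc/apt/sources.list bionic-updates && echo "bionic-updates enabled"
#   apt-cache policy nvidia-driver-435   # shows the candidate version and where it comes from
```

If the suite is enabled and `apt-cache policy` still shows no candidate after `apt-get update`, the package is probably not published for that architecture.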

Regarding the io.balena.features.gpu feature, unfortunately it is currently not working, see https://github.com/balena-io/balena-supervisor/issues/1449

Hey, @alexgg
Yeah, as I mentioned above, it worked last year! lol

I am going to try focal on the latest balenaOS version.

Cheers,
Shane.

@alexgg

Same, cannot install nvidia-driver-435.

Do you guys have any idea why I cannot install now?

Hi Shane,

Can you please let us know which base image you were using? I just did a test and I can install nvidia-driver-435 in balenalib/intel-nuc-ubuntu-python:3.6-bionic-build without any problem.

Just note that this package is available only for amd64 (https://packages.ubuntu.com/bionic-updates/nvidia-driver-435), so you will not be able to find it in the base images for other architectures.
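To rule out an architecture mismatch, you can compare what the device reports with the Debian/Ubuntu architecture name. A minimal sketch follows. The function name is made up, and the mapping only covers a few common `uname -m` values.

```shell
# Hypothetical helper: map a `uname -m` machine name to the Debian/Ubuntu
# architecture name. Only a few common values are covered.
deb_arch() {
    case "$1" in
        x86_64)  echo amd64 ;;
        aarch64) echo arm64 ;;
        armv7l)  echo armhf ;;
        *)       echo unknown; return 1 ;;
    esac
}

# Example use on the device or in the container:
#   deb_arch "$(uname -m)"
# Inside a Debian/Ubuntu container, `dpkg --print-architecture` prints the
# architecture directly.
```

Anything other than amd64 here would explain the "Unable to locate package" error for this driver.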

Hi, @nghiant2710

Could you share the detailed Dockerfile?

And which balenaOS version did you use?

Cheers,
Shane.

This has nothing to do with balenaOS, as the package is installed in the Docker container.
You can try this simple Dockerfile and see if it works:

FROM balenalib/intel-nuc-ubuntu-python:3.6-bionic-build
RUN install_packages nvidia-driver-435

Looks like you are using balenalib/%%BALENA_MACHINE_NAME%%-ubuntu-python:3.6-bionic-build in your Dockerfile. Can you please let us know what your target device is? It will not work if you’re pushing to a non-amd64 device.

I was using the Microsoft Surface Go image, as the Intel NUC image has a slightly lower version - 2.50.1+rev1

Trying your simple Dockerfile now.

Will update you soon.

Thanks, mate!

Well, I just finished the installation, but nvidia-smi is not working.

root@balena:/# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

root@balena:/# lshw -C Display
  *-display UNCLAIMED       
       description: VGA compatible controller
       product: GM204 [GeForce GTX 970]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list
       configuration: latency=0
       resources: memory:a2000000-a2ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:3000(size=128) memory:c0000-dffff
root@balena:/# uname -a
Linux balena 5.2.10-yocto-standard #1 SMP PREEMPT Sun Aug 16 13:57:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

As you can see, the GTX 970 is connected to my device.
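A note on reading that output: "UNCLAIMED" in lshw means no kernel driver has bound to the card, which fits nvidia-smi being unable to talk to the driver. The sketch below uses made-up function names. It reads command output on stdin so the same checks can be run against saved logs.

```shell
# Hypothetical helpers; both read command output from stdin.

# Succeeds if `lshw -C display` output contains a display marked UNCLAIMED.
display_unclaimed() {
    grep -q -- '\*-display.*UNCLAIMED'
}

# Succeeds if `lsmod` output lists a module whose name starts with "nvidia".
nvidia_module_loaded() {
    awk '$1 ~ /^nvidia/ { found = 1 } END { exit !found }'
}

# Example use on the device:
#   lshw -C display | display_unclaimed && echo "no driver bound to the GPU"
#   lsmod | nvidia_module_loaded || echo "nvidia kernel module not loaded"
```

If the module is not loaded, the userspace packages are installed but the kernel side is missing, so the next thing to look at is whether the module was built for the running kernel.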

Hi,

Have you added the io.balena.features.gpu label to your compose file? Please add it if it’s not there.

I also see there are some newer driver packages which are nvidia-driver-440 and nvidia-driver-450 so can you give them a try and let us know how it goes?

Mate, io.balena.features.gpu is not working at the moment, as @alexgg mentioned earlier in this thread.

And the other driver versions are not working either.

Hey, guys.

We are planning to do some Machine Learning & Computer Vision work on balenaOS using Nvidia GPUs and Jetson boards (TX2/nano/Xavier). But none of them are working at the moment, so we are blocked.
Is there any hope that we can use the GPU feature on balenaOS?
@alexgg
You could see my comment here - https://github.com/balena-os/balena-jetson/issues/57#issuecomment-699830289

Hi, although the gpu supervisor label is not currently working, you could try to run your container manually with:

balena run -it --gpus all <command>

If that works, then once the issue linked above is fixed you should be all set up.

Thanks, @alexgg

How about the AMD64 GPU issue?