generic-amd64 Nvidia GPU container kernel-modules-header not found

Hi,

I’m trying to use the GPU container to access NVIDIA hardware on a generic x86 computer. I’m following this example ( Using an NVIDIA GPU on x86 Devices with balenaOS ), and I updated the gpu Dockerfile to match the correct balenaOS version, Yocto version, machine_name and NVIDIA driver version.

When I build the Dockerfile on my machine, the build fails with this error:

 > [ 5/12] RUN     curl -fsSL "https://files.balena-cloud.com/images/generic-amd64/6.6.10/kernel-modules-headers.tar.gz"         | tar xz --strip-components=2 &&     make -C build modules_prepare -j"$(nproc)":
0.666 curl: (22) The requested URL returned error: 404
0.669
0.669 gzip: stdin: unexpected end of file
0.669 tar: Child returned status 1
0.669 tar: Error is not recoverable: exiting now

The kernel-modules-headers files cannot be found at the files.balena-cloud.com endpoint. Did the endpoint change? Where can I find the correct files?

Dockerfile:

FROM balenalib/genericx86-64-ext:bullseye-run-20211030

WORKDIR /usr/src

ENV DEBIAN_FRONTEND=noninteractive

# Set some variables to download the proper header modules
ENV VERSION="6.6.10"
ENV BALENA_MACHINE_NAME="generic-amd64"

# Set variables for the Yocto version of the OS
ENV YOCTO_VERSION=6.12.36
ENV YOCTO_KERNEL=${YOCTO_VERSION}-yocto-standard

# Set variables to download proper NVIDIA driver
ENV NVIDIA_DRIVER_VERSION=580.95
ENV NVIDIA_DRIVER=NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}

# Install some prereqs
RUN install_packages git wget unzip build-essential libelf-dev bc libssl-dev bison flex software-properties-common

WORKDIR /usr/src/kernel_source

# Causes a pipeline to produce a failure return code if any command errors.
# Normally, pipelines only return a failure if the last command errors.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Download the kernel source then prepare kernel source to build a module.
RUN \
    curl -fsSL "https://files.balena-cloud.com/images/${BALENA_MACHINE_NAME}/${VERSION}/kernel-modules-headers.tar.gz" \
        | tar xz --strip-components=2 && \
    make -C build modules_prepare -j"$(nproc)"

# required if using install-libglvnd from nvidia-installer below
RUN install_packages libglvnd-dev

WORKDIR /usr/src/nvidia

# Download and compile NVIDIA driver
RUN \
    curl -fsSL -O https://us.download.nvidia.com/XFree86/Linux-x86_64/$NVIDIA_DRIVER_VERSION/$NVIDIA_DRIVER.run && \
    chmod +x ./${NVIDIA_DRIVER}.run && \
    ./${NVIDIA_DRIVER}.run --extract-only && \
    # Install userspace portion, needed if container will also have CUDA etc...
    # Not needed if just building kernel module.
    # Do include in any application container.
    ./${NVIDIA_DRIVER}/nvidia-installer \
    --ui=none \
    --no-questions \
    --no-drm \
    --no-x-check \
    --no-systemd \
    --no-kernel-module \
    --no-distro-scripts \
    --install-compat32-libs \
    --no-nouveau-check \
    --no-rpms \
    --no-backup \
    --no-abi-note \
    --no-check-for-alternate-installs \
    --no-libglx-indirect \
    --install-libglvnd \
    --x-prefix=/tmp/null \
    --x-module-path=/tmp/null \
    --x-library-path=/tmp/null \
    --x-sysconfig-path=/tmp/null \
    --kernel-name=${YOCTO_KERNEL} \
    --skip-depmod \
    --expert && \
    make -C ${NVIDIA_DRIVER}/kernel KERNEL_MODLIB=/usr/src/kernel_source IGNORE_CC_MISMATCH=1 modules

WORKDIR /nvidia/driver

RUN find /usr/src/nvidia/${NVIDIA_DRIVER}/kernel -name "*.ko" -exec mv {} . \;

WORKDIR /usr/src/app
COPY *.sh ./

ENTRYPOINT ["/bin/bash", "/usr/src/app/entry.sh"]

Kind regards

Ryan

While trying to work out what was wrong with the endpoint, I came across this post: ( Path to generic x86-64 kernel headers - #13 by nilsdebruin ). There I noticed that the filename has changed. In the example in the GitHub repo the file is called “kernel-modules-headers.tar.gz”, but for some reason it is now “kernel_modules_headers.tar.gz”: the dashes (-) changed to underscores (_).

If you have the same problem finding the kernel-modules-headers file, change this step in the Dockerfile to:

# Download the kernel source then prepare kernel source to build a module.
RUN \
    curl -fsSL "https://files.balena-cloud.com/images/${BALENA_MACHINE_NAME}/${VERSION}/kernel_modules_headers.tar.gz" \
        | tar xz --strip-components=2 && \
    make -C build modules_prepare -j"$(nproc)"
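
To check which spelling a given release uses before editing the Dockerfile, a small shell sketch like the one below might help. The function names candidate_urls and find_headers_url are made up for illustration. It also assumes the file server answers HEAD requests the same way it answers downloads, which is not confirmed anywhere in this thread.

```shell
# Hypothetical helper: print both known spellings of the headers archive URL.
# Usage: candidate_urls <machine-name> <os-version>
candidate_urls() {
    local machine="$1" version="$2" name
    local base="https://files.balena-cloud.com/images/${machine}/${version}"
    # Underscore spelling first (the one that worked here), then the dash spelling.
    for name in kernel_modules_headers.tar.gz kernel-modules-headers.tar.gz; do
        printf '%s\n' "${base}/${name}"
    done
}

# Hypothetical helper: print the first candidate URL that the server accepts.
# Needs network access; returns non-zero if neither spelling is found.
# Assumption: the server responds to HEAD requests (-I) like it does to GET.
find_headers_url() {
    local url
    for url in $(candidate_urls "$1" "$2"); do
        # -f makes curl exit non-zero on HTTP errors such as 404.
        if curl -fsIL -o /dev/null "$url"; then
            printf '%s\n' "$url"
            return 0
        fi
    done
    return 1
}
```

Running find_headers_url generic-amd64 6.6.10 on the build machine should print whichever URL exists, and that filename can then be used in the curl line of the Dockerfile.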
