Running CUDA service(s)

I’m trying to run a CUDA-based service on balenaOS (on a Generic x86_64 device), but I’m running into some issues. Based on this blog post, I see that there is a planned feature for “hostapp extensions”, which would allow layering additional software, like NVIDIA drivers, on top of the balenaOS host OS. Based on this comment from @alexgg, it seems that the balena team is still actively working on this and that no release date is planned yet.

  1. From my understanding, I would need hostapp extensions at least for the NVIDIA driver and the NVIDIA Container Toolkit. That should allow me to pass my GPU devices through to my services. I am unsure whether I would need a CUDA hostapp extension as well, or whether it is sufficient to install CUDA within the service itself. Is my understanding of hostapp extensions correct here? (See the first sketch after this list for the workflow I have in mind.)

  2. Is this already possible? Even if it requires a beta or experimental version of balenaOS, I would love to give it a shot.

  3. If this is not possible (yet), is there a known workaround to get CUDA services to work (on Generic x86_64 devices) on balenaOS?

  4. With qemu I can pass the GPU through to a virtual machine using IOMMU, which allows raw access but prevents the host OS from accessing the GPU. It would mean being limited to a single service, since that service would “own” the GPU, but that would work for my case. I have found it difficult to find resources online about something like this. Is such a thing possible with Docker (and therefore balenaOS)? (See the second sketch after this list for what I imagine the Docker equivalent could look like.)
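
For reference, the first sketch below is the kind of workflow I am hoping to reproduce on balenaOS. On a stock Linux host with the NVIDIA driver and the NVIDIA Container Toolkit installed, GPU access is requested per container like this (`nvidia/cuda` is NVIDIA’s official CUDA base image on Docker Hub):

```bash
# Standard NVIDIA Container Toolkit usage on a regular Linux host: the
# toolkit injects the driver libraries and device nodes into the container,
# so the image only needs the CUDA user-space components.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```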
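
As for question 4, my current understanding is that containers share the host kernel, so there is no IOMMU-style “ownership” as with qemu; the closest equivalent I can think of is exposing the driver’s device nodes to a single service. A minimal sketch, assuming the nvidia kernel module is already loaded on the host and `my-cuda-service` is a placeholder image bundling the matching CUDA user-space libraries:

```bash
# Sketch: instead of PCI passthrough, hand the driver's device nodes to one
# container. Assumes /dev/nvidia* already exist on the host, i.e. the nvidia
# kernel module is loaded.
docker run --rm \
  --device /dev/nvidia0 \
  --device /dev/nvidiactl \
  --device /dev/nvidia-uvm \
  my-cuda-service
```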

Hello @hgaiser, your summary and analysis are correct: “Hostapp Extensions” are a feature we are still working on, and they are not ready yet. And indeed, they will allow you to extend the OS with additional components, in this case the NVIDIA runtime components.

In the meantime, however, there is no turnkey way that I am aware of to accomplish the passthrough.

I have pinged a colleague to have a look; perhaps he has some ideas, but I can’t make any guarantees. :slight_smile:

Hey @dtischler. I’ve been in email contact with a colleague of yours, @joehounsham. He shared a method with me for compiling the NVIDIA kernel module against the Linux kernel running in the host OS. Hopefully that will allow me to load the driver in the container while I’m waiting for the hostapp extensions to be completed.
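
To give a rough idea without pasting his instructions verbatim, the general technique looks something like the sketch below. This is my own generic outline, not Joe’s exact method; the driver version and the kernel source path are placeholders and must match the host OS:

```bash
# Generic sketch: build the NVIDIA kernel module inside a container against
# the kernel version that the balenaOS host is running.
NVIDIA_VERSION=470.86        # placeholder: pick a driver that supports your GPU
KERNEL_VERSION=$(uname -r)   # containers share the host kernel, so this matches the host OS

# Fetch and unpack the official NVIDIA driver installer.
curl -fsSLO "https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run"
chmod +x "NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run"
"./NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run" --extract-only

# Build only the kernel modules, pointing SYSSRC at kernel sources/headers
# matching the exact balenaOS kernel version (placeholder path).
cd "NVIDIA-Linux-x86_64-${NVIDIA_VERSION}/kernel"
make SYSSRC="/usr/src/kernel-source-${KERNEL_VERSION}" modules

# Load the freshly built module from a privileged container.
insmod nvidia.ko
```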

I’m not sure if I’m allowed to share the method @joehounsham provided, but I would be happy to. :)

Regardless, thank you for your response.

@hgaiser Oh wow, great, glad to hear you have a working solution then! I’ll follow up with Joe and see if we can publish a sample GitHub repo for this. Thanks!