We’re running into a problem when trying to run neural-network inference in DeepStream on the Jetson AGX Orin.
Simple DeepStream pipelines that play & store the video stream run fine, but as soon as we add an inference node (the NVIDIA DeepStream nvinfer GStreamer plugin) the execution fails with the following error:
23.02.23 16:25:10 (+0000) 2023-02-23T16:25:10.126541Z WARN subprocess::subprocess: Pipeline stopped during startup: PipelineStopped(GstError(GstError { domain: Resource, code: 1, description: "Failed to set buffer pool to active" }))
Additionally, these errors appear in the logs only when nvinfer is present:
23.02.23 16:25:10 (+0000) nvbufsurftransform: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbuf_utils: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbuf_utils: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbufsurface: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbufsurface: Can't get EGL display
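For reference, this is roughly the shape of pipeline that fails once nvinfer is added. This is only a sketch: the input file name and the nvinfer config path are placeholders, and it has to be run on the Orin target itself.

```shell
# Minimal repro pipeline sketch; sample.h264 and the nvinfer config path
# are placeholders. Without the nvinfer element the pipeline runs fine.
PIPELINE='filesrc location=sample.h264 ! h264parse ! nvv4l2decoder !
  m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 !
  nvinfer config-file-path=/app/config_infer.txt ! fakesink'
echo "gst-launch-1.0 $PIPELINE"
```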
Based on this NVIDIA documentation page, we validated that DISPLAY is not set as an environment variable on balenaOS, nor in our container.
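A quick sanity check you can run inside the container to confirm DISPLAY is not leaking in:

```shell
# Verify DISPLAY is not set inside the container; the EGL display lookup
# behaves differently when it is.
if [ -z "${DISPLAY:-}" ]; then
  echo "DISPLAY is unset (headless)"
else
  echo "DISPLAY is set to: $DISPLAY"
fi
```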
We want to use the gateways as "headless" systems. The same approach was used to create our Balena Jetson Xavier NX image, where we don't even install X11, and inference works fine there.
We made attempts with and without a monitor attached to the AGX Orin, and also tried installing X11 in the container image, but neither solves the problem.
TensorRT and CUDA are working fine.
@rmpt did you try building this with the NVIDIA Ubuntu setup? If it works, you can then look at exactly which packages are installed in Ubuntu and make sure the same packages and versions are used in the containers.
Currently we build the containers of our solution successfully to the following targets:
NVIDIA dGPU (amd64 build)
Jetson Xavier NX (native image - arm build)
Balena Jetson Xavier NX (arm build)
As there shouldn't be major differences (besides package versions) compared to the Balena Jetson Xavier NX setup, which works fine, we suspect this might be an underlying issue in the balenaOS integration with the AGX Orin.
Some efforts we made so far, without success:
tried to install dependencies in excess (all nvidia-l4t-* packages and even the nvidia-jetpack metapackage)
built nvinfer from source (available in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer) and replaced the default shared library
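For reference, the rebuild step looked roughly like this. The CUDA_VER value is an assumption (CUDA 11.4 on JetPack 5.x) and must match the CUDA version actually installed on the target:

```shell
# Rebuild gst-nvinfer from the DeepStream sources; run on the target device.
# CUDA_VER=11.4 is an assumption, adjust it to your JetPack install.
SRC=/opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer
if [ -d "$SRC" ]; then
  cd "$SRC" && CUDA_VER=11.4 make && CUDA_VER=11.4 make install
else
  echo "DeepStream sources not found at $SRC (run this on the target image)"
fi
```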
And some validations:
the TensorRT engine of the model is successfully created, and a CUDA app runs inside the container.
when we run gst-inspect-1.0 nvinfer we get output and the plugin is not blacklisted, but the EGL display connection warning still appears when it shouldn't
So running gst-inspect-1.0 nvinfer inside the container is also a simple check you can perform on your side, to see whether you get the EGL display connection errors in the output.
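If it helps, here is a small sketch of the check as a script. The heredoc just reproduces sample lines from this thread so the filter is reproducible; on the device you would capture the real output with `gst-inspect-1.0 nvinfer 2>&1 | tee /tmp/nvinfer-inspect.log` instead:

```shell
# Grep a captured log for the EGL symptom. The sample lines below come
# from this thread; replace the heredoc with real gst-inspect output.
cat > /tmp/nvinfer-inspect.log <<'EOF'
nvbufsurftransform: Could not get EGL display connection
Factory Details:
  Rank                     primary (256)
EOF
if grep -q "Could not get EGL display" /tmp/nvinfer-inspect.log; then
  echo "EGL symptom present"
else
  echo "no EGL symptom"
fi
```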
Currently the solution works for us in the following cases:

| Dockerfile | Base image |
| --- | --- |
| nvidia-dgpu | FROM nvcr.io/nvidia/deepstream:6.1.1-triton (Ubuntu 20.04) |
Are you suggesting that we should flash JetPack instead of balenaOS and check whether the solution works on the AGX Orin in a container based on the nvcr.io/nvidia/deepstream-l4t:6.1.1-triton image?
I was thinking of trying that setup directly on the NVIDIA host, not in an NVIDIA container. So in NVIDIA Ubuntu directly, on the AGX Orin, to see if the container's Ubuntu version is the problem or whether it's related to package versions or the installation. If it works in NVIDIA Ubuntu without any container, then the same setup and package versions need to be replicated in the balenaOS container.
I suspect this may have something to do with using Ubuntu 18.04 in the container on the Orin (which has packages built for t234, not t194 like the Xavier NX), but testing this directly on the NVIDIA Ubuntu host should tell if this is the case.