DeepStream inference not working on Jetson AGX Orin

Hi Balena team,

We’re running into a problem when trying to run neural network model inference in DeepStream on the Jetson AGX Orin.

The simple DeepStream pipelines that play & store the video stream run fine, but as soon as we add an inference node (NVIDIA DeepStream nvinfer GStreamer plugin) the execution fails with the following error:

23.02.23 16:25:10 (+0000) 2023-02-23T16:25:10.126541Z WARN subprocess::subprocess: Pipeline stopped during startup: PipelineStopped(GstError(GstError { domain: Resource, code: 1, description: "Failed to set buffer pool to active" }))

Additionally, we see the following errors in the logs, and only when nvinfer is present:

23.02.23 16:25:10 (+0000) nvbufsurftransform: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbuf_utils: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbuf_utils: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbufsurface: Could not get EGL display connection
23.02.23 16:25:10 (+0000) nvbufsurface: Can't get EGL display
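
For reference, a minimal pair of pipelines illustrates the behaviour. This is only a sketch built on the stock DeepStream 6.1 sample assets rather than our actual pipelines; the stream and config paths below are the defaults shipped with DeepStream, adjust them if your layout differs:

# decode-only pipeline: runs fine
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! \
  qtdemux ! h264parse ! nvv4l2decoder ! fakesink

# same pipeline with nvinfer added: fails with the EGL errors above
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! \
  qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 \
  nvstreammux name=m batch-size=1 width=1920 height=1080 ! \
  nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! \
  fakesink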

Based on this NVIDIA documentation page, we verified that DISPLAY is not set as an environment variable on balenaOS, nor in our container.
We want to use the Gateways as “headless” systems. We used the same approach to create our Balena Jetson Xavier NX image, where we don’t even install X11, and inference works fine there.
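
For completeness, the check we do inside the running container is trivial; the variable should simply be unset:

# run inside the application container: should report the variable as unset
printenv DISPLAY || echo "DISPLAY is not set"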

We made some attempts with and without a monitor attached to the AGX Orin, and also tried installing X11 in the container image, but that doesn’t solve the problem.
TensorRT and CUDA are working fine.

We started with your sample Dockerfile for the Jetson AGX Orin Devkit, and we additionally install the following dependencies for our product:

RUN install_packages \
    build-essential \
    cmake \
    curl \
    iproute2 \
    gstreamer1.0-libav \
    gstreamer1.0-plugins-bad \
    gstreamer1.0-plugins-good \
    gstreamer1.0-tools \
    libgstreamer-plugins-base1.0-dev \
    libgstreamer1.0-0 \
    libgstreamer1.0-dev \
    libgstrtspserver-1.0-0 \
    libjansson4 \
    libpython3.8 \
    libpython3.8-dev \
    libssl1.1 \
    python3-dev \
    python3-gst-1.0 \
    python3-numpy \
    python3-pip \
    python3-requests \
    python3-setuptools \
    python3-wheel

RUN install_packages deepstream-6.1


Hello @rmpt, thank you for your e-mail!

Could you please share in a gist (or similar) the Dockerfile that you are building, so we can try to reproduce it ourselves?

Thanks

@rmpt did you try building this with the NVIDIA Ubuntu setup? If it works, then you can take a look at exactly which packages are installed in Ubuntu and make sure you use the same packages and versions in the containers.

Could you please confirm? Thanks!

Hi Marc @mpous

Thanks for the quick answer.

Could you please share in a gist (or similar) the Dockerfile that you are building, so we can try to reproduce it ourselves?

# Cuda Examples can't be compiled with newer glibc, see
# https://forums.developer.nvidia.com/t/cuda-11-5-samples-throw-multiple-error-attribute-malloc-does-not-take-arguments/192750
FROM balenalib/jetson-agx-orin-devkit-ubuntu:focal

# Prevent apt-get prompting for input
ENV DEBIAN_FRONTEND noninteractive

# Download and install BSP binaries for L4T 35.1
RUN \
    apt-get update && apt-get install -y wget tar lbzip2 binutils xz-utils zstd && \
    cd /tmp/ && wget https://developer.nvidia.com/embedded/l4t/r35_release_v1.0/release/jetson_linux_r35.1.0_aarch64.tbz2 && \
    tar xf jetson_linux_r35.1.0_aarch64.tbz2 && \
    cd Linux_for_Tegra && \
    sed -i 's/config.tbz2\"/config.tbz2\" --exclude=etc\/hosts --exclude=etc\/hostname/g' apply_binaries.sh && \
    sed -i 's/install --owner=root --group=root \"${QEMU_BIN}\" \"${L4T_ROOTFS_DIR}\/usr\/bin\/\"/#install --owner=root --group=root \"${QEMU_BIN}\" \"${L4T_ROOTFS_DIR}\/usr\/bin\/\"/g' nv_tegra/nv-apply-debs.sh && \
    sed -i 's/chroot . \//  /g' nv_tegra/nv-apply-debs.sh && \
    ./apply_binaries.sh -r / --target-overlay && cd .. && \
    rm -rf Linux_for_Tegra && \
    echo "/usr/lib/aarch64-linux-gnu/tegra" > /etc/ld.so.conf.d/nvidia-tegra.conf && ldconfig

## Install X and xfce
RUN \
  apt-get install -y --no-install-recommends \
  xserver-xorg-input-evdev \
  xinit \
  xfce4 \
  xfce4-terminal \
  x11-xserver-utils \
  dbus-x11 \
  xterm

ENV LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra
ENV UDEV=1

# Prevent screen from turning off
RUN echo "#!/bin/bash" > /etc/X11/xinit/xserverrc \
  && echo "" >> /etc/X11/xinit/xserverrc \
  && echo 'exec /usr/bin/X -s 0 dpms' >> /etc/X11/xinit/xserverrc


RUN dpkg --configure -a

RUN install_packages \
    build-essential \
    cmake \
    curl \
    iproute2 \
    gstreamer1.0-libav \
    gstreamer1.0-plugins-bad \
    gstreamer1.0-plugins-good \
    gstreamer1.0-tools \
    libgstreamer-plugins-base1.0-dev \
    libgstreamer1.0-0 \
    libgstreamer1.0-dev \
    libgstrtspserver-1.0-0 \
    libjansson4 \
    libpython3.8 \
    libpython3.8-dev \
    libssl1.1 \
    python3-dev \
    python3-gst-1.0 \
    python3-numpy \
    python3-pip \
    python3-requests \
    python3-setuptools \
    python3-wheel

RUN install_packages deepstream-6.1

ENV GST_PLUGIN_SYSTEM_PATH=/usr/lib/aarch64-linux-gnu/gstreamer-1.0

## Optional: Sample CUDA Clock sample run in webterminal:
##  apt-get update && apt-get install nvidia-l4t-cuda nvidia-cuda cuda-samples-11-4 && cd /usr/local/cuda-11.4/samples/0_Simple/clock/ && make && ./clock
##  Output:
##   CUDA Clock sample
##   GPU Device 0: "Xavier" with compute capability 7.2
##
##   Average clocks/block = 3171.421875

# Start XFCE desktop

CMD ["startxfce4"]

Then, to run a sample DeepStream app with model inference, you can run the following command inside the container:

deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Please note that to run this DS sample you will need the display enabled, because it outputs to the screen by default.
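
If you prefer to try it headless, one option (an untested sketch on our side; the sink type values come from the deepstream-app sink group documentation) is to copy the config and switch the [sink0] type away from the display sink:

cp /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt /tmp/ds_headless.txt
# edit /tmp/ds_headless.txt and change the [sink0] section, e.g.:
#   [sink0]
#   enable=1
#   type=1          # 1=FakeSink, 3=File (writes to disk instead of rendering)
deepstream-app -c /tmp/ds_headless.txt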

NVIDIA deepstream-app docs reference

Currently we successfully build the containers of our solution for the following targets:

  • NVIDIA dGPU (amd64 build)
  • Jetson Xavier NX (native image - arm build)
  • Balena Jetson Xavier NX (arm build)

As there shouldn’t be major differences (besides package versions) compared to the Balena Jetson Xavier NX setup, which works fine, we suspect this might be an underlying issue in the balenaOS integration with the AGX Orin.

Some things we have tried so far, without success:

  • tried installing dependencies in excess (all nvidia-l4t-* packages and even the nvidia-jetpack metapackage)
  • built nvinfer from source (present in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer) and replaced the default shared library

And some validations:

  • the TensorRT engine for the model is created successfully and the CUDA app runs inside the container (see the sketch after this list).
  • when we run gst-inspect-1.0 nvinfer we get output and the plugin is not blacklisted, but the EGL display connection warning still appears when it shouldn’t
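
For reference, the checks behind the first bullet look roughly like this; the ONNX path is just a placeholder for our model, and the trtexec/CUDA sample locations are the stock JetPack ones:

# TensorRT: build an engine from the model (placeholder path)
/usr/src/tensorrt/bin/trtexec --onnx=/models/our_model.onnx --saveEngine=/tmp/our_model.engine

# CUDA: the clock sample from the Dockerfile comment above (after installing cuda-samples-11-4)
cd /usr/local/cuda-11.4/samples/0_Simple/clock && make && ./clock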

So running this gst-inspect-1.0 nvinfer command inside the container is also a simple check you can do on your side, to see whether or not you get the EGL display connection errors in the output.
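
Concretely, assuming a stock GStreamer install, those checks are:

gst-inspect-1.0 nvinfer      # prints the plugin/element details; on our Orin the EGL warnings show up here
gst-inspect-1.0 -b           # lists blacklisted plugins; nvinfer should not appear in this list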

Hi @rmpt, so, does this work if you install NVIDIA Ubuntu and run the same test in it?

Hi @acostach

Currently the solution works for us in the following cases:
Dockerfile | Base image
nvidia-dgpu | FROM nvcr.io/nvidia/deepstream:6.1.1-triton (Ubuntu 20.04)
nvidia-jetson | FROM nvcr.io/nvidia/deepstream-l4t:6.1.1-triton (Ubuntu 20.04)
balena-jetson-xavier-nx | FROM balenalib/jetson-xavier-nx-devkit-ubuntu:bionic (Ubuntu 18.04)

Are you suggesting that we flash JetPack instead of balenaOS and check whether the solution works on the AGX Orin with a container based on the nvcr.io/nvidia/deepstream-l4t:6.1.1-triton image?

I was thinking of trying that setup directly on the NVIDIA host, not in an NVIDIA container. So in the NVIDIA Ubuntu directly, on the AGX Orin, to see if the container Ubuntu version is the problem, or whether it’s related to package versions or the installation. If it works in NVIDIA Ubuntu without any container, then the same setup and package versions need to be replicated in the balenaOS container.

I suspect this may have something to do with using Ubuntu 18.04 in the container on the Orin (which has packages built for t234, not t194 as on the Xavier NX), but testing this directly on the NVIDIA Ubuntu host should tell us if that is the case.
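
A rough way to compare the two environments (standard dpkg/L4T commands, nothing balena-specific) would be to capture the package list on both sides and diff them:

# run on the NVIDIA Ubuntu host and inside the balenaOS container, then diff the two files
dpkg -l | grep -E 'deepstream|nvidia-l4t|tensorrt|cuda' | sort > packages.txt
cat /etc/nv_tegra_release   # shows the exact L4T/BSP release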