Hardware video encoders intermittently not starting on Jetson TX2 devkit

I’m running a TX2 devkit with balenaOS 2.56.0+rev4 and the 11.14.0 supervisor.

Under circumstances I have not yet been able to pin down, the omx and nvv4l2 video encoders used within GStreamer don’t output any video data. The GStreamer output is inconclusive, but dmesg throws a lot of messages.

Sometimes this situation can be fixed by a reboot of the device or by killing the power and starting the device again.

The exact same video pipeline works reliably on an OS installed with NVIDIA JetPack, and I have replicated the issue on different Jetson TX2 devices (devkit and Aetina n510 carrier board).


Setup to reproduce

This can be replicated with this dockerfile by @acostach

I’m using the following docker-compose.yml to balena deploy it:

# This docker-compose is to be deployed to the 'Development' application on balena.
# In this docker-compose we use images from non-master branches. When these branches
# have been reviewed and merged, you may move the contents of this compose to the one
# under ReleaseCandidate, which should deploy the exact same containers, but built from
# the master branch.
version: '2.1'

services:
  dcd:
    image: tx2-sample-apps-gstreamer
    build:
      context: ./
      dockerfile: Dockerfile.gstreamer
    # labels:
      # io.balena.features.sysfs: '0' apparently also not required
      # io.balena.features.firmware: '1' apparently not required
    environment:
      - UDEV=on
    # devices:
    #   - /dev:/dev # apparently not required
    privileged: true
    restart: "unless-stopped"
    command: ["tail","-f", "/dev/null"] # for debugging

As the comments above show, I experimented with the different labels and with mounting devices, but didn’t see any change.

On the host I set nvpmodel -m 0 to enable all cores. nvpmodel -q confirms this.


Gstreamer pipeline

Once the application is deployed I run a test gstreamer pipeline:

GST_DEBUG=3 gst-launch-1.0 videotestsrc num-buffers=10 ! videoconvert ! omxh265enc ! queue  ! fakesink dump=true

or alternatively for nvv4l2h265enc

GST_DEBUG=3 gst-launch-1.0 videotestsrc num-buffers=10 ! nvvidconv ! nvv4l2h265enc ! fakesink dump=true

Expected behaviour

When this is working I might get a few warnings from the plugins scanner that should be unrelated, but then the fakesink will dump buffers:

Expected output
0:00:00.032805881 28727   0x5567fbb190 WARN                     omx gstomx.c:2865:plugin_init: Failed to load configuration file: Valid key file could not be found in search dirs (searched in: /home/aivero/.config:/etc/xdg as per GST_OMX_CONFIG_DIR environment variable, the xdg user config directory (or XDG_CONFIG_HOME) and the system config directory (or XDG_CONFIG_DIRS)
nvbuf_utils: Could not get EGL display connection
nvbuf_utils: ERROR getting proc addr of eglCreateImageKHR
nvbuf_utils: ERROR getting proc addr of eglDestroyImageKHR
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
0:00:00.047061443 28727   0x5567fb86d0 FIXME                default gstreamer-1.16.2/gst/gstutils.c:3981:gst_pad_create_stream_id_internal:<videotestsrc0:src> Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:00:00.051051833 28727   0x5567fb86d0 FIXME           videoencoder gst-plugins-base-1.16.2/gst-libs/gst/video/gstvideoencoder.c:668:gst_video_encoder_setcaps:<omxh265enc-omxh265enc0> GstVideoEncoder::reset() is deprecated
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 8 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
0:00:00.054749826 28727   0x5567fb86d0 WARN             omxvideoenc gstomxvideoenc.c:1860:gst_omx_video_enc_set_format:<omxh265enc-omxh265enc0> Error setting temporal_tradeoff 0 : Vendor specific error (0x00000001)
NVMEDIA: H265 : Profile : 1 
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
00000000 (0x7f8c00b180): 00 00 14 b6 26 01 af 08 e0 bc ea 1a ff bd 2c 2f  ....&.........,/
00000010 (0x7f8c00b190): 0c ae 58 5c c8 1e 4d ce 8c 1e 55 ab 44 1d 13 83  ..X\..M...U.D...
00000020 (0x7f8c00b1a0): 5d ec 63 f8 7d 99 c2 b2 27 f7 aa ec bd 87 1e 7e  ].c.}...'......~
00000030 (0x7f8c00b1b0): 25 6f e6 7a 32 7a 87 f8 3b 9d df 2e 12 16 ca 1c  %o.z2z..;.......
New clock: GstSystemClock
00000040 (0x7f8c00b1c0): cc 44 1f 19 bd d6 cc 6a 56 8c 40 68 80 5e 5a 13  .D.....jV.@h.^Z.
00000050 (0x7f8c00b1d0): 10 f7 83 48 b9 42 21 06 db 8e 16 11 d1 65 b6 9b  ...H.B!......e..
.....
[Truncated]
.....
00001490 (0x7f8c00c610): 7e 41 fb 5b 68 93 90 fb 52 4d d3 91 75 96 c2 dc  ~A.[h...RM..u...
000014a0 (0x7f8c00c620): 7d 57 94 49 b9 b1 99 0b 78 2b c3 68 3e bd 38 37  }W.I....x+.h>.87
000014b0 (0x7f8c00c630): 90 b2 2c 36 88 c0 54 d6 e5 c6                    ..,6..T...      
Got EOS from element "pipeline0".
Execution ended after 0:00:00.004696086
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

There is no output on dmesg -w during the same time.


Actual behaviour

The pipeline loads correctly but never pre-rolls:

gst-launch-1.0 videotestsrc num-buffers=10 ! videoconvert ! omxh265enc ! queue  ! fakesink dump=true
nvbuf_utils: Could not get EGL display connection
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 8 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
NVMEDIA: H265 : Profile : 1 
^Chandling interrupt.
Interrupt: Stopping pipeline ...
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
^C

With GST_DEBUG=3:

GST_DEBUG=3 gst-launch-1.0 videotestsrc num-buffers=10 ! videoconvert ! omxh265enc ! queue  ! fakesink dump=true
0:00:01.035686694   400   0x558440ed90 WARN                  ladspa gstladspa.c:507:plugin_init:<plugin185> no LADSPA plugins found, check LADSPA_PATH
0:00:01.091416779   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0x120000: 'AVR (Audio Visual Research)' is not mapped
0:00:01.091520683   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0x180000: 'CAF (Apple Core Audio File)' is not mapped
0:00:01.091563338   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0x100000: 'HTK (HMM Tool Kit)' is not mapped
0:00:01.091624778   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0xc0000: 'MAT4 (GNU Octave 2.0 / Matlab 4.2)' is not mapped
0:00:01.091662601   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0xd0000: 'MAT5 (GNU Octave 2.1 / Matlab 5.0)' is not mapped
0:00:01.091694953   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0x210000: 'MPC (Akai MPC 2k)' is not mapped
0:00:01.091732105   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0xe0000: 'PVF (Portable Voice Format)' is not mapped
0:00:01.091766953   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0x160000: 'SD2 (Sound Designer II)' is not mapped
0:00:01.091807880   400   0x558440ed90 WARN                 default gstsf.c:98:gst_sf_create_audio_template_caps: format 0x190000: 'WVE (Psion Series 3)' is not mapped
nvbuf_utils: Could not get EGL display connection
nvbufsurftransform: Could not get EGL display connection
nvbuf_utils: Could not get EGL display connection
nvbufsurftransform: Could not get EGL display connection
0:00:02.066874767   399   0x55936b9210 ERROR     GST_PLUGIN_LOADING gstpluginloader.c:277:plugin_loader_replay_pending: Plugin file /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvvideoconvert.so failed to load. Blacklisting
nvbuf_utils: Could not get EGL display connection
0:00:00.041464343   402   0x55a5ae8360 WARN                     omx gstomx.c:2826:plugin_init: Failed to load configuration file: Valid key file could not be found in search dirs (searched in: /root/.config:/etc/xdg as per GST_OMX_CONFIG_DIR environment variable, the xdg user config directory (or XDG_CONFIG_HOME) and the system config directory (or XDG_CONFIG_DIRS)
nvbuf_utils: Could not get EGL display connection
0:00:02.881695854   399   0x55936b9210 WARN                     omx gstomx.c:2826:plugin_init: Failed to load configuration file: Valid key file could not be found in search dirs (searched in: /root/.config:/etc/xdg as per GST_OMX_CONFIG_DIR environment variable, the xdg user config directory (or XDG_CONFIG_HOME) and the system config directory (or XDG_CONFIG_DIRS)
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
0:00:02.904396761   399   0x55938c8720 FIXME                default gstutils.c:3981:gst_pad_create_stream_id_internal:<videotestsrc0:src> Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:00:02.925574737   399   0x55938c8720 FIXME           videoencoder gstvideoencoder.c:661:gst_video_encoder_setcaps:<omxh265enc-omxh265enc0> GstVideoEncoder::reset() is deprecated
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 8 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
0:00:02.929859407   399   0x55938c8720 WARN             omxvideoenc gstomxvideoenc.c:1860:gst_omx_video_enc_set_format:<omxh265enc-omxh265enc0> Error setting temporal_tradeoff 0 : Vendor specific error (0x00000001)
NVMEDIA: H265 : Profile : 1 
^Chandling interrupt.
Interrupt: Stopping pipeline ...
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
^C

Simultaneously, I get the following output from dmesg -w:
dmesg_output.log (43.1 KB)
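
Since the failing pipeline sits in PREROLLING until it is interrupted, a reproduction script could treat "did not finish in time" as the failure signal. Below is a minimal Python sketch of that idea. The helper name `finishes_within` and the 20 second limit are my own assumptions, not anything from GStreamer or balena, and `dump=true` is dropped because the output is discarded anyway.

```python
import subprocess
import sys
from typing import Sequence


def finishes_within(cmd: Sequence[str], timeout_s: float) -> bool:
    """Return True if cmd exits with status 0 before timeout_s seconds elapse."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False
    return result.returncode == 0


# The omxh265enc test pipeline from this post, one argv entry per token.
PIPELINE = [
    "gst-launch-1.0", "videotestsrc", "num-buffers=10", "!", "videoconvert",
    "!", "omxh265enc", "!", "queue", "!", "fakesink",
]

if __name__ == "__main__":
    # 20 seconds is an arbitrary limit (assumption); a healthy run is far faster.
    ok = finishes_within(PIPELINE, timeout_s=20.0)
    print("encoder OK" if ok else "encoder hung or failed")
    sys.exit(0 if ok else 1)
```

Running this in a loop across reboots would give a rough failure rate, which might help narrow down when the encoder fails to start.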

I’m a little lost on how to solve this and I was wondering if there is any form of permission missing or not being applied correctly when spinning up these containers.

I’ve granted support access for UUID: 4d1be00b9ff8da2e879164f79135ae2b

Thank you for any ideas 🙂

Hi @rapha, the dockerfile you linked uses the old L4T 28.2.1 BSP archive in the container (the BSP version is visible here), whereas the host OS for 2.56.0 uses l4t-r32.4.2.

Please switch to using a BSP archive that is for l4t 32.4.2, which corresponds to the 2.56.0+rev4 image. You can pin a container to match the L4T in the OS by using container contracts, here’s an example: https://github.com/balena-io-examples/tx2-container-contracts-sample
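
A quick way to spot this kind of mismatch from inside a running container might look like the Python sketch below. It rests on two assumptions: that the kernel release string carries an `-l4t-rX.Y.Z` suffix (the container shares the host kernel, so this reflects the host OS), and that the BSP in the container has written `/etc/nv_tegra_release` with a first line of the form `# R32 (release), REVISION: 4.2, ...`. The function names are made up for illustration.

```python
import platform
import re
from pathlib import Path
from typing import Optional

# Assumed formats; adjust the patterns if your images report versions differently.
KERNEL_L4T = re.compile(r"-l4t-r(\d+(?:\.\d+){1,2})")
TEGRA_RELEASE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+(?:\.\d+)?)")


def l4t_from_kernel(release: str) -> Optional[str]:
    """Extract e.g. '32.4.2' from a kernel release such as '4.9.140-l4t-r32.4.2'."""
    match = KERNEL_L4T.search(release)
    return match.group(1) if match else None


def l4t_from_tegra_release(first_line: str) -> Optional[str]:
    """Extract e.g. '28.2.1' from '# R28 (release), REVISION: 2.1, ...'."""
    match = TEGRA_RELEASE.search(first_line)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def versions_match(host: Optional[str], container: Optional[str]) -> bool:
    """True only when both versions were found and are identical."""
    return host is not None and host == container


if __name__ == "__main__":
    host = l4t_from_kernel(platform.release())
    release_file = Path("/etc/nv_tegra_release")
    container = None
    if release_file.exists():
        lines = release_file.read_text().splitlines()
        container = l4t_from_tegra_release(lines[0]) if lines else None
    print(f"host L4T: {host}, container BSP: {container}")
    print("match" if versions_match(host, container) else "MISMATCH or unknown")
```

This only reports the mismatch after the fact; the container contract is what actually stops a mismatched container from starting.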

Also, the tx2 base images now contain the nvidia sources.list and you can simply apt-get install cuda and other related packages hosted by nvidia.

You can check if the issue still persists with https://github.com/balena-io-examples/tx2-container-contracts-sample/blob/master/tx2_32_4_2/Dockerfile.template


Thanks for the quick response @acostach. I noticed the old BSP version before; funnily, it worked nonetheless when I first tried it, until it didn’t any more. This is already a boiled-down example; normally we use our own builds/packages of NVIDIA’s drivers that match the BSP.

Good to know they can be installed in the base image. In the tx2-container-contracts-sample you are running apply_binaries.sh directly. I take it that the drivers it installs are not apt-installable?

Thanks for the container contracts, that might be useful down the line for us.

Will try with the BSP matched sample Dockerfile and report back here.

@acostach I just tried GST_DEBUG=3 gst-launch-1.0 videotestsrc num-buffers=10 ! videoconvert ! omxh265enc ! queue ! fakesink dump=true on the https://github.com/balena-io-examples/tx2-container-contracts-sample/blob/master/tx2_32_4_2/Dockerfile.template - works.

I’ll rework our own container to use this and will report back later today.

Thanks for your help!

@rapha yes, from what I’ve seen the BSP contents aren’t provided as debs. As you may have noticed when flashing the default Ubuntu without the NVIDIA SDK Manager, you first grab the BSP, unpack it, then unpack the rootfs inside the corresponding folder, and then run apply_binaries.sh. I believe the SDK Manager follows the same steps.
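
For reference, the manual sequence described above can be written down roughly as follows. This is only a sketch that lists the commands without running them. The archive paths are hypothetical placeholders, and `Linux_for_Tegra` is the directory name I would expect the BSP archive to unpack into, so treat both as assumptions.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (working directory, argv)


def bsp_setup_steps(bsp_archive: str, rootfs_archive: str) -> List[Step]:
    """Return the manual L4T setup steps as (cwd, argv) pairs without running them.

    Both archive paths are hypothetical placeholders and should be absolute,
    because the steps run from different working directories.
    """
    return [
        # 1. Unpack the BSP; this is expected to create Linux_for_Tegra/.
        (".", ["tar", "xf", bsp_archive]),
        # 2. Unpack the sample rootfs inside Linux_for_Tegra/rootfs.
        ("Linux_for_Tegra/rootfs", ["sudo", "tar", "xpf", rootfs_archive]),
        # 3. Copy the NVIDIA binaries and drivers into that rootfs.
        ("Linux_for_Tegra", ["sudo", "./apply_binaries.sh"]),
    ]


if __name__ == "__main__":
    for cwd, argv in bsp_setup_steps("/tmp/bsp.tbz2", "/tmp/rootfs.tbz2"):
        print(f"(cd {cwd} && {' '.join(argv)})")
```

The sample Dockerfile may invoke apply_binaries.sh with different options for a container rootfs, so check it rather than relying on this listing.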

@acostach I’ve now deployed our own stack based on your Dockerfile. It appears to work. I will consider this one solved for now and reopen if I experience the issue again.

Sounds good, thanks Raphael.

Still looking good this morning. Thank you for your help @acostach!

Glad to hear that, you are welcome!