Yes, the tests are running on the DevKit. Running jetson_clocks made no difference; the results are the same. Also, I just noticed that the Jetson with balena has 32 GB of RAM and the other one 16 GB, but I don't think that could negatively affect performance.
These are the results of tegrastats:
For the balena Xavier:
RAM 27794/31924MB (lfb 608x4MB) CPU [0%@1190,0%@1190,0%@1190,0%@1190,11%@1190,37%@1190,11%@1190,41%@1190] EMC_FREQ 40%@1331 GR3D_FREQ 99%@675 APE 150 MTS fg 0% bg 7% AO@29C GPU@31C iwlwifi@33C Tdiode@33.75C PMIC@100C AUX@28.5C CPU@30C thermal@29.55C Tboard@30C GPU 4797/513 CPU 774/725 SOC 2167/1137 CV 0/0 VDDRQ 1702/467 SYS5V 2688/1869
For raw Ubuntu:
RAM 13934/15823MB (lfb 124x4MB) SWAP 26/7911MB (cached 1MB) CPU [9%@2265,7%@2265,13%@2265,16%@2265,7%@2265,8%@2265,19%@2265,18%@2265] EMC_FREQ 0% GR3D_FREQ 98% AO@48C GPU@54.5C iwlwifi@47C Tdiode@52C PMIC@100C AUX@47C CPU@50.5C thermal@50C Tboard@47C GPU 20037/2670 CPU 2602/1987 SOC 4743/2236 CV 0/0 VDDRQ 2753/559 SYS5V 3588/2305
I don't know if there is any substantial difference there. Also, the fact that the L4T versions differ slightly (just in the last number) doesn't seem like a reason for such a gap in performance.
I wanted to test a TensorFlow model to see if the same problem arises. Indeed, running a model with the same code shows the same behavior as in the original issue: with balena we get 20 FPS, and without balena 41 FPS. I updated the repository with this test since it is easier to reproduce: no compilation or TensorRT required, just raw TensorFlow.
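For reference, here is a minimal sketch of the kind of FPS measurement I mean, assuming a stock Keras model (MobileNetV2 here), a 224x224 input, and warm-up/iteration counts chosen for illustration. The repository uses its own model and loop, so this is only to show the shape of the test:

```python
# Minimal FPS benchmark sketch in raw TensorFlow (no TensorRT).
# Model choice, input size, and iteration counts are placeholders,
# not the exact code from the repository.
import time

import tensorflow as tf


def benchmark_fps(model, input_shape=(1, 224, 224, 3), warmup=20, iterations=200):
    """Run repeated forward passes on random data and return frames per second."""
    frame = tf.random.uniform(input_shape)

    # Warm-up passes let TensorFlow build/optimize the graph and spin up
    # the GPU before timing starts.
    for _ in range(warmup):
        model(frame, training=False)

    start = time.perf_counter()
    for _ in range(iterations):
        out = model(frame, training=False)
    # Pull the last result back to host so asynchronous GPU work is
    # included in the measured time.
    _ = out.numpy()
    elapsed = time.perf_counter() - start

    return iterations / elapsed


if __name__ == "__main__":
    # weights=None keeps the sketch runnable offline; the real test loads
    # the model used in the repository instead.
    model = tf.keras.applications.MobileNetV2(weights=None)
    print(f"~{benchmark_fps(model):.1f} FPS")
```

Running something along these lines inside the balena container and on raw L4T is the kind of comparison behind the 20 vs 41 FPS numbers above.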
EDIT: Just tested the code on balena with the 16 GB version and, as expected, got the same results.