Upgrading to balenaOS 2.47 slows down app by 75%

Hi,

I’ve created a small benchmark application that uses TensorFlow Lite to run 100 inferences on a toy neural network (model.tflite). The benchmark demonstrates a significant slowdown (75%) on Jetson Nano when upgrading balenaOS from 2.45 to 2.47. I have no idea what is causing it. It seems that calling the invoke function does something in the OS that got much worse when moving from 2.45 to 2.47 (production images). I’m not very familiar with the OS itself, so it would be great if anybody has an idea what’s causing it, and more importantly, if and how it could be fixed. To reproduce my results (see below), use the following commands (assuming you have an appropriate balena application <app-name> set up):

git clone https://github.com/pascal-hwky/balenaos-slowdown.git
cd balenaos-slowdown
git checkout forumpost
balena push <app-name>

After doing 100 inferences, the benchmark script will print out the minimum, maximum and average time it takes per inference. My results are pasted here:

2.45

balenaOS 2.45.1+rev3 (production), supervisor 10.3.7

Finished 100 inferences.
Max time: 156 [ms]
Min time: 78 [ms]
Avg time: 81 [ms]

2.47

balenaOS 2.47.1+rev3 (production), supervisor 10.3.7

Finished 100 inferences.
Max time: 593 [ms]
Min time: 326 [ms]
Avg time: 376 [ms]

Hi @pdboef ,

I can’t find 2.47.1+rev1 in the dashboard for the nano, only 2.47.1+rev3. A few questions to clarify:

  • Are you using a custom build?
  • Did you flash 2.47.1+rev3 to the SD card, or did you do a hostOS update to 2.47.1+rev3?
  • Is the sample app running on the Nano itself, or on an M.2 Coral TPU connected to the Nano?

Tags 2.45.1+rev3 and 2.47.1+rev1 are both on L4T 32.2, so there shouldn’t be any difference between them; however, 2.47.1+rev3 is a production image that upgraded to L4T 32.3.1.

I wonder if it’s L4T 32.3.1 that’s causing the slowdown with that particular tflite version.

Hello @acostach,

Sorry for the typo, I indeed meant 2.47.1+rev3. Regarding your questions:

  • No, we are not using a custom build. It’s the Jetson Nano Dev Kit.
  • For both runs, I flashed the OS to (the same) SD card.
  • The sample app is running on the CPU of the Nano itself. No peripherals were attached.

A change in L4T version could definitely impact it! My initial thought was that the power settings had changed: the power mode of the device can be adjusted, and perhaps 2.47.1 uses a very conservative mode by default.
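
If it helps, this is roughly what I would check from a host OS shell. These are the standard Linux cpufreq sysfs paths, so I’m assuming the balenaOS host exposes them in the usual places:

# Which frequency scaling governor is active, per core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Current and maximum allowed CPU frequency (kHz), per core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq

# Which cores are online (a conservative power mode may keep some offline)
cat /sys/devices/system/cpu/online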

Thanks for clarifying the setup, that all sounds fine; I see no problem with it.

So yes, it looks like it’s the upgrade to L4T 32.3.1 that we need to adjust for.

@acostach I’ve just obtained good inference results from a standard JetPack image running L4T 32.3.1:

git clone https://github.com/pascal-hwky/balenaos-slowdown.git
cd balenaos-slowdown
docker build -t slowdown .
docker run -it --rm slowdown

gives the expected benchmark results:

Finished 100 inferences.
Max time: 165 [ms]
Min time: 78 [ms]
Avg time: 81 [ms]

To confirm that I’m indeed on L4T 32.3.1, I’ve used:

cat /etc/nv_tegra_release

Output:

# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t210ref, EABI: aarch64, DATE: Tue Dec 10 06:58:34 UTC 2019

Perhaps it’s not the upgrade after all, but something else that changed from 2.45 to 2.47?

@pdboef,

I’ve tried your example on my SD-card Jetson Nano, checking with a 5V 5A barrel-jack adapter as well as with a 2A charger on the micro-USB port. I ran the tests directly on 2.47.1+rev3:

root@b7af697:~# cat /etc/os-release 
ID="balena-os"
NAME="balenaOS"
VERSION="2.47.1+rev3"
VERSION_ID="2.47.1+rev3"
PRETTY_NAME="balenaOS 2.47.1+rev3"
MACHINE="jetson-nano"
VARIANT="Development"
VARIANT_ID="dev"
META_BALENA_VERSION="2.47.1"
RESIN_BOARD_REV="4d22638"
META_RESIN_REV="c6a9c94"
SLUG="jetson-nano"

root@b7af697:~# uname -r
4.9.140-l4t-r32.3.1

I’m always seeing values very similar to what you pasted for 2.45:

16.04.20 10:30:22 (+0300)  main  Finished 100 inferences.
16.04.20 10:30:22 (+0300)  main  Max time: 148 [ms]
16.04.20 10:30:22 (+0300)  main  Min time: 78 [ms]
16.04.20 10:30:22 (+0300)  main  Avg time: 104 [ms]
16.04.20 10:30:29 (+0300) Service exited 'main sha256:51ce4a467faae90bfb81a436f350a81999a7c937a1caa8ef0b5e4531a52a93f1'

16.04.20 10:30:41 (+0300)  main  Finished 100 inferences.
16.04.20 10:30:41 (+0300)  main  Max time: 162 [ms]
16.04.20 10:30:41 (+0300)  main  Min time: 77 [ms]
16.04.20 10:30:41 (+0300)  main  Avg time: 104 [ms]
16.04.20 10:30:48 (+0300) Service exited 'main sha256:51ce4a467faae90bfb81a436f350a81999a7c937a1caa8ef0b5e4531a52a93f1'

The average dropped when I set a static CPU frequency:

16.04.20 11:11:01 (+0300)  main  Finished 100 inferences.
16.04.20 11:11:01 (+0300)  main  Max time: 149 [ms]
16.04.20 11:11:01 (+0300)  main  Min time: 73 [ms]
16.04.20 11:11:01 (+0300)  main  Avg time: 76 [ms]

for each core:

root@b7af697:~# echo 1479000 > /sys/devices/system/cpu/cpu<0/1/2/3>/cpufreq/scaling_max_freq
root@b7af697:~# echo 1479000 > /sys/devices/system/cpu/cpu<0/1/2/3>/cpufreq/scaling_min_freq

as described in the NVIDIA Jetson Linux Developer Guide documentation.
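
For convenience, the same thing as a small loop over all four cores. This assumes the standard cpufreq sysfs layout and that 1479000 kHz is the maximum frequency your Nano reports; check cpuinfo_max_freq first if unsure:

# Pin every core to a fixed 1.479 GHz by making min = max
for cpu in 0 1 2 3; do
    echo 1479000 > /sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_max_freq
    echo 1479000 > /sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_min_freq
done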

Even with nvpmodel -m 1 (the 5W power mode), the performance dropped by only around half:

16.04.20 11:16:49 (+0300)  main  Finished 100 inferences.
16.04.20 11:16:49 (+0300)  main  Max time: 344 [ms]
16.04.20 11:16:49 (+0300)  main  Min time: 153 [ms]
16.04.20 11:16:49 (+0300)  main  Avg time: 177 [ms]
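
For reference, the power mode can be queried and switched back with nvpmodel. I’m assuming the tool is available in your environment, since it ships with L4T/JetPack but may not be present in every container:

nvpmodel -q     # print the currently active power mode
nvpmodel -m 0   # switch to MAXN, the full-power default on the Nano
nvpmodel -m 1   # switch to the 5W mode used for the numbers above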

Do you spot anything in top that’s constantly eating up CPU on 2.47?

I don’t see this with the 2.47 Production image either, but I wasn’t expecting any difference.

So I wonder whether there’s something eating up CPU, or whether it could be due to thermal throttling. Does the board feel sluggish?
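
If you want to rule out thermal throttling, something along these lines should show it from the host OS. The paths are standard sysfs, though the exact set of thermal zones on the Nano may differ:

# Temperature (in millidegrees Celsius) of each thermal zone
for z in /sys/class/thermal/thermal_zone*; do
    echo "$(cat $z/type): $(cat $z/temp)"
done

# Current CPU frequencies; values well below the allowed max under load suggest throttling
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq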

Hi @acostach, I’m surprised to see that you do get good results, even though we are both using a Jetson Nano Dev kit and the same software. I’ve received another Jetson Nano Dev kit today, and I still get the lower inference speeds on balenaOS 2.47. This time, I have powered the board from USB (5V, 2A). I’ve also taken a snapshot of top while it was running the benchmark in 2.47:

2300  2223 root     R     370m   9%  57% python3 benchmark.py
 1190     1 root     S    13120   0%   1% @sbin/plymouthd --tty=tty1 --mode=boot --pid-file=/run/plymouth/pid --attach-to-session --kernel-command-line=plymouth.ignore-serial-consoles splash
 1619  1344 root     S     767m  19%   1% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/6a012a0a611407b8122f3aca17e12578b5d8310
 2055  1344 root     S     694m  17%   1% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/e185fddf084f16e72d61901b72fddc8bebf237e
 1304     1 root     S    1397m  35%   1% /usr/bin/balenad --experimental --log-driver=journald -s overlay2 -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock --dns 10.114.102.1 -
 1344  1304 root     S    1132m  28%   0% balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml --log-level info
 2289  2277 root     R     3340   0%   0% top
 1221     2 root     SW       0   0%   0% [nvgpu_channel_p]
 1297     1 openvpn  S     7744   0%   0% /usr/sbin/openvpn --writepid /run/openvpn/openvpn.pid --cd /etc/openvpn/ --config /etc/openvpn/openvpn.conf --connect-retry 5 120
 1281     1 avahi    S     4840   0%   0% avahi-daemon: running [508289a.local]
  208     2 root     SW       0   0%   0% [mmcqd/0]
   18     2 root     SW       0   0%   0% [kworker/1:0]
 2076  2055 root     S     875m  22%   0% node /usr/src/app/dist/app.js
 2045  1963 root     S     695m  17%   0% balena start --attach resin_supervisor
 2193  2169 root     S     623m  16%   0% balena exec -it 6a012a0a611407b8122f3aca17e12578b5d8310cfd9b94cf7b83bf22b3e0ed62 /bin/sh -c echo "$$" > /tmp/balena-enter-2169-OSD0gyMZ.pid; test -x /bin/bash && ex
 1290     1 root     S     385m  10%   0% /usr/sbin/NetworkManager --no-daemon
 1218     1 root     S     306m   8%   0% /usr/sbin/ModemManager --log-journal
    1     0 root     S     153m   4%   0% {systemd} /sbin/init
 1242     1 root     S    77500   2%   0% /usr/sbin/chronyd -s -d
  993     1 root     S    34056   1%   0% /lib/systemd/systemd-journald
 1196     1 root     S     7316   0%   0% /lib/systemd/systemd-logind
 2269     1 root     S     6096   0%   0% sshd: root@pts/2
 2137     1 root     S     6096   0%   0% sshd: root@pts/1
 2223  1619 root     S     5532   0%   0% /bin/bash

And another snapshot as it was running in 2.45:

 2255  2173 root     S     370m   9%  57% python3 benchmark.py
 1158     1 root     S    13128   0%   1% @sbin/plymouthd --tty=tty1 --mode=boot --pid-file=/run/plymouth/pid --attach-to-session --kernel-command-line=plymouth.ignore-serial-consoles splash
 1269     1 root     S    1277m  32%   0% /usr/bin/balenad --experimental --log-driver=journald -s overlay2 -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock --dns 10.114.102.1 -
  351     2 root     SW       0   0%   0% [mmcqd/0]
 1307  1269 root     S    1132m  28%   0% balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml --log-level info
 2260  2249 root     R     3340   0%   0% top
 2084  2064 root     S     865m  22%   0% node /usr/src/app/dist/app.js
 2064  1307 root     S     767m  19%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/c6aa2c7eab37c868eeaa9a9a22ee985b678c4f7
 2118  2097 root     S     695m  17%   0% balena exec -it cb7df8c5a2e7daf8ad4663bbf0aed4a8298af33a59d1021aa296bd560e11043e /bin/sh -c echo "$$" > /tmp/balena-enter-2097-c2I5qZYS.pid; test -x /bin/bash && ex
 1554  1307 root     S     694m  17%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/cb7df8c5a2e7daf8ad4663bbf0aed4a8298af33
 2055  1959 root     S     623m  16%   0% balena start --attach resin_supervisor
 1256     1 root     S     385m  10%   0% /usr/sbin/NetworkManager --no-daemon
 1161     1 root     S     306m   8%   0% /usr/sbin/ModemManager --log-journal
    1     0 root     S    91912   2%   0% {systemd} /sbin/init
 1236     1 root     S    77500   2%   0% /usr/sbin/chronyd -s -d
  992     1 root     S    38192   1%   0% /lib/systemd/systemd-journald
 1260     1 openvpn  S     7736   0%   0% /usr/sbin/openvpn --writepid /run/openvpn/openvpn.pid --cd /etc/openvpn/ --config /etc/openvpn/openvpn.conf --connect-retry 5 120
 1168     1 root     S     7316   0%   0% /lib/systemd/systemd-logind
 1993     1 root     S     6228   0%   0% sshd: root@pts/1
 2211     1 root     S     6096   0%   0% sshd: root@pts/2
 2173  1554 root     S     5636   0%   0% /bin/bash
 1095     1 root     S     5360   0%   0% /lib/systemd/systemd-udevd
 1248     1 avahi    S     4840   0%   0% avahi-daemon: running [4bc4b31.local]
 1250  1248 avahi    S     4708   0%   0% avahi-daemon: chroot helpe

Yes, I’ve tried a few more times, but I don’t see this at all. By the way, I think a 5V 4A adapter on the barrel jack is usually recommended for the Nano.

For comparison, I’m using a SanDisk Extreme 32GB microSD card (https://www.amazon.com/Sandisk-MicroSDHC-V30-32GB-Extreme/dp/B01LRW8FWY), the devkit is revision A02, and the Nano module is also A02. A heat sink is also mounted.
Except for the SD card, your setup is the same, right?

Also, did you see any improved results when switching the power mode? And when bringing all cores online and setting the maximum static CPU frequency?

@pdboef so, do you have a B01 devkit? I received one today and could reproduce the slowdown in your app, but only with the B01; the A02 revision does not exhibit this.

It looks like it’s related to the dtb, which is not forward compatible from A02 to B01. I’ve created a GitHub issue to track this: [jetson-nano]: B01 revision runs slower with A02 dtb (balena-os/balena-jetson, Issue #66).
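
If you want to confirm which dtb a board actually booted with, you can usually read it back from the device tree exposed under /proc. The model string is standard, while the nvidia,dtsfilename property is something NVIDIA’s dtbs normally carry, so treat its presence as an assumption:

# Board model string from the loaded device tree
cat /proc/device-tree/model

# NVIDIA dtbs usually record the source dts path, which includes the board revision
cat /proc/device-tree/nvidia,dtsfilename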

Until a new image is available, a short-term solution is to take another SD card and flash it with the 32.3.1 L4T Driver Package (BSP) using:

sudo ./flash.sh jetson-nano-qspi-sd mmcblk0p1

You will see the partition names and what’s written to them:

[  45.1447 ] Writing partition RP1 with tegra210-p3448-0000-p3449-0000-b00.dtb.encrypt
...
[  45.4537 ] Writing partition DTB with tegra210-p3448-0000-p3449-0000-b00.dtb.encrypt

Then take out this SD card, put the balenaOS card back in, boot the board, and attach the freshly flashed SD card through a USB card reader. In the balenaOS host OS, write the dtb from the USB card to the SD card that’s running balena:

dd if=/dev/sda3 of=/dev/mmcblk0p2 (and the same to mmcblk0p9)

Please double-check with ls -l /dev/disk/by-partlabel that 0p2 and 0p9 are the RP1 and DTB partitions. After writing them you can sync && reboot the board and it should run fine again.
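
Putting it together, the whole sequence on the balenaOS host would look roughly like this. It assumes /dev/sda3 is the dtb partition on the USB-attached card and that mmcblk0p2 and mmcblk0p9 turn out to be RP1 and DTB, so verify both before writing:

# Confirm which partitions on the running sd-card are RP1 and DTB
ls -l /dev/disk/by-partlabel

# Copy the B01 dtb from the USB-attached card over both partitions
dd if=/dev/sda3 of=/dev/mmcblk0p2
dd if=/dev/sda3 of=/dev/mmcblk0p9

# Flush writes and reboot
sync && reboot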

Hi @acostach, I’m happy to hear that you were able to reproduce my issue, thanks for following up! I have confirmed that we indeed have the B01 revision, I will update my application to reflect the insight that the phenomenon is related to the revision number. We will hold off updating balenaOS until the issue has been resolved.

Thank you for the update! Please let us know how you make out.

John

This should be solved in current master by the following commit.

Hi @pdboef, can you please check whether it works with the new revision 2.51.1+rev1?

Hi @acostach, the new revision is significantly faster than 2.47! However, the average inference time is still somewhat slower than on 2.45:

2.45

balenaOS 2.45.1+rev3 (production), supervisor 10.3.7

Finished 100 inferences.
Max time: 149 [ms]
Min time: 75 [ms]
Avg time: 78 [ms]

2.51

balenaOS 2.51.1+rev1 (production), supervisor 11.4.10

Finished 100 inferences.
Max time: 146 [ms]
Min time: 75 [ms]
Avg time: 103 [ms]

It’s great to hear that balenaOS 2.51.1+rev1 improves the speed again. Thanks for sharing this with us.
Does the higher average inference time cause any significant issues in your app?
I’ll share your results with the team and see if there is anything we can look into.
Thanks,
Georgia

Hi @georgiats, observations of multiple devices in the field have indicated that 2.51.1+rev1 is fortunately fast enough again. Thanks for the support!