Balena OS Jetson Nano Tensorflow

From my naive understanding this sounds like an issue with the container, since it seems to be restarting all the time. Maybe someone from the balena team has a bit more experience and can help debug this :slight_smile:

It might be helpful to define an ENTRYPOINT ["/bin/bash", "-c"] so that CMD ["sleep infinity"] gets executed in bash. But I am not sure if this is the issue.

Furthermore, is there a way you can access the device's logs? If you have the balena CLI installed on your computer, you can get them with balena logs {UUID} --system. They might give you a better clue about what the issue is.

Hi

I have used this image as a base image to push my project to an application on balenaCloud.

Then, when I installed tensorflow-gpu==1.13.1 on it, that worked fine.

By the way, while loading the TensorRT model, I got the following error:

    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

which raises:

    raise _DecodeError('Unexpected end-group tag.')
    google.protobuf.message.DecodeError: Unexpected end-group tag.

But when I run this project and load the TRT model on my local PC, it works fine.

So I guess TensorRT version 5.1.6 in this base image on the Jetson Nano may not support loading a TRT model that was converted on my local PC.

Am I right?

Also, reading the NVIDIA forums, JetPack supports TensorRT 6.0.1.
So could this issue be caused by the difference in TensorRT versions?

I look forward to your response.

Thanks
Artem

To my knowledge, TensorRT 6.0.1 is not supported on the Jetson Nano, so the new features of 6.0.x are not available there.

Just as an update: TensorRT 6.x is only available in the developer preview of JetPack 4.3 for the Jetson AGX Xavier dev kits.
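Independent of the version question: protobuf's "Unexpected end-group tag." usually means the serialized graph bytes are truncated or corrupted, rather than TensorRT rejecting the model. A minimal sketch, assuming TF 1.13 (trt_model.pb is a placeholder for your converted graph file), to check whether the file even deserializes on the device:

import tensorflow as tf

# Placeholder path; point this at the graph file you copied to the device.
graph_file = "trt_model.pb"

with tf.gfile.FastGFile(graph_file, "rb") as f:
    data = f.read()

# A suspiciously small size often points to a truncated copy or a
# Git LFS pointer file instead of the real model.
print("file size: %d bytes" % len(data))

graph_def = tf.GraphDef()
graph_def.ParseFromString(data)  # raises DecodeError on corrupted bytes
print("graph parsed, %d nodes" % len(graph_def.node))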

Hi

Then how can we solve this issue?

I have converted a Keras model (.h5) into a TensorRT model by following https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/.
When I used the converted TensorRT model on my local PC, it worked fine.
At that time, the TensorFlow version was 1.13.1.

Then, after pushing this project to the application on balenaCloud and running it in the terminal, it said that the TensorRT model couldn't be loaded.
At this point, the tensorflow-gpu version is 1.13.1 and the base image is the same as above.

Why does this happen?

Best regards
Artem

How did you install TensorFlow? Did you use:

RUN apt-get update && apt-get install -y python3-pip libhdf5-serial-dev hdf5-tools
RUN pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.3 --user

Hi

This means that when I installed tensorflow-gpu 1.13.1 on balenaCloud, it was successful.
That is to say, when I ran the following commands in the application on balenaCloud, it worked fine:

$ python3
>>> import tensorflow as tf
>>> tf.__version__
'1.13.1'

Then, when running the project, it failed to load the TensorRT model.

Yes, I installed TensorFlow like this.

Is there something wrong?

Thanks
Artem

No, it's correct! I just wanted to double-check whether this might be the issue.

Sadly, I have no clue about the reasons for this behaviour, since the Jetson Nano runs full versions of TensorFlow, CUDA, etc. It might be worth trying to convert the model to a TensorRT model on the device itself, so that version mismatches between your computer and the Jetson are ruled out.
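For reference, here is a minimal sketch of such an on-device conversion, assuming TF 1.13.x and the same TF-TRT flow as the blog post linked above (frozen_model.pb and the output node name are placeholders for your own model):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load the frozen TF graph exported from the Keras model.
with tf.gfile.FastGFile("frozen_model.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Build the TF-TRT graph directly on the Nano, so the TensorRT version
# used for conversion matches the one used at runtime.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["dense_2/Softmax"],       # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode="FP16",             # FP16 suits the Nano's GPU
    minimum_segment_size=50,
)

with tf.gfile.FastGFile("trt_model.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())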

Lastly, you can try to adapt the docker-compose file to run the service with privileged: true, since there might be further issues when communicating with the hardware:

services:
  yourAppName:
    restart: always
    build: .
    privileged: true
    devices:
      # nvpmodel config, e.g. for jtop
      - "/etc/nvpmodel.conf:/etc/nvpmodel.conf"
    cap_add:
      - SYS_RAWIO
    labels:
      io.balena.features.kernel-modules: '1'

As a disclaimer: I have never used TRT with TF 1.x on the Jetson Nano and balena, so I am currently only guessing about the possible reasons.

It might also be worth posting the error in the NVIDIA Developer Forums, since they might have further ideas about what to do and know way more about TF and TRT.

Hi

I am always thankful for your help.
Thanks to you, I was able to solve lots of the issues in this project.

By the way, I have encountered another challenge, this time related to OpenCV.
As you know, I have been using the following image as the base one:
FROM bouwe/jetson-nano-l4t-cuda-cudnn-nvinfer-tensorrt-opencv:latest

When I check the version of OpenCV, it is 4.1.2.
Sounds good.
But when I try to capture from the web camera, it can't open it:

import cv2

cap = cv2.VideoCapture(0)
ret, img = cap.read()

[ WARN:0] global /app/opencv-4.1.2/modules/videoio/src/cap_v4l.cpp (802) open VIDEOIO ERROR: V4L: can't open camera by index 0

I guess it is caused by the way OpenCV was built in this image.
What do you think about that?
Have you ever used OpenCV from this image?

Best regards
Artem

Do you want to use a ribbon cable camera (e.g. a Raspberry Pi camera) or a USB camera?

If you are using the ribbon camera, you will probably need to start nvargus-daemon & in the background, so that it is able to serve the video via GStreamer (see the sketch at the end of this post).

Furthermore, please check whether the camera shows up as /dev/video0 (ribbon cable camera) or under another /dev/video* node (USB webcam, etc.).

Lastly, check that your Python code runs with root privileges.
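If it is the ribbon camera, plain cv2.VideoCapture(0) will not work; it has to be opened through a GStreamer pipeline. A minimal sketch, assuming a CSI camera, an OpenCV build with GStreamer support, and a running nvargus-daemon (the resolution and framerate are just example values):

import cv2

# GStreamer pipeline for a CSI (ribbon cable) camera on the Jetson Nano.
# nvarguscamerasrc talks to nvargus-daemon; nvvidconv + videoconvert turn
# the NVMM buffer into a plain BGR frame that OpenCV can consume.
gst_pipeline = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! appsink"
)

cap = cv2.VideoCapture(gst_pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError("could not open CSI camera via GStreamer")

ret, img = cap.read()
print("frame grabbed:", ret)
cap.release()

For a USB webcam, cv2.VideoCapture(0) should work as long as /dev/video0 actually exists inside the container (privileged: true, or an explicit devices: entry in the compose file, takes care of that).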