Configuring balenaOS for the jetson-xavier-nx-devkit-emmc

Hi,
I built a fresh image for the jetson-xavier-nx-devkit-emmc and would like to deploy it with openBalena. Unfortunately, openBalena currently does not support jetson-xavier-nx-devkit-emmc, even though the official balena.io environment supports the jetson-xavier-nx-devkit-emmc image. Do you know how I can add the device type to open-balena-api?

On a fresh deployment of openBalena, running balena os configure balena-image-jetson-xavier-nx-devkit-emmc.balenaos-img --fleet TestNX --version 2.87.1+rev fails with the error: Partition not found: 37.

On balena.io, running balena os configure balena-image-jetson-xavier-nx-devkit-emmc.balenaos-img --fleet TestNX --version 2.87.1+rev works fine :confused:

I noticed that the API endpoints https://api.my-open-balena.deployment/device-types/v1/ and https://api.balena-cloud.com/device-types/v1/ differ in the devices they support.

Specifically, jetson-xavier-nx-devkit-emmc is missing from my openBalena deployment. This is its entry on balenaCloud:

{
      "slug":"jetson-xavier-nx-devkit-emmc",
      "version":1,
      "aliases":[
         "jetson-xavier-nx-devkit-emmc"
      ],
      "name":"Nvidia Jetson Xavier NX Devkit eMMC",
      "arch":"aarch64",
      "state":"RELEASED",
      "instructions":[
         "Put the NVidia Jetson Xavier NX board in recovery mode",
         "Unzip BalenaOS image and use <a href=\"https://github.com/balena-os/jetson-flash\">Jetson Flash</a> to provision the device.",
         "After flashing is completed, please wait until the board is rebooted"
      ],
      "gettingStartedLink":{
         "windows":"https://docs.balena.io/jetson-xavier-nx-devkit-emmc/nodejs/getting-started/#adding-your-first-device",
         "osx":"https://docs.balena.io/jetson-xavier-nx-devkit-emmc/nodejs/getting-started/#adding-your-first-device",
         "linux":"https://docs.balena.io/jetson-xavier-nx-devkit-emmc/nodejs/getting-started/#adding-your-first-device"
      },
      "supportsBlink":false,
      "yocto":{
         "machine":"jetson-xavier-nx-devkit-emmc",
         "image":"balena-image",
         "fstype":"balenaos-img",
         "version":"yocto-honister",
         "deployArtifact":"balena-image-jetson-xavier-nx-devkit-emmc.balenaos-img",
         "compressed":true
      },
      "options":[
         {
            "isGroup":true,
            "name":"network",
            "message":"Network",
            "options":[
               {
                  "message":"Network Connection",
                  "name":"network",
                  "type":"list",
                  "choices":[
                     "ethernet",
                     "wifi"
                  ]
               },
               {
                  "message":"Wifi SSID",
                  "name":"wifiSsid",
                  "type":"text",
                  "when":{
                     "network":"wifi"
                  }
               },
               {
                  "message":"Wifi Passphrase",
                  "name":"wifiKey",
                  "type":"password",
                  "when":{
                     "network":"wifi"
                  }
               }
            ]
         },
         {
            "isGroup":true,
            "isCollapsible":true,
            "collapsed":true,
            "name":"advanced",
            "message":"Advanced",
            "options":[
               {
                  "message":"Check for updates every X minutes",
                  "name":"appUpdatePollInterval",
                  "type":"number",
                  "min":10,
                  "default":10
               }
            ]
         }
      ],
      "configuration":{
         "config":{
            "partition":{
               "primary":9
            },
            "path":"/config.json"
         }
      },
      "initialization":{
         "options":[
            {
               "message":"Select a drive",
               "type":"drive",
               "name":"drive"
            }
         ],
         "operations":[
            {
               "command":"burn"
            }
         ]
      },
      "buildId":"2.88.4+rev10",
      "logoUrl":"https://files.balena-cloud.com/images/jetson-xavier-nx-devkit-emmc/2.88.4%2Brev10/logo.svg"
}
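To see exactly which device types one deployment lists and the other does not, the two /device-types/v1/ responses can be compared by slug. A minimal Node.js sketch, assuming each response is a JSON array of manifests like the one above (only the slug field is used); missingSlugs, fetchDeviceTypes and compareDeployments are hypothetical helper names:

```javascript
// Sketch: list device-type slugs that one deployment exposes and another lacks.
// Assumes each /device-types/v1/ response is a JSON array of manifests that
// carry a "slug" field, like the jetson-xavier-nx-devkit-emmc entry above.

// Hypothetical helper: slugs present in `reference` but absent from `other`.
function missingSlugs(reference, other) {
  const known = new Set(other.map((manifest) => manifest.slug));
  return reference
    .map((manifest) => manifest.slug)
    .filter((slug) => !known.has(slug))
    .sort();
}

// Fetch one device-types list (requires Node.js 18+ for the global fetch).
async function fetchDeviceTypes(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`GET ${url} failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Hypothetical usage: compare balenaCloud against your own openBalena API.
async function compareDeployments(cloudUrl, openBalenaUrl) {
  const [cloud, local] = await Promise.all([
    fetchDeviceTypes(cloudUrl),
    fetchDeviceTypes(openBalenaUrl),
  ]);
  return missingSlugs(cloud, local);
}
```

Calling compareDeployments("https://api.balena-cloud.com/device-types/v1/", "https://api.my-open-balena.deployment/device-types/v1/") would resolve to the slugs that balenaCloud lists but the openBalena deployment does not.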

Since the jetson-xavier device type lists partition 37 as its primary config partition, while the eMMC manifest above uses partition 9, the error seems to be related to this. A temporary workaround is to hardcode partition 9 in the balena-cli implementation :see_no_evil:

In commands/os/configure.js, around line 75:

const image = params.image;
console.info(image);
// Workaround: force the config.json partition used by the eMMC image
deviceTypeManifest.configuration.config.partition.primary = 9;
console.info(deviceTypeManifest.configuration.config.partition.primary);
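A slightly less invasive variant of the same hack, sketched under the assumption that the manifest has the configuration.config.partition.primary shape shown in the JSON above: instead of mutating the shared manifest object, derive a copy with the partition overridden. withConfigPartition is a hypothetical helper, not part of balena-cli.

```javascript
// Sketch: override the config partition on a copy of a device-type manifest,
// leaving the original manifest object untouched.
// Assumes the manifest shape shown above: configuration.config.partition.primary.
function withConfigPartition(manifest, primaryPartition) {
  if (!Number.isInteger(primaryPartition) || primaryPartition < 1) {
    throw new RangeError(`invalid partition number: ${primaryPartition}`);
  }
  // The manifest is plain JSON, so a JSON round-trip is a sufficient deep copy.
  const patched = JSON.parse(JSON.stringify(manifest));
  patched.configuration.config.partition.primary = primaryPartition;
  return patched;
}
```

Keeping the override on a copy means other code that reads the original manifest still sees the value the API returned.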

Still, it would be great if someone could point me in the right direction for adding support to open-balena-api, since I could not find the relevant files so far.

Nice detective work here @Langhalsdino … and you are correct: there is not 1-to-1 device parity between openBalena and balenaCloud.

@thgreasi, @dfunckt, or @ab77 - do you know what file(s) Frederic needs to patch in the openBalena API to properly add support for his Xavier? Thanks!

After successfully flashing the Jetson using the CLI hack, balena_supervisor is unable to connect to the API, because the device type is unknown to the openBalena API:

 Event: Device bootstrap failed, retrying {"delay":30000,"error":{"message":"Unknown device type jetson-xavier-nx-devkit-emmc","stack":"ApiError: Unknown device type jetson-xavier-nx-devkit-emmc\n    at Object.<anonymous> (/usr/src/app/dist/app.js:10:466019)\n    at Generator.next (<anonymous>)\n    at fulfilled (/usr/src/app/dist/app.js:10:1951700)\n    at processTicksAndRejections (internal/process/task_queues.js:97:5)"}}

Patching /resin-boot/config.json with a modified deviceType did not work. My guess is that building the eMMC image with a jetson-xavier alias will not work either, so I probably need to patch the open-balena-api server.

There is an additional config file, /mnt/boot/device-type.json, that also needs to be modified so that its slug is the desired "jetson-xavier".
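Both edits can be scripted. A minimal Node.js sketch of this workaround, assuming config.json carries a top-level deviceType field and device-type.json a top-level slug field; the file paths are passed in rather than hardcoded, and retargetDeviceType is a hypothetical name. Note that this only makes the device present a slug the API already knows; it does not add the eMMC device type to the API.

```javascript
// Sketch: rewrite the device type in config.json and device-type.json so the
// device presents itself with a slug the API already knows (the workaround
// described above). Assumes config.json has a top-level "deviceType" field
// and device-type.json a top-level "slug" field.
const fs = require("fs");

function rewriteJson(path, mutate) {
  const data = JSON.parse(fs.readFileSync(path, "utf8"));
  mutate(data);
  // Write to a temporary file first, then rename it over the original, so an
  // interrupted write does not leave a truncated JSON file behind.
  const tmpPath = `${path}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2) + "\n");
  fs.renameSync(tmpPath, path);
}

// Hypothetical helper: point both files at `slug`, e.g. "jetson-xavier".
function retargetDeviceType(configPath, deviceTypePath, slug) {
  rewriteJson(configPath, (config) => {
    config.deviceType = slug;
  });
  rewriteJson(deviceTypePath, (manifest) => {
    manifest.slug = slug;
  });
}
```

For example, retargetDeviceType("/mnt/boot/config.json", "/mnt/boot/device-type.json", "jetson-xavier"), assuming both files live on the boot partition mounted at /mnt/boot.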

Anyhow, I am still interested in patching the API to support the eMMC image.

@Langhalsdino Can you grab some info for us please?

Specifically, we’re wondering about:

  • The API / CLI / SDK versions you are using (and please bump them to the latest versions if they are not already).
  • What does https://api.my-open-balena.deployment/v6/device_type/ return?

openBalena should actually support everything in the contracts/hw.device-type directory of the balena-io/contracts repository on GitHub (master branch), which I see contains jetson-xavier-nx-devkit-emmc, so we should be able to get you sorted. :slight_smile:
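For the second question, one quick way to check whether a given slug is registered is to filter the device_type resource on the server side. A sketch, assuming the API follows the OData conventions under /v6/ ($select, $filter) and wraps results in a top-level d array; deviceTypeQueryUrl, hasDeviceType and checkDeviceType are hypothetical helper names:

```javascript
// Sketch: ask an openBalena / balenaCloud API whether it knows a device type.
// Assumes the v6 OData conventions ($select, $filter) and a response body of
// the form { "d": [ ...matching rows... ] }.

// Hypothetical helper: build the filtered query URL for one slug.
function deviceTypeQueryUrl(apiBase, slug) {
  const filter = encodeURIComponent(`slug eq '${slug}'`);
  return `${apiBase.replace(/\/+$/, "")}/v6/device_type?$select=slug,name&$filter=${filter}`;
}

// Hypothetical helper: true if the OData response contains the slug.
function hasDeviceType(body, slug) {
  return Array.isArray(body.d) && body.d.some((row) => row.slug === slug);
}

// Requires Node.js 18+ (global fetch). The token is optional.
async function checkDeviceType(apiBase, slug, token) {
  const headers = token ? { Authorization: `Bearer ${token}` } : {};
  const response = await fetch(deviceTypeQueryUrl(apiBase, slug), { headers });
  if (!response.ok) {
    throw new Error(`device_type query failed with HTTP ${response.status}`);
  }
  return hasDeviceType(await response.json(), slug);
}
```

Filtering on the server keeps the response small compared with listing every device type and searching it locally.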

So far we have been using API version tag "v0.139.0", and I briefly tried tag "v0.191.0" (but I found a few misconfigurations in the v0.191.0 deployment, so this is probably my fault).

I will now upgrade to "v0.192.0" and upgrade the rest of our infrastructure too.

Furthermore, I have been using balena CLI 13.1.7 as well as newer versions.

I'll keep you updated :slight_smile:

Upgrading to v0.192.0 helped and the devices are now supported as listed in the contracts.

Great, that is excellent news @Langhalsdino … keep us posted on other issues you come across, and I’ll check your other threads to make sure they have answers or were resolved as well. With bee season coming up, we’ve got to get you ready!

I’m also curious to know about performance gains you receive with the Xavier versus Nano, and any iterations on the hives / lessons learned. Send pics! :partying_face:

Just to keep it short:
The Xavier NX DLA cores are really interesting, since they let us keep the NX at ~10W with far more usable compute power (bee FPS) than the Jetson Nano at 10W. Having CUDA and 2x DLA enables us to run three or more CNNs simultaneously at 10W :exploding_head:.

Due to the chip shortages, … we started building our own mainboard for the modules, and it was quite a fun ride to get PCIe lanes, … working.

I will send you an email with pictures next week.

By now every question is answered :slight_smile:

Just to let you know, https://api.balena-cloud.com/device-types/v1/ still returns something different than https://api.my-open-balena.deployment/device-types/v1/.

This is not a problem, since I kind of hacked a way around it, as shown earlier :slight_smile:

After upgrading to v0.192.0, I get the following error from balena_supervisor on my devices:

[error]   LogBackend: server responded with status code: 504

Could you point me to the parts of the openBalena API that relate to this behaviour? Or tell me how I can get more verbose debugging output than balena logs balena_supervisor provides?

That 504 error comes from the Supervisor trying to stream journal logs to your openBalena API. The Supervisor side is in balena-backend.ts in the balena-os/balena-supervisor repository (master branch), and the exact API endpoint is defined in index.ts in the balena-io/open-balena-api repository (at commit 3e5acf913374deb27d7d84e83b4a5223c2cfa505).

A 504 indicates a gateway timeout, so it seems the plumbing in front of the API is broken, while the API itself might be OK. I would check whether you can reach that endpoint with curl, like so:

mig@ghost ~$ curl -I https://api.balena-cloud.com/device/v2/123/logs 
HTTP/2 401 
date: Mon, 28 Feb 2022 22:35:35 GMT 
x-frame-options: DENY 
x-content-type-options: nosniff 

This tells us that I can reach the endpoint and it's complaining about authorization, which is expected since I did not pass any credentials (I also made up a device UUID of 123). If you can curl that endpoint from your dev machine with your openBalena API substituted in, great! Now try to curl it from the device running balenaOS.

Let us know if you're successful with this curl. I expect you'll uncover a networking issue from another client too (not just your balenaOS device), and then we can troubleshoot from there. Since you're getting a 504, that's unfortunately not a response from the openBalena API itself, so our work is cut out for us… we have to troubleshoot what is sitting in between.
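The reasoning above (a 401 means the request reached the API, a 504 means something in between timed out) can be turned into a small probe to run from different machines. A sketch, assuming Node.js 18+ for fetch; the status interpretations are heuristics mirroring the explanation above, not an exhaustive mapping, and probeLogsEndpoint and classifyStatus are hypothetical names.

```javascript
// Sketch: probe the device logs endpoint and interpret the HTTP status the
// way the troubleshooting advice above does. Heuristic only.

// Hypothetical helper: map a status code to a likely explanation.
function classifyStatus(status) {
  if (status >= 200 && status < 300) return "ok: reached the API and authorized";
  if (status === 401 || status === 403) {
    return "reachable: the API answered but rejected the credentials";
  }
  if (status === 502 || status === 504) {
    return "gateway problem: a proxy in front of the API did not get a good answer";
  }
  return `unexpected status ${status}: inspect the response headers`;
}

// Requires Node.js 18+ (global fetch). Pass a token to test authorized access.
async function probeLogsEndpoint(apiBase, uuid, token) {
  const url = `${apiBase.replace(/\/+$/, "")}/device/v2/${uuid}/logs`;
  const headers = token ? { Authorization: `Bearer ${token}` } : {};
  // HEAD, like `curl -I`, so no log body is downloaded.
  const response = await fetch(url, { method: "HEAD", headers });
  return { status: response.status, verdict: classifyStatus(response.status) };
}
```

Running the same probe from a dev machine and from the device makes it easier to tell whether a 504 depends on where the request comes from.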

Hi @20k-ultra, thank you for your support.

I tried calling the URL from my computer and got a 401, and on my Jetson I get the 401, too. So it appears that I do not have a networking problem. I will investigate further.

Edit:
Even from within the supervisor container it works :confused:

Furthermore, using a valid bearer token returns the logs as expected :smiley:
curl -H 'Accept: application/json' -H "Authorization: Bearer ${TOKEN}" https://api.custom-balena.domain/device/v2/UUID/logs

I think my issue is with the OpenVPN connection, and the LogBackend error might just be a side effect.

When looking at journalctl, I found out that the OpenVPN connection cannot be established and is stuck in an infinite restart loop:

Mar 02 16:06:33 087a371 openvpn[3779]: Wed Mar  2 16:06:33 2022 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Mar 02 16:06:33 087a371 openvpn[3779]: Wed Mar  2 16:06:33 2022 TCP/UDP: Preserving recently used remote address: [AF_INET]IP:443
Mar 02 16:06:33 087a371 openvpn[3779]: Wed Mar  2 16:06:33 2022 Socket Buffers: R=[87380->87380] S=[16384->16384]
Mar 02 16:06:33 087a371 openvpn[3779]: Wed Mar  2 16:06:33 2022 Attempting to establish TCP connection with [AF_INET]IP:443 [nonblock]
Mar 02 16:06:34 087a371 openvpn[3779]: Wed Mar  2 16:06:34 2022 TCP connection established with [AF_INET]IP:443
Mar 02 16:06:34 087a371 openvpn[3779]: Wed Mar  2 16:06:34 2022 TCP_CLIENT link local: (not bound)
Mar 02 16:06:34 087a371 openvpn[3779]: Wed Mar  2 16:06:34 2022 TCP_CLIENT link remote: [AF_INET]IP:443
Mar 02 16:06:34 087a371 openvpn[3779]: Wed Mar  2 16:06:34 2022 Connection reset, restarting [0]
Mar 02 16:06:34 087a371 openvpn[3779]: Wed Mar  2 16:06:34 2022 SIGUSR1[soft,connection-reset] received, process restarting
Mar 02 16:06:34 087a371 openvpn[3779]: Wed Mar  2 16:06:34 2022 Restart pause, 80 second(s)
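To confirm from journal output that OpenVPN keeps cycling rather than failing once, the log can be scanned for the connection-reset / restart-pause pattern shown above. A minimal sketch, assuming lines formatted like the excerpt above; summarizeOpenvpnRestarts is a hypothetical helper:

```javascript
// Sketch: count OpenVPN connection resets and restart pauses in journal text
// shaped like the excerpt above. A reset right after "TCP connection
// established", repeated on every attempt, suggests the remote end (the VPN
// server or something in front of it) is closing the connection.
function summarizeOpenvpnRestarts(journalText) {
  let resets = 0;
  const pauses = [];
  for (const line of journalText.split("\n")) {
    if (line.includes("Connection reset, restarting")) resets += 1;
    const pause = line.match(/Restart pause, (\d+) second\(s\)/);
    if (pause) pauses.push(Number(pause[1]));
  }
  return { resets, pauses };
}
```

The journal text could be read from a saved file or piped in, e.g. summarizeOpenvpnRestarts(require("fs").readFileSync(0, "utf8")) on standard input.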

Is the device still returning the LogBackend 504 error even though your curl from the same network returns 401/402?

The 402 is a very odd response to get, since that's not a standard code to use. If you get it again, I'd love to see the response headers to check whether they mention anything else.

Sorry, 401 for both. :see_no_evil:

How about this?

The LogBackend issue is also discussed in the forum thread "LogBackend: server responded with status code: 504 - Resolution?" (#51).