Jetson Xavier NX Fails to Boot

I’ve been running about 30 Jetson Xavier NX devices with balena for about 9 months. The devices typically get powered off every night and powered back up the next morning. Over that time, at least 4 devices have entered a state where the image on the SD card would no longer boot and bring up a network connection. Some of the specifics of my setup are:

  • Device Type: Nvidia Jetson Xavier NX Devkit SD-CARD
  • Balena OS: v2.53.12+rev1
  • Edition: Production

On all of the failures, I’ve been able to confirm that the Jetson NX was getting good power: I measured it with a multimeter and could see the power lights illuminated on the devkit board.

In all failure cases, there did not appear to be a hardware failure of the SD card. On three of the four failures, I was able to reflash the SD card with another balena image and get the same SD card/Jetson NX combo up and running again with a clean install. On the fourth failure, I’ve saved the SD card in its non-bootable state in case the online community here has suggestions that would help in understanding/diagnosing the problem.

When I try to boot a Jetson NX device with a monitor connected, I get the following output:

The first 4 lines about board setup failure seem to appear on all of the Jetson NX devices I’ve observed. So, I believe the failed mounts and mismatched checksums are more closely related to the problem.

With this failure mode, it appears the host OS never boots, and I can confirm the network connection never gets brought up. My somewhat coarse understanding is that the balena host OS should be on a read-only partition, which should prevent it from getting corrupted and failing to boot. Any ideas what might be causing this failure mode and how to prevent it from happening?

Hi Justin, interesting problem you have there. I have pinged a few folks to see if there is any information we can glean from this, and sorry to hear about your troubles.

Hey Justin, I’m looking at the partition layout for the Xavier NX and it looks like partition 10 is a required BSP partition with the description "Required. Slot A; contains TegraBoot binary."
We only run fsck on the resin-* partitions, so it’s possible that this BSP partition was corrupted by a bad unmount during power-off or something similar.
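One way to check that theory on the saved SD card would be to compare the bytes of that partition against the same region of a known-good image of the same OS version. The following is only a minimal sketch: the helper names `sha256_of_region` and `regions_match` are made up for illustration, and the byte offset and length are assumptions you would need to read from the partition table yourself (for example, start sector and sector count multiplied by the sector size).

```python
import hashlib


def sha256_of_region(image_path, offset, length, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of `length` bytes starting at `offset`.

    `image_path` can be a raw image file or a block device dump.
    Reads in chunks so large partitions do not need to fit in memory.
    """
    digest = hashlib.sha256()
    remaining = length
    with open(image_path, "rb") as f:
        f.seek(offset)
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                # The image is shorter than expected; hash what was read.
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def regions_match(path_a, path_b, offset, length):
    """True if both images contain identical bytes in the given region."""
    return (sha256_of_region(path_a, offset, length)
            == sha256_of_region(path_b, offset, length))
```

If the region differs between the failed card and a good image of the same release, that would point toward the BSP partition rather than the resin-* partitions. This assumes both images share the same partition layout, which should hold for the same device type and OS version but is worth confirming.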

Are these devices being shut down correctly?
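If they are currently just having power cut, one option is to have an application container request a clean shutdown through the supervisor API before power is removed. A rough sketch follows, assuming the container has access to the supervisor API and that the `BALENA_SUPERVISOR_ADDRESS` and `BALENA_SUPERVISOR_API_KEY` environment variables are set; the function names here are hypothetical, and the endpoint should be checked against the supervisor documentation for your OS version.

```python
import os
import urllib.request


def build_shutdown_request(address, api_key):
    """Build (but do not send) a POST request to the supervisor's
    v1 shutdown endpoint. `address` is the supervisor base URL."""
    url = f"{address.rstrip('/')}/v1/shutdown?apikey={api_key}"
    return urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_clean_shutdown():
    """Ask the supervisor to shut the device down cleanly.

    Assumes the two environment variables below are present in the
    container; raises KeyError if they are not.
    """
    address = os.environ["BALENA_SUPERVISOR_ADDRESS"]
    api_key = os.environ["BALENA_SUPERVISOR_API_KEY"]
    req = build_shutdown_request(address, api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        print("supervisor responded with HTTP", resp.status)
```

Calling `request_clean_shutdown()` from a scheduled job shortly before the nightly power-off would let filesystems unmount before power is cut.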