Jetson AGX suddenly not booting anymore

Hello - about two months ago I set up a new Jetson AGX for a computer vision application. I flashed it with this tool using the dev-jetson-xavier-2.67.3+rev5-dev-v12.3.0 image. Everything went well until the AGX recently went offline. When I plugged in a monitor, I discovered the device was hanging with the following log:
...
Alternate GPT is invalid, using primary GPT.
mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 ... p41
blk_update_request: I/O error, dev mmcblk0rpmb, sector 0
blk_update_request: I/O error, dev mmcblk0rpmb, sector 0
I’m guessing this has something to do with the eMMC, but I’m not familiar with it. I have not attempted to reflash the device yet, because I would like to understand what is causing this first and avoid the risk of the problem reappearing after some uptime.
Any help/advice would be much appreciated.

Hi there, you are correct, that does indicate some corruption on the storage or failing blocks. I am not very familiar with the AGX, but is mmcblk0rpmb a special partition? It doesn't follow the usual mmcblk0pX naming convention I would have expected.

If the device was online and functioning fine, then all of a sudden crashed and went offline (especially if it had been running fine for months), that’s definitely a bad sign. Does the AGX have the ability to mount the eMMC storage on a host PC and access it over USB like some other boards do? If so, you could possibly grab any critical data from it.

By default, balenaOS logs to RAM, so now that the device has been rebooted or powered down, the logs will have been cleared and unfortunately we won’t have much to go on. I realize it’s not very helpful, but I think re-flashing it is your best bet for recovering the device.

Thank you for the reply - I will attempt to reflash and report back here.

Hi - I have now reflashed the device and it is back online in the dashboard. However, when I connect a display, I still see the following messages passing by during startup (I recovered them by running dmesg in the Host OS terminal):
[ 20.088651] blk_update_request: I/O error, dev mmcblk0rpmb, sector 0
[ 20.192203] blk_update_request: I/O error, dev mmcblk0rpmb, sector 0
I attached the full dmesg output.
I’m worried, since these are the same messages I saw when the system stopped booting. I will keep monitoring the device for a while before shipping it back to its field location, but any thoughts from your side are very welcome. Please also let me know if there are any further diagnostic steps I can take.
dmesg-output.txt (68.4 KB)

Hello @mcmchammer

A quick internet search suggests that these messages can be safely ignored. They are caused by the kernel trying to scan for a partition table in the eMMC Replay Protected Memory Block (RPMB), a small hardware partition that is not meant to be read like a normal block device.
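Before shipping the device back, one way to double-check this is to confirm that every I/O error in the saved dmesg output is on the RPMB device and none are on the regular data partitions (mmcblk0p1, mmcblk0p2, ...). The minimal Python sketch below assumes the `blk_update_request: I/O error, dev <device>, sector <n>` line format shown in the logs above; the helper names `summarize_io_errors` and `non_rpmb_errors` are hypothetical, not part of any balena or NVIDIA tool. It is only a heuristic: a clean result does not prove the eMMC is healthy.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   [   20.088651] blk_update_request: I/O error, dev mmcblk0rpmb, sector 0
# (assumes the log format shown in this thread; newer kernels append extra
# fields after the sector number, which the regex simply ignores)
IO_ERROR = re.compile(r"blk_update_request: I/O error, dev (\S+), sector (\d+)")


def summarize_io_errors(dmesg_text):
    """Count block-layer I/O errors per (device, sector) in a dmesg dump."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def non_rpmb_errors(counts):
    """Keep only errors on devices whose name does not end in 'rpmb'."""
    return {key: n for key, n in counts.items() if not key[0].endswith("rpmb")}


# Usage with the two lines from the dmesg output above
sample = """[   20.088651] blk_update_request: I/O error, dev mmcblk0rpmb, sector 0
[   20.192203] blk_update_request: I/O error, dev mmcblk0rpmb, sector 0"""

counts = summarize_io_errors(sample)
print(counts)                   # Counter({('mmcblk0rpmb', 0): 2})
print(non_rpmb_errors(counts))  # {}
```

To check the full log, read the attached dmesg-output.txt (or a fresh dmesg dump) into a string and pass it to `summarize_io_errors`. Any entry left after `non_rpmb_errors` would point at real data partitions and deserve a closer look.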

Ok - thanks a lot for following up