Jetson Nano stuck in boot

Hi,
I am currently developing with balenaOS on my Jetson Nanos. The system has a tendency to get stuck at boot. Reflashing resolves the issue until it gets stuck again.

I have 3 production modules with eMMC memory:
module 1 has the production OS and has been running fine 24/7 for the last 2 months
module 2 has been flashed with multiple versions and always gets stuck in boot after a day or 2
module 3 had been working fine for 2 weeks until it also got stuck today

Could this be a software problem in the OS? The modules had been working fine for almost a year on regular Ubuntu.

development OS: [image]

production OS: [image]

Hello @jap937

Could you please provide details of the versions of both the production and development images that you are using? It would also be good if you could provide the journalctl output for a full boot of both OS versions.
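In case it helps, here is a minimal sketch of one way to capture the journal for a single boot into a file. It assumes Python and `journalctl` are both available where you run it, which may not be true on the balenaOS host itself; the function names and the output filename are just placeholders for illustration.

```python
import subprocess


def boot_journal_command(boot_offset=0):
    """Build the journalctl command for one boot.

    boot_offset 0 is the current boot, -1 the previous boot, and so on.
    """
    return ["journalctl", f"--boot={boot_offset}", "--no-pager"]


def save_boot_journal(out_path, boot_offset=0):
    """Write the journal of the selected boot to out_path (hypothetical helper)."""
    with open(out_path, "w") as out:
        subprocess.run(boot_journal_command(boot_offset), stdout=out, check=True)


# Example (placeholder filename):
# save_boot_journal("journalctl_boot.log")
```

Note that logs from earlier boots (a negative `boot_offset`) are only there if the journal is stored persistently.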

The version is balenaOS 2.67.3+rev2 with supervisor version 12.3.0 (I also tried 12.4.3).

journalctl_developmentOS.log (124.4 KB)
journalctl_productionOS.log (124.9 KB)

One other quick question on this one @jap937 – Are these modules in the normal Jetson Nano Developer Kit carrier board? If so, which revision is it?

Or, since these are eMMC units, are they in another carrier board, either 3rd party, or perhaps a custom one you have built?

I’m also really curious about Module 2. Since it fails to boot nearly daily, could you revert that one back to Ubuntu and see if it still exhibits this same non-booting behavior?

They are production modules and are placed in a 3rd party carrier board.
I have previously been running some modules 24/7 for multiple weeks on Ubuntu (with a prototype application), but I will flash module 2 just to be sure.

Update:
I found out what triggers this issue.
Until a week ago, updating my images often led to large updates (discussion), which resulted in ‘no space left on device’ errors and a failed update.

I temporarily resolved this by purging the system data, which appears to work fine. Apparently, rebooting (or waiting a few hours) after one of these events causes the boot issue.
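For anyone who wants to catch the low-space condition before it leads to a failed update, here is a rough sketch of a free-space check. It is only an illustration: the function name, the 10% reserve, and the path are my own assumptions, and the path has to be replaced with wherever the storage in question is mounted on your device.

```python
import shutil


def has_room_for_update(path, update_bytes, margin=0.10):
    """Return True if `path` can hold `update_bytes` and still keep a reserve.

    margin is the fraction of the total capacity to keep free afterwards
    (0.10 is an arbitrary value chosen for illustration).
    """
    usage = shutil.disk_usage(path)
    reserve = int(usage.total * margin)
    return usage.free - reserve >= update_bytes


# Example: check for 2 GiB of headroom on a placeholder path.
# print(has_room_for_update("/", 2 * 1024**3))
```

This only reports the space at the moment it runs, so it would need to be called right before an update starts.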