Failed to download image; no space left on device, jetson agx xavier

Hi,
I’m new to balenaCloud. I just added a Jetson AGX Xavier device to my application, flashed the Jetson using jetson-flash with this image (balena-cloud-XAVIER-jetson-xavier-2.67.3+rev5-dev-v12.3.0.img), and pushed the Hello World C++ sample app using balena-cli. The push command succeeded and the device’s target release was updated. However, when the device tries to download the image, I get the following error:
Failed to download image 'registry2.balena-cloud.com/v2/1c20ca3cbf0f43b574ed193512392af1@sha256:7362eefcd868a06e46ad8e046f79a4ce904af0309459ec103ccd5edd4e418802' due to 'failed to register layer: Error processing tar file(exit status 1): write /usr/lib/aarch64-linux-gnu/libcrypto.so.1.1: no space left on device'
So it looks like the docker pull is failing because the device has run out of storage space.
I believe the Xavier should have enough storage to download the image and run it.
Can someone help me with this? Thanks in advance.

Hi there,
Can you confirm if this is where you’re getting the Hello World code and the instructions you’ve been following up to this point? Get started with Nvidia Jetson Xavier and C++ - Balena Documentation
Thanks!

Yes, I have been following the instructions from that link, and I downloaded the Hello World code from the link in that guide. However, the Dockerfile.template included in the zip file I downloaded referenced resin, which did not work, so I replaced resin with balenalib to get the build to work.

Thanks for confirming, and for telling me about the outdated reference to Resin - I’ll work on getting that updated.

I don’t know if this helps, but I thought I’d share it anyway: I don’t know why the dashboard shows only 169MB of storage. [Screenshot: balena dashboard]

That file is really small, and wouldn’t have taken up enough space to cause the error you’re seeing. Neither should our base image since we built it for the Xavier. I’m wondering if there was a hiccup during the pull… is there any chance you could try pulling it again for me to see if you see the same? That error feels like something else is going on, but I would still like to give a second pull a shot if you have the ability…

Oh interesting…

Could you go to the Diagnostics page and run the Healthchecks to see if it’s showing anything wrong with storage?

I’m running the health check now. The Xavier did try pulling multiple times. I also tried rebooting, rebuilding the image, and pushing it again, but it failed every time.

I get the following error when I run the healthchecks:
An error occurred while querying checks data: Bus n/a: changing state UNSET → OPENING Bus n/a: changing state OPENING → AUTHENTICATING Bus n/a: changing state AUTHENTICATING → RUNNING Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1/unit/balena_2eservice interface=org.freedesktop.DBus.Properties member=Get cookie=1 reply_cookie=0 signature=ss error-name=n/a error-message=n/a Got message type=method_return sender=org.freedesktop.systemd1
Health check output json:
{"diagnose_version":"4.20.23","checks":[{"name":"check_balenaOS","success":true,"status":"Supported balenaOS 2.x detected"},{"name":"check_container_engine","success":true,"status":"No container_engine issues detected"},{"name":"check_localdisk","success":true,"status":"No localdisk issues detected"},{"name":"check_memory","success":true,"status":"93% memory available"},{"name":"check_networking","success":true,"status":"No networking issues detected"},{"name":"check_os_rollback","success":true,"status":"No OS rollbacks detected"},{"name":"check_service_restarts","success":true,"status":"No services are restarting unexpectedly"},{"name":"check_supervisor","success":false,"status":"Supervisor is running, but may be unhealthy"},{"name":"check_temperature","success":false,"status":"Some temperature issues detected: \ntest_current_temperature Temperature above 80C detected (/sys/class/thermal/thermal_zone4)"},{"name":"check_timesync","success":true,"status":"Time is synchronized"}]}
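
For readers triaging a similar report, here is a minimal Python sketch that pulls the failing checks out of health check JSON like the output above. The sample is an abbreviated copy of that output, and the field names (`checks`, `name`, `success`, `status`) are taken from it; whether other diagnostics versions use the same layout is an assumption.

```python
import json

# Abbreviated copy of the health check output posted above (raw string so
# the JSON "\n" escape is preserved for the JSON parser).
SAMPLE = r'''
{"diagnose_version":"4.20.23","checks":[
 {"name":"check_localdisk","success":true,"status":"No localdisk issues detected"},
 {"name":"check_memory","success":true,"status":"93% memory available"},
 {"name":"check_supervisor","success":false,"status":"Supervisor is running, but may be unhealthy"},
 {"name":"check_temperature","success":false,"status":"Some temperature issues detected: \ntest_current_temperature Temperature above 80C detected (/sys/class/thermal/thermal_zone4)"}
]}
'''


def failed_checks(report_text):
    """Return (name, status) pairs for every check whose success flag is false."""
    report = json.loads(report_text)
    return [(c["name"], c["status"]) for c in report["checks"] if not c["success"]]


if __name__ == "__main__":
    for name, status in failed_checks(SAMPLE):
        print(f"{name}: {status}")
```

On this sample only check_supervisor and check_temperature are reported; check_localdisk passes even though the pull fails with "no space left on device".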

I don’t know if this helps, but here is the df -h output when I SSH into the Xavier:

root@a6e8f9e:~# df -h
Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                         16G     0   16G   0% /dev
tmpfs                            16G  172K   16G   1% /tmp
/dev/disk/by-state/resin-rootA  461M  336M   98M  78% /mnt/sysroot/active
/dev/disk/by-state/resin-state   19M  391K   17M   3% /mnt/state
overlay                         461M  336M   98M  78% /
/dev/mmcblk0p41                 170M   79M   79M  51% /mnt/data
tmpfs                            16G     0   16G   0% /dev/shm
tmpfs                            16G   39M   16G   1% /run
tmpfs                            16G     0   16G   0% /sys/fs/cgroup
/dev/mmcblk0p37                 120M   54M   66M  45% /mnt/boot
tmpfs                            16G   28K   16G   1% /var/volatile
/dev/mmcblk0p39                 461M  2.3M  431M   1% /mnt/sysroot/inactive
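
The number that matters in this listing is the data partition. A minimal Python sketch, assuming (as is generally the case on balenaOS) that container images are stored on the partition mounted at /mnt/data, can parse `df -h` output and report how much room is left there. The helper names `to_bytes` and `parse_df` are hypothetical, and the sample is a subset of the rows above.

```python
# Sizes printed by `df -h` are in powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a df -h size such as '170M' or '0' to a byte count."""
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))


def parse_df(output):
    """Map each mount point to its filesystem, size, used, and available bytes."""
    mounts = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        mounts[fields[5]] = {
            "filesystem": fields[0],
            "size": to_bytes(fields[1]),
            "used": to_bytes(fields[2]),
            "avail": to_bytes(fields[3]),
        }
    return mounts


# Subset of the df -h output from the thread.
SAMPLE = """\
Filesystem                      Size  Used Avail Use% Mounted on
/dev/disk/by-state/resin-rootA  461M  336M   98M  78% /mnt/sysroot/active
/dev/mmcblk0p41                 170M   79M   79M  51% /mnt/data
/dev/mmcblk0p37                 120M   54M   66M  45% /mnt/boot
"""

if __name__ == "__main__":
    data = parse_df(SAMPLE)["/mnt/data"]
    mib = 1024**2
    print(f"/mnt/data: {data['avail'] // mib} MiB free of {data['size'] // mib} MiB")
```

For the sample rows this prints `/mnt/data: 79 MiB free of 170 MiB`, which matches the roughly 169MB the dashboard shows.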

Okay, darn. At first I thought maybe it was a partitioning error that happened during the imaging process, but now I’m wondering if that’s not the case. I’m sending this question to our broader team since I see both high temperature and Supervisor warnings. I also see you’re working with Ross on our Customer Success team, so I’ve shared this thread with him for visibility on his end. We’ll get back to you as soon as the broader team has had a chance to review this and can provide some suggestions.

Ok, thank you for looking into this. The Xavier is actually pretty cool to the touch, and when I was working with the JetPack image it used to get a lot warmer, so I think that diagnostic message may not be accurate. I have shared the thread with Ross as well.

That’s interesting… 80C would definitely not be cool to the touch. Something’s off for sure. Thanks for sharing the extra info, Avi. We’ll be in touch with more info (and probably more questions) soon.

Ok, thank you! I’ll try re-flashing the image in the meantime, since I haven’t tried that yet.

I tried re-flashing, but unfortunately it behaves the same: the same error when it tries to pull the image and the same errors when I run diagnostics. The storage size is still ~170MB (/dev/mmcblk0p41 170M 79M 79M 51% /mnt/data).

Thanks for sharing that, it’s very good info to have as we continue troubleshooting and trying to determine root cause.

In addition, I had a nagging feeling that I had seen a high temperature listed on NVIDIA devices before, along with a note from NVIDIA somewhere that it was considered normal, and I finally found the reference. Interestingly, when thermal_zone4 is read using the driver value, there are some cases where it can be fixed to indicate a normal status. I don’t know more about this than what I’m seeing here, but I wanted to share it with you while I was thinking about it, in case something similar applies to the Xavier AGX. thermal_zone4 reports 100 degree celcius ? - #5 by sjlin - Jetson TX1 - NVIDIA Developer Forums

I don’t know what to make of that either, since I don’t know which thermal zones your diagnostics query to generate that report. However, I can confirm that the thermal_zone4 type on the AGX Xavier is PMIC-Die, and its temp value reads as what I think is 100C.

root@3c882a0:~# cat /sys/class/thermal/thermal_zone0/type
CPU-therm
root@3c882a0:~# cat /sys/class/thermal/thermal_zone1/type
GPU-therm
root@3c882a0:~# cat /sys/class/thermal/thermal_zone2/type
AUX-therm
root@3c882a0:~# cat /sys/class/thermal/thermal_zone3/type
AO-therm
root@3c882a0:~# cat /sys/class/thermal/thermal_zone4/type
PMIC-Die
root@3c882a0:~# cat /sys/class/thermal/thermal_zone4/temp
100000
root@3c882a0:~# cat /sys/class/thermal/thermal_zone0/temp
28500
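
The same readings can be collected for every zone at once. The following is a minimal Python sketch, not part of balena's diagnostics: it walks the standard Linux sysfs thermal directory, where `temp` is reported in millidegrees Celsius, so the 100000 above is 100C and 28500 is 28.5C. The function name `read_thermal_zones` is hypothetical, and the 80C default threshold is simply the figure quoted in the health check message earlier in the thread.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return (zone, type, degrees_celsius) for each readable thermal zone."""
    zones = []
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones cannot be read; skip them
        zones.append((zone.name, kind, millidegrees / 1000.0))
    return zones


def zones_above(zones, threshold_c=80.0):
    """Filter zones reporting more than threshold_c degrees Celsius."""
    return [z for z in zones if z[2] > threshold_c]


if __name__ == "__main__":
    for name, kind, temp_c in read_thermal_zones():
        print(f"{name} ({kind}): {temp_c:.1f} C")
```

With the values posted above, only thermal_zone4 (PMIC-Die) would exceed the 80C threshold, while the CPU zone reads 28.5C.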

That’s great to know, thanks for sharing, Avi.

In addition, we’ve got a PR set up to update the Getting Started page. Thanks again for letting us know it was outdated. Outdated base image referenced in Getting Started for Jetson Xavier · Issue #1595 · balena-io/docs · GitHub

Hello Avi. As mentioned, I asked our devices team for their insights on the storage question and we’ll let you know as soon as we have a suggestion for you. BTW, thanks for sending such good diagnostic information.