We’ve had an issue where our containers won’t start on BalenaOS deployed to a Jetson Xavier NX with 16GB eMMC. We have an 8GB RAM version, also with 16GB eMMC onboard, that runs fine. I have tried the following:
updated to new version of Balena Jetson
updated to new jetson-flash
tried different boards
We had to build a custom image because our Wi-Fi card is not supported in the official Jetson image. This is the persistent error I keep getting. Any help would really be appreciated.
[error] at fn (/usr/src/app/dist/app.js:10:9736)
[error] at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[error] Device state apply error Error: Failed to apply state transition steps. (HTTP code 500) server error - failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write 13325: write /sys/fs/cgroup/cpuset/system.slice/docker-b876bc18176474ce63dd1b1aafe0b8c79881d42d4631ac884f91620cc66d7023.scope/cgroup.procs: no space left on device: unknown Steps:["start","start","start"]
[error] at fn (/usr/src/app/dist/app.js:10:9736)
[error] at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
This is a really mysterious error and it has something to do with cgroups. I haven’t been able to work out why everything still works fine on our 8GB RAM Jetson Xavier NX eMMC. I don’t know if there’s some drastic hardware difference that may be causing the issue. The odd thing is that it has randomly worked before on the 16GB RAM models and then all of a sudden stopped. This is the df -h of the device:
Hi mpous, thanks for your prompt response. We have ruled out network issues, as we are certain our new images get downloaded just fine. I’ve also tried restarting the supervisor and all the basic troubleshooting, and nothing has worked. I’ve also reduced our image sizes to 100MB so they fit on the device. That didn’t work either. When I get to the office I’ll compare the supervisor version against the 8GB RAM Jetson that works, and look into the cgroup files in /sys/fs, since that is where the write fails according to the error. Sadly I think we may be the first ones to come across this error on the Nvidia Jetson Xavier NX. If we discover a solution I’ll make sure you guys know of it.
We managed to solve this one. The issue was related to the Jetson Xavier NX 16GB eMMC (the 16GB RAM variant was the one we had trouble with). The device basically had 2 of its CPUs turned off, so only 4 were available, and this was what was causing the issue for us. We created a custom nvpmodel and appended it to /etc/nvpmodel.conf. At the end was:
Note that our containers were requesting six CPUs in total and the system only presented 4 before we applied the nvpmodel fix. Our assumption had been that all six cores were available, but unfortunately the default nvpmodel profile in BalenaOS only had 4 enabled.
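For anyone hitting the same "no space left on device" error on a cpuset cgroup write, a quick check like the one below may help spot the mismatch. It is only a rough sketch, not something from our actual fix: it parses the kernel's CPU list format (for example "0-3") from /sys/devices/system/cpu/online and reports which requested CPUs are not online. The requested value "0-5" is an assumption standing in for whatever cpuset your containers ask for, and the helper names are made up for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3" or "0-1,4-5" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def missing_cpus(requested, online):
    """Return the requested CPU ids that are not in the online list."""
    return parse_cpu_list(requested) - parse_cpu_list(online)


if __name__ == "__main__":
    online_path = Path("/sys/devices/system/cpu/online")
    # Fall back to "0-3" (assumed, matching the 4 cores we saw) off-device.
    online = online_path.read_text() if online_path.exists() else "0-3"
    requested = "0-5"  # assumption: containers pinned to six cores
    missing = missing_cpus(requested, online)
    if missing:
        print("Requested CPUs not online:", sorted(missing))
    else:
        print("All requested CPUs are online")
```

If this lists any CPUs as not online, it is worth checking the active nvpmodel profile before digging further into Docker or the supervisor.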