Device Information:
- Model: Nvidia Jetson Orin NX 16GB
- Setup: Xavier NX Devkit with NVMe
- HOST OS: balenaOS
- Version: 3.2.5
- Supervisor Version: 14.12.0
- Application Running: Lumeo v1.16.26
Issue Description: After the OS boots, the device operates as expected. It appears online on the balenaCloud dashboard, and I can access both the host terminal and the Lumeo terminal without problems. However, after roughly 4 hours of continuous operation, the device unexpectedly goes offline. The device remains powered on, and all diagnostics indicate that it is functioning normally.
Temporary Solution Attempted: To bring the device back online, I’ve tried unplugging the power cable and plugging it back in. This seems to work, but it’s not a long-term solution.
Additional Information: No significant changes or modifications were made to the device or software before this issue started occurring. The environmental factors, like temperature and humidity, are within standard operating ranges.
I am including device diagnostics output in case it helps!
0f868c45c0e3d0868b83fb96f5de8c26_diagnostics_2023.09.21_00.03.40+0000.txt (554.4 KB)
Also adding supervisor state:
{
  "api_port": 48484,
  "ip_address": "ADDRESS",
  "os_version": "balenaOS 3.2.5+rev2",
  "mac_address": "ADDRESS",
  "supervisor_version": "14.12.0",
  "update_pending": false,
  "update_failed": false,
  "update_downloaded": false,
  "commit": "a16e2c00bfdd3f1c0fd4c9dc7a5ef30f",
  "status": "Idle",
  "download_progress": null
}
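In case it helps anyone reproduce this, here is a rough sketch of how the state above could be logged periodically from inside a container, so there is a record of the last values before the device drops offline. It assumes the service has the `io.balena.features.supervisor-api` label, so that `BALENA_SUPERVISOR_ADDRESS` and `BALENA_SUPERVISOR_API_KEY` are set, and that the state comes from the Supervisor's `GET /v1/device` endpoint. The function names and the 5-minute interval are my own placeholders, not part of any balena tooling.

```python
import json
import os
import time
import urllib.request


def fetch_device_state(address: str, api_key: str, timeout: float = 10.0) -> dict:
    """GET /v1/device from the Supervisor API and return the parsed JSON."""
    url = f"{address}/v1/device?apikey={api_key}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_state(state: dict) -> str:
    """Build a one-line summary from fields that appear in the state above."""
    return (
        f"status={state.get('status')} "
        f"ip={state.get('ip_address')} "
        f"update_pending={state.get('update_pending')} "
        f"update_failed={state.get('update_failed')}"
    )


def log_state_forever(interval_seconds: int = 300) -> None:
    """Print a timestamped summary every interval (hypothetical helper)."""
    # Assumption: these variables are injected by the supervisor-api label.
    address = os.environ["BALENA_SUPERVISOR_ADDRESS"]
    api_key = os.environ["BALENA_SUPERVISOR_API_KEY"]
    while True:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        try:
            state = fetch_device_state(address, api_key)
            print(stamp, summarize_state(state), flush=True)
        except (OSError, ValueError) as error:
            # URLError and socket timeouts are OSError subclasses;
            # a malformed response body raises ValueError.
            print(stamp, f"supervisor unreachable: {error}", flush=True)
        time.sleep(interval_seconds)
```

Calling `log_state_forever()` from a small sidecar service would leave a trail in the container logs. Whether those logs survive the power cycle depends on persistent logging being enabled, which I have not verified on this device.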
Finally device health checks:
{"diagnose_version":"4.22.13","checks":[{"name":"check_balenaOS","success":true,"status":"Supported balenaOS 2.x detected"},{"name":"check_container_engine","success":true,"status":"No container_engine issues detected"},{"name":"check_localdisk","success":true,"status":"No localdisk issues detected"},{"name":"check_memory","success":true,"status":"93% memory available"},{"name":"check_networking","success":true,"status":"No networking issues detected"},{"name":"check_os_rollback","success":true,"status":"No OS rollbacks detected"},{"name":"check_supervisor","success":true,"status":"Supervisor is running & healthy"},{"name":"check_temperature","success":true,"status":"No temperature issues detected"},{"name":"check_timesync","success":true,"status":"Time is synchronized"},{"name":"check_service_lumeo","success":true,"status":"User service 'lumeo' is running & healthy"}]}
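For comparing reports taken before and after the device drops offline, a small script like the following could list any check that did not pass. This is only a sketch: it assumes the report is valid JSON with plain ASCII double quotes and has the `checks` / `name` / `success` / `status` shape shown above. `failed_checks` is a name I made up.

```python
import json


def failed_checks(report_json: str) -> list[str]:
    """Return 'name: status' for every check whose 'success' is not true."""
    report = json.loads(report_json)
    return [
        f"{check['name']}: {check['status']}"
        for check in report.get("checks", [])
        if not check.get("success")
    ]
```

An empty list means every check in that report passed.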
I’d appreciate any insights or solutions that the community might have to address this problem. Has anyone else experienced a similar issue?