Hello everyone,
I’m encountering a seemingly “random” problem with the Balena Engine when one of my services crashes and needs to be started again.
Most of the time, the restart behaves as expected on the majority of my devices. On certain occasions, however, the service gets stuck and cannot restart.
On those occasions, several errors appear in the logs of the Balena Engine:
[info] Applying target state
[debug] Found unmanaged Volume: b8b019f912b56107e4dfe23ac896ac0b56599dd08ce14bb6ea899584698438c7
[debug] Found unmanaged Volume: e5857af62c6d1f458ae212f7eb76bda4635e4c2fb3b2d9acf1b0808745de8132
[warn] Ignoring unsupported or unknown compose fields: containerName
[debug] Found unmanaged Volume: b8b019f912b56107e4dfe23ac896ac0b56599dd08ce14bb6ea899584698438c7
[debug] Found unmanaged Volume: e5857af62c6d1f458ae212f7eb76bda4635e4c2fb3b2d9acf1b0808745de8132
[event] Event: Service start {"service":{"appId":1835016,"serviceId":1083089,"serviceName":"cloud-connector","commit":"662af403dd1b8e1f1ef8f75cdf90ee3c","releaseId":1942744}}
[error] Scheduling another update attempt in 1800000ms due to failure: Error: Failed to apply state transition steps. (HTTP code 500) server error - task 71b657d018ca3d28e9ef3079f593eae207ed414a9ec0ab60c2f0b294bfd51158 already exists: unknown Steps:["start"]
[error] at fn (/usr/src/app/dist/app.js:6:8594)
[error] at runMicrotasks (<anonymous>)
[error] at processTicksAndRejections (internal/process/task_queues.js:97:5)
[error] Device state apply error Error: Failed to apply state transition steps. (HTTP code 500) server error - task 71b657d018ca3d28e9ef3079f593eae207ed414a9ec0ab60c2f0b294bfd51158 already exists: unknown Steps:["start"]
[error] at fn (/usr/src/app/dist/app.js:6:8594)
[error] at runMicrotasks (<anonymous>)
[error] at processTicksAndRejections (internal/process/task_queues.js:97:5)
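In case it helps anyone spot the same condition, here is a minimal sketch of how the error signature in these lines could be matched. The pattern is taken only from the log text above; the function name `stuck_task_id` is hypothetical, and I am assuming the message wording stays the same across versions.

```python
import re

# Signature copied from the error lines above: "task <hex id> already exists".
# Assumption: the wording of this message is stable across Engine versions.
TASK_EXISTS = re.compile(r"task ([0-9a-f]+) already exists")


def stuck_task_id(log_line):
    """Return the task ID if the line carries the 'already exists' error, else None."""
    match = TASK_EXISTS.search(log_line)
    return match.group(1) if match else None
```

A log line for which this returns a task ID marks the state in which the service stays stuck until it is restarted explicitly.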
Clicking the start button in BalenaCloud does not resolve the issue, but calling the /restart-service endpoint of the Balena Supervisor API does.
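For reference, the workaround call looks roughly like the sketch below. It is not a definitive implementation: I am assuming the v2 route `/v2/applications/<appId>/restart-service` with an `apikey` query parameter and a `serviceName` JSON body, and that the address and key come from the `BALENA_SUPERVISOR_ADDRESS` and `BALENA_SUPERVISOR_API_KEY` variables available to containers with Supervisor API access. The helper names `build_restart_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_restart_request(address, app_id, api_key, service_name):
    """Build a POST request for the Supervisor's restart-service endpoint.

    Assumption: the route and body shape follow the v2 Supervisor API.
    """
    url = f"{address}/v2/applications/{app_id}/restart-service?apikey={api_key}"
    body = json.dumps({"serviceName": service_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request, timeout=30):
    """Send the request and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

On a device this would be used as `send(build_restart_request(address, app_id, api_key, "cloud-connector"))`, with the first three values read from the container's environment.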
I have not yet been able to reproduce the problem deliberately, so I don’t think I’ll be able to provide support access to one of my devices for further investigation. However, I have a diagnostic report from one of the devices on which the problem occurred:
215508c94253ee8609a796b391e816cc46b285334f7f1d31a928b3e92795e4_diagnostics_2021.12.14_14.43.58+0000.txt (2.7 MB)
Balena Supervisor version on my devices: 12.10.3
I would be glad if someone could help me with this matter, as it is critical for me to have this service up at all times.
Thanks in advance,
Christopher