tl;dr: init: true == bad(!). With init: true defined on a service in the compose file, once the built release was deployed, Balena Supervisor started the service and immediately killed it (sending SIGTERM), over and over. While you’re in this state, the current release still points to whichever release you updated from; no other indication is given. If you’re facing similar issues, double-check whether init: true is defined on the service in question.
Background
I’m writing this to hopefully save someone else a lot of pain debugging this issue, as our own searches for the problem came up short. There were no indications of anything going wrong:
- Everything worked locally.
- Pushing and building the release showed no errors.
- Deploying (pinning) the release to the device showed no errors for any of the other services, all of which behaved as expected.
The service even started as expected, as seen in our program logs, but then it logged “received SIGTERM” and proceeded to shut down gracefully; that too was part of our logging (and was correct behavior). Balena Supervisor logs also showed that within two seconds of starting the service, it moved to kill it. This went on, and on, and on.
2024-09-06T15:10:08+02:00 Started service ‘<service name> sha256:<hash>’
2024-09-06T15:10:09+02:00 Killing service ‘<service name> sha256:<hash>’
We tried updating Balena Supervisor and BalenaOS, even though the previous versions weren’t marked as having any errors: no difference.
We tried building a new release with just a minor difference, in case the bug was in the builders: same story.
We tried changing the name of the service (a long shot, but quick to rule out).
Since we obviously couldn’t ssh/exec into the service’s container to debug it while it kept getting killed, I tried manually running the image from the host:
balena run --init [+relevant envs, volumes etc] <image sha>
No issues whatsoever, so this seemed to be solely an issue with Balena Supervisor. That was further confirmed by changing only the image used for this service, pointing it at the image of one of the other, working services (one that wouldn’t conflict on env vars, volumes etc.): that too misbehaved. Comparing the service definitions left init: true as the only difference, and removing it made everything work as expected.
After figuring all this out, I checked the documentation: it makes no mention of the init property, as either supported or unsupported. But the engine itself obviously has no issues with it.
Suggested improvements
It would be great to have the balena push command throw an error if init is included, just like it errors for bind mounts for example, to catch this as early as possible. An update to the docs wouldn’t hurt either.
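Until the CLI catches this itself, a local pre-push check is one possible stopgap. The following is a minimal Python sketch, not part of the balena CLI: the function name services_with_init is hypothetical, and it uses a line-based heuristic rather than a real YAML parser, so it only recognizes the plain block-style form init: true.

```python
import re
from typing import List, Tuple

# Matches an indented line such as "    init: true" (optional trailing comment).
INIT_TRUE = re.compile(r"^(\s+)init:\s*true\s*(#.*)?$")
# Matches a key-only line such as "  my-service:" (optional trailing comment).
KEY_LINE = re.compile(r"^(\s*)([A-Za-z0-9._-]+):\s*(#.*)?$")


def services_with_init(compose_text: str) -> List[str]:
    """Return the names of compose services that set `init: true`.

    Heuristic: the owner of an `init: true` line is taken to be the most
    recent key-only line that is less indented than the `init:` line.
    """
    keys: List[Tuple[int, str]] = []  # (indent, name) of key-only lines seen so far
    found: List[str] = []
    for line in compose_text.splitlines():
        init_match = INIT_TRUE.match(line)
        if init_match:
            indent = len(init_match.group(1))
            owner = next(
                (name for key_indent, name in reversed(keys) if key_indent < indent),
                None,
            )
            if owner is not None:
                found.append(owner)
            continue
        key_match = KEY_LINE.match(line)
        if key_match:
            keys.append((len(key_match.group(1)), key_match.group(2)))
    return found
```

Running services_with_init on the contents of the compose file before balena push, and aborting on a non-empty result, would flag the offending services early. Quoted or flow-style variants of the property would slip past this heuristic.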
Another place for improvement would be to show some sort of error in the dashboard or similar if a release doesn’t settle within a reasonable timeframe, just in case another bug comes along that keeps an update from completing.
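As a stopgap for the missing dashboard error, a start/kill loop can be spotted from the Supervisor logs themselves. This is a rough Python sketch that assumes log lines shaped exactly like the two quoted earlier in this post (an ISO 8601 timestamp, then "Started service" or "Killing service", then the service label); count_quick_kills and the two-second window are illustrative choices, not anything balena provides.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable

# Assumed line shape: "<ISO timestamp> Started|Killing service <label>"
LOG_LINE = re.compile(r"^(\S+)\s+(Started|Killing) service\s+(.*)$")


def count_quick_kills(
    log_lines: Iterable[str],
    window: timedelta = timedelta(seconds=2),
) -> Dict[str, int]:
    """Count kills that follow a start of the same service within `window`."""
    last_start: Dict[str, datetime] = {}  # service label -> latest start time
    quick_kills: Dict[str, int] = {}
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # ignore lines that are not start/kill events
        timestamp = datetime.fromisoformat(match.group(1))
        event, service = match.group(2), match.group(3)
        if event == "Started":
            last_start[service] = timestamp
        elif service in last_start:
            # A kill: compare against the matching start, then forget it.
            if timestamp - last_start.pop(service) <= window:
                quick_kills[service] = quick_kills.get(service, 0) + 1
    return quick_kills
```

A service whose count keeps climbing across log fetches is likely stuck in the loop described above, which is the point at which checking for init: true is worthwhile.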
But in the meantime, I hope this will show up and spare someone encountering this a great deal of headache and time.