We are using Raspberry Pi Zero 2s and have been having issues with release downloads timing out and restarting. From what I have seen, the usual first step is to switch to the “kill-then-download” strategy described in Fleet update strategy - Balena Documentation. I have set that strategy, but when I go to one device and change its pinned release, I don’t see anything in the logs about the container stopping before the download, and during the download the dashboard still shows the service status as “Running”. Does the update strategy only apply when the fleet-wide pinned release is changed? Thanks
Here is an excerpt from the logs of an example device. I went to this device, stopped the service from the dashboard, and then changed the pinned release of just that device a minute later. If the device were honoring the “kill-then-download” strategy, I would expect the Service is already stopped... message (which appears at 12:16:42) to come before the Downloading delta... message (12:11:02), not after it.
19.04.22 12:06:45 (-0400) Supervisor starting
19.04.22 12:10:29 (-0400) Killing service 'api sha256:0b6c4dedcea8e4c3bea911f788545713e5041f9455ab66bac1fea1352d8b800b'
19.04.22 12:10:41 (-0400) Killed service 'api sha256:0b6c4dedcea8e4c3bea911f788545713e5041f9455ab66bac1fea1352d8b800b'
19.04.22 12:10:41 (-0400) Service exited 'api sha256:0b6c4dedcea8e4c3bea911f788545713e5041f9455ab66bac1fea1352d8b800b'
19.04.22 12:11:02 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/ae176de89a210ac5d6a3ee6410ee764e@sha256:e3e2266c2e60d95fd4aa678265796a5966f0de520ddca5a0babe2171df05e842'
19.04.22 12:11:58 (-0400) Delta still processing remotely. Will retry...
19.04.22 12:11:58 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/ae176de89a210ac5d6a3ee6410ee764e@sha256:e3e2266c2e60d95fd4aa678265796a5966f0de520ddca5a0babe2171df05e842'
19.04.22 12:12:54 (-0400) Delta still processing remotely. Will retry...
19.04.22 12:12:54 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/ae176de89a210ac5d6a3ee6410ee764e@sha256:e3e2266c2e60d95fd4aa678265796a5966f0de520ddca5a0babe2171df05e842'
19.04.22 12:13:50 (-0400) Delta still processing remotely. Will retry...
19.04.22 12:13:50 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/ae176de89a210ac5d6a3ee6410ee764e@sha256:e3e2266c2e60d95fd4aa678265796a5966f0de520ddca5a0babe2171df05e842'
19.04.22 12:16:41 (-0400) Downloaded image 'registry2.balena-cloud.com/v2/ae176de89a210ac5d6a3ee6410ee764e@sha256:e3e2266c2e60d95fd4aa678265796a5966f0de520ddca5a0babe2171df05e842'
19.04.22 12:16:41 (-0400) Killing service 'api sha256:0b6c4dedcea8e4c3bea911f788545713e5041f9455ab66bac1fea1352d8b800b'
19.04.22 12:16:42 (-0400) Service is already stopped, removing container 'api sha256:0b6c4dedcea8e4c3bea911f788545713e5041f9455ab66bac1fea1352d8b800b'
19.04.22 12:16:42 (-0400) Killed service 'api sha256:0b6c4dedcea8e4c3bea911f788545713e5041f9455ab66bac1fea1352d8b800b'
19.04.22 12:16:43 (-0400) Deleting image 'registry2.balena-cloud.com/v2/24e893a57dedff020bc9703403a326b5@sha256:be3721a5e91e3b398961a74b455f08acf79bf49e022cc42e4a55bf602dcf79b3'
19.04.22 12:16:51 (-0400) Deleted image 'registry2.balena-cloud.com/v2/24e893a57dedff020bc9703403a326b5@sha256:be3721a5e91e3b398961a74b455f08acf79bf49e022cc42e4a55bf602dcf79b3'
19.04.22 12:16:52 (-0400) Installing service 'api sha256:3a7cfdf9acc4452a3a50211e5b1b4cdda7a017d6e7501ed11896f23d82c49aa2'
19.04.22 12:16:53 (-0400) Installed service 'api sha256:3a7cfdf9acc4452a3a50211e5b1b4cdda7a017d6e7501ed11896f23d82c49aa2'
19.04.22 12:16:53 (-0400) Starting service 'api sha256:3a7cfdf9acc4452a3a50211e5b1b4cdda7a017d6e7501ed11896f23d82c49aa2'
19.04.22 12:16:55 (-0400) Started service 'api sha256:3a7cfdf9acc4452a3a50211e5b1b4cdda7a017d6e7501ed11896f23d82c49aa2'
@alanb128 The “Delta still processing remotely” messages in the attached logs may have been related to the referenced incident, but the bug I am trying to show is unrelated. I do not have download issues when the container is not running. When the container is running, the memory and CPU usage of the Pi Zero 2 is too high and causes issues. I tried to use the “fleet update strategy” variable to stop the container before downloading, but that is not working: the call to kill the service comes after the image is downloaded. I tried to show that with the logs by stopping the container manually, changing the release to trigger the process, and then showing that the call to kill the service (at 12:16:41) comes after the download starts (at 12:11:02). Thanks
Here is another set of logs showing the same problem. Luckily, this download went smoothly even though the container was running, but the logs still show that the device did not follow the “kill-then-download” strategy. In this case, I changed the fleet-wide pinned release, but the device still downloaded the image before killing the running service. The first two log lines are output from the running container. The line at 14:54:04 shows the download of the next release starting. The call to kill the service isn’t logged until line five (14:56:34), after the image has been downloaded (14:56:33).
27.04.22 14:53:50 (-0400) api ======== Running on http://0.0.0.0:80 ========
27.04.22 14:53:50 (-0400) api (Press CTRL+C to quit)
27.04.22 14:54:04 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 14:56:33 (-0400) Downloaded image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 14:56:34 (-0400) Killing service 'api sha256:f739d5f084b14b4e4a022ee1bcc46832b8be7350f25d03d8fc8fa46514927010'
27.04.22 14:56:46 (-0400) Service exited 'api sha256:f739d5f084b14b4e4a022ee1bcc46832b8be7350f25d03d8fc8fa46514927010'
27.04.22 14:56:47 (-0400) Killed service 'api sha256:f739d5f084b14b4e4a022ee1bcc46832b8be7350f25d03d8fc8fa46514927010'
27.04.22 14:56:48 (-0400) Deleting image 'registry2.balena-cloud.com/v2/c8544ad2ec434346a4a8e31d69e213f9@sha256:412ffbd48828a95b21d7a6a65e7469c0bbdcd721e75740beb9eda6876d16c489'
27.04.22 14:57:04 (-0400) Deleted image 'registry2.balena-cloud.com/v2/c8544ad2ec434346a4a8e31d69e213f9@sha256:412ffbd48828a95b21d7a6a65e7469c0bbdcd721e75740beb9eda6876d16c489'
27.04.22 14:57:05 (-0400) Installing service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
27.04.22 14:57:07 (-0400) Installed service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
27.04.22 14:57:07 (-0400) Starting service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
27.04.22 14:57:09 (-0400) Started service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
Here are the logs from a different device in the same fleet that is having an issue downloading the same release. Both devices started from the same release, are on the same Wi-Fi network, and are physically sitting next to each other. This device currently reports memory usage of 362 MB / 411 MB. It seems that stopping the container before the download decreases the memory and CPU usage enough that the download happens smoothly.
27.04.22 14:43:21 (-0400) api ======== Running on http://0.0.0.0:80 ========
27.04.22 14:43:21 (-0400) api (Press CTRL+C to quit)
27.04.22 14:47:45 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 14:48:41 (-0400) Delta still processing remotely. Will retry...
27.04.22 14:48:42 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 14:49:38 (-0400) Delta still processing remotely. Will retry...
27.04.22 14:49:39 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 14:55:08 (-0400) Failed to download image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26' due to 'connect ECONNREFUSED /var/run/balena-engine.sock'
27.04.22 14:55:52 (-0400) Supervisor starting
27.04.22 14:55:54 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 14:56:13 (-0400) Supervisor starting
27.04.22 14:56:15 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 15:02:42 (-0400) Failed to download image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26' due to 'connect ECONNREFUSED /var/run/balena-engine.sock'
27.04.22 15:03:23 (-0400) Supervisor starting
27.04.22 15:03:24 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 15:03:46 (-0400) Supervisor starting
27.04.22 15:03:47 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 15:10:01 (-0400) Failed to download image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26' due to 'connect ECONNREFUSED /var/run/balena-engine.sock'
27.04.22 15:10:43 (-0400) Supervisor starting
27.04.22 15:10:45 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 15:11:03 (-0400) Supervisor starting
27.04.22 15:11:05 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
This is actually just a documentation problem. I found an issue in the balena-supervisor GitHub repository that referenced this, and its steps to reproduce said to use RESIN_SUPERVISOR_UPDATE_STRATEGY. When I set RESIN_SUPERVISOR_UPDATE_STRATEGY=kill-then-download, the supervisor kills the service before beginning the download, as expected. In the Balena documentation page for fleet update strategy that was mentioned in the original post, the references to the configuration variable BALENA_SUPERVISOR_UPDATE_STRATEGY should be changed to RESIN_SUPERVISOR_UPDATE_STRATEGY. For completeness, here are the logs of the device with the proper configuration variable name set, which show the “Killing service” message before the “Downloading delta” message.
27.04.22 16:08:28 (-0400) api ======== Running on http://0.0.0.0:80 ========
27.04.22 16:08:28 (-0400) api (Press CTRL+C to quit)
27.04.22 16:08:39 (-0400) Killing service 'api sha256:d07224f598b3340b4d8aff35a5cf703fd4cc2b62eccc4747f582664ace3cd802'
27.04.22 16:08:51 (-0400) Service exited 'api sha256:d07224f598b3340b4d8aff35a5cf703fd4cc2b62eccc4747f582664ace3cd802'
27.04.22 16:08:51 (-0400) Killed service 'api sha256:d07224f598b3340b4d8aff35a5cf703fd4cc2b62eccc4747f582664ace3cd802'
27.04.22 16:08:52 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 16:10:48 (-0400) Downloaded image 'registry2.balena-cloud.com/v2/d42ffa7e55694652a7e02eb37ab50a82@sha256:bf0124ff6d15aa698a6a4a3ded5f690b2fcbf44cc1e4151995f97aa591ff8a26'
27.04.22 16:10:50 (-0400) Deleting image 'registry2.balena-cloud.com/v2/6124c6e4e354be2d11ea065ee8123bb6@sha256:5b4489134e8b241cf2653e739b8cac7120f713ea7d274fd47b46a3c801f070df'
27.04.22 16:10:56 (-0400) Deleted image 'registry2.balena-cloud.com/v2/6124c6e4e354be2d11ea065ee8123bb6@sha256:5b4489134e8b241cf2653e739b8cac7120f713ea7d274fd47b46a3c801f070df'
27.04.22 16:10:57 (-0400) Installing service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
27.04.22 16:10:58 (-0400) Installed service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
27.04.22 16:10:58 (-0400) Starting service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
27.04.22 16:11:00 (-0400) Started service 'api sha256:5d8bfcb99a1e2d655c6c7da2bebecf5aab4a3795cc5b610068e6e465694d2307'
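For reference, a per-device configuration variable like this can be set from the balena CLI. The command below is only a rough sketch: flag names differ between CLI versions, and <device-uuid> is a placeholder for the device’s UUID.

balena env add RESIN_SUPERVISOR_UPDATE_STRATEGY kill-then-download --device <device-uuid>

The CLI should treat names prefixed with RESIN_ or BALENA_ as configuration variables rather than plain service environment variables, and the same value can also be added per device from the dashboard’s Device Configuration page.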
Hello,
Thanks for your continued investigation and feedback on this issue; it is very valuable. The update strategy documentation at the moment doesn’t clearly indicate which configuration works correctly with which supervisor version, and I suspect that’s what’s happening here. In your case, the configuration option BALENA_SUPERVISOR_UPDATE_STRATEGY probably isn’t supported by the supervisor running on the device. Please tell us the balenaOS and supervisor versions running on the Pi Zero 2 where you ran these tests. Since this is a newly released device, it’s surprising that the supervisor is having this issue at all.
Regardless, the documentation is right to refer to BALENA_SUPERVISOR_UPDATE_STRATEGY everywhere, since that’s what we intend to support long term. Here are the actions I plan to take:
Open a GitHub issue on the supervisor repo similar to the one you referenced.
Check if we can add some version information to the docs to better inform users of what configuration they should apply.
I apologize for the inconvenience you faced; we take feedback very seriously. Rest assured, we will get this sorted out and keep you informed as the issue progresses. Please let us know if there’s anything else we can help you with. Thanks again.
That makes sense. Sorry, I should have included the balenaOS and supervisor versions in the original post. All devices are Raspberry Pi Zero 2 W devices running balenaOS 2.94.4 (development) with supervisor version 13.0.0. Most were created on 2022-03-08. Thanks for the responses!
From what I followed on the GitHub issue, I believe the new per-container update strategy has solved your problem, correct? Please let us know if there’s anything else we can do for you with regard to this issue!
For future visitors of this thread: the currently preferred way to configure the update strategy is to set the io.balena.update.strategy label on the desired services. The older mechanism (based on configuration variables) is deprecated. We have updated our docs with this information: Fleet update strategy - Balena Documentation
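As an illustration of the label-based approach, a minimal docker-compose.yml sketch might look like the following (the service name and build path are placeholders, not taken from this thread):

version: '2.1'
services:
  api:
    build: ./api
    labels:
      io.balena.update.strategy: kill-then-download

See the linked documentation for the full list of supported strategies and the exact label syntax.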