I would like to suggest that balenaOS or balenaCloud be made to serialize its operations on older devices (i.e. the RPi Zero W and B+ v1.2). One thing that kills my progress almost every time I make any kind of update is that the application containers all stop, download and restart at the same time. On a device with a single-core CPU and poor storage performance, this is a disaster. The contention on the device pushes everything way past reasonable timeouts, and things crumble, fail, and grind to a stop. The device never recovers from this. If these operations were serialized, so that only one image is downloaded at a time, only one service is restarted at a time, and any other operations are queued up behind them, updates would go a lot smoother. It would also take less time to make changes, because it would eliminate a high level of device contention and context switching.
The devices handle the applications without any problem if I can get them past the issue of redeploying everything all at once every time I make an update.
I tried the depends_on option, but it doesn’t affect what happens when I deploy a new build or change configs. It may be helpful as part of the overall solution, but on its own depends_on was not helping.
Note: depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started. If you need to wait for a service to be ready, see Controlling startup order for more on this problem and strategies for solving it.
The page they link to has some more suggestions for dealing with these issues. Might help?
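For what it's worth, one common way to handle the "started but not ready" gap from that note is a small wrapper that polls the dependency until it accepts connections before launching the dependent service. Below is a minimal sketch, assuming the dependency listens on a TCP port; the function name `wait_for_port` and its parameters are made up for illustration and are not part of Docker or balena.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires.

    Hypothetical helper: returns True once the port accepts a connection,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A container entrypoint could call something like `wait_for_port("db", 5432)` before starting the main process. Note that an open port only shows the service is listening, not that it is fully initialized.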
I added the update strategy kill-then-download and depends_on, and things are incrementally better. Since I also was able to get the lighter build to deploy, it isn’t so much work on the device to deploy anymore. The main issue on my RPi B+ is storage contention while downloading the images/deltas. It takes forever for the deltas to download, because they’re all downloading at once, and fighting for storage IO time. If there’s a way to force the deltas to download in series instead, things would move a lot faster. The Zero W is more stable than the B+, and hasn’t had any issues since I deployed the lighter build.
This is a screen cap of the lighter application restarting on my Zero W. It’s balenaSound without a bunch of the services I don’t want on the device, to save resources. The lighter set of containers has improved the situation a lot - the device doesn’t crash while updating. But they are still all downloading at once, and it can take hours to deploy a new build.
What I’m hoping to hear about, or suggest as a feature, is a set of controls that let me tell the docker engine to only download X containers/deltas at a time, and only restart Y services at a time. I’m thinking it would be an environment variable that affects resin-supervisor or something like that.
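To make the request concrete, here is a rough sketch of the behavior being asked for: a queue that runs at most X downloads and at most Y restarts at once. This is only an illustration, not how the supervisor works; `run_limited`, `simulate_update`, `PeakCounter`, and the service names are all hypothetical, and the downloads and restarts are simulated with short sleeps.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Context manager that records how many tasks were active at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1


def run_limited(tasks, max_parallel):
    """Run callables with at most max_parallel in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def simulate_update(services, max_downloads=1, max_restarts=1):
    """Simulate an update: bounded downloads first, then bounded restarts."""
    downloads, restarts = PeakCounter(), PeakCounter()

    def download(name):
        with downloads:
            time.sleep(0.01)  # stand-in for pulling an image or delta
        return name

    def restart(name):
        with restarts:
            time.sleep(0.01)  # stand-in for stopping and starting a service
        return name

    fetched = run_limited([lambda n=n: download(n) for n in services], max_downloads)
    restarted = run_limited([lambda n=n: restart(n) for n in fetched], max_restarts)
    return restarted, downloads.peak, restarts.peak


if __name__ == "__main__":
    # Placeholder service names, fully serialized (X = 1, Y = 1).
    print(simulate_update(["svc-a", "svc-b", "svc-c", "svc-d"], 1, 1))
```

With both limits set to 1 the whole update is serialized, which is the case that matters most on a single-core device with slow storage; raising either limit would let faster devices keep some parallelism.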
Hi Mark, I’m glad to hear that the suggestions at least mitigated the problem to the point where it’s workable. Thanks for the feedback, especially around the update strategy. I made sure it was passed over to the product team!