So, I think I may have figured out a big issue that was contributing to this problem. I have a mix of older and newer devices, and I had stripped down the application for deployment on the older devices. But I was pushing both that stripped-down build and the regular build to the same application in balenaCloud, and then pinning devices to specific builds. So I basically had two different applications living inside one balenaCloud application.
I would always run the bigger build and then the smaller build, and then pin the devices. The two builds share some of the containers, but the smaller build omits the containers the older devices don’t need or can’t run.
What I think might have been happening is that the images/deltas were getting confused, because the bigger build has eight containers and the smaller build only has three. Every update resulted in two builds: a big one and a small one. So I’m thinking the alternation between big and small builds in the build history was causing problems for the build system’s delta calculations. And I suspect that was behind the issue I was most recently seeing, where the older devices (the ones on the small build) would throw 503 and 404 errors in the delta download stage of deployment.
This is all theory, but it might make sense to those who know the build system.
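To make the pattern concrete, here is a rough Python sketch of what the build history in the shared application looked like. The service names are made up (I only know it was eight containers versus three, with the three shared), and it says nothing about how balena actually picks the base for a delta. It only shows that every small build sat directly after a big build in the history.

```python
# Hypothetical service names: 8 containers in the big build, 3 shared with the small one.
BIG_SERVICES = frozenset(f"service-{i}" for i in range(1, 9))
SMALL_SERVICES = frozenset(f"service-{i}" for i in range(1, 4))


def build_history(updates):
    """Each update pushes a big build, then a small build, into one application."""
    history = []
    for n in range(1, updates + 1):
        history.append((f"update-{n}-big", BIG_SERVICES))
        history.append((f"update-{n}-small", SMALL_SERVICES))
    return history


def neighbours_of_small_builds(history):
    """For each small build, report the build immediately before it in the history."""
    pairs = []
    for prev, cur in zip(history, history[1:]):
        if cur[1] == SMALL_SERVICES:
            # Count the services present in the previous build but not in this one.
            pairs.append((cur[0], prev[0], len(prev[1] - cur[1])))
    return pairs


for small, previous, extra in neighbours_of_small_builds(build_history(3)):
    print(f"{small}: preceded by {previous}, "
          f"which has {extra} services the small build lacks")
```

If the previous build in the history matters for delta generation (which is my assumption, not something I have confirmed), a device pinned to small builds would never have a like-for-like neighbour in this layout.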
I have since moved the older devices to a separate balenaCloud application. I now push only small builds to that application and only big builds to the original one. Things seem to be a lot happier this way. I even got my most troublesome RPi B+ 1.2 working.