I'm currently testing some Raspberry Pis (3 x RPi3 and 1 x RPi4) as we want to move everything over to Balena.
Everything is fine except for a strange issue: if I push a code/Dockerfile update (even a small one to a single service), the CPU load becomes very high, moving from circa 10% to 50% usage.
The picture shows the base CPU before the push; the increase is after the push, and the sudden drop is from rebooting the board.
I have stopped all of the containers and then restarted them but the base idle CPU is still higher than before the push.
I have also run the following in the host OS:

systemctl stop resin-supervisor
balena stop $(balena ps -aq)
systemctl stop balena
systemctl start balena
systemctl start resin-supervisor
And it is the same.
If I actually reboot the device, it's immediately fixed.
This doesn't happen on every box when pushes are done, and sometimes it won't happen at all, but it occurs quite regularly: every couple of pushes.
All the boxes are running the same code
The Balena Host OS is untouched
Any help would be appreciated as this could be a deal breaker for us
Jeremy
This is quite strange.
It's understandable that there is some CPU spike while the supervisor pulls in the new image, stops the container, and starts the new application.
But after that the CPU load should go back to a nominal baseline (if the application didn't add new load).
Bizarre. What is using the CPU? Is it obvious via the top utility?
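For checking this over an SSH session into the host OS, a quick non-interactive sketch (generic Linux commands, nothing balena-specific; exact output columns vary by procps version):

```shell
# One-shot snapshot of top in batch mode, so it can be copied into a reply.
top -b -n 1 | head -n 15

# Or list processes sorted by CPU usage explicitly.
ps -eo pid,comm,%cpu --sort=-%cpu | head -n 10
```

If a single process (supervisor, balena-engine, or the app container) sits at the top of that list after a push, that narrows things down considerably.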
We can have a look via support as well.
If you do a simple change, and then you have 1-2 devices that manifest the bug. And 1-2 devices that don’t, you can grant support access to the whole application and we can have a look.
Ideally, we’d like a nice and small Dockerfile for a repeatable test case.
Also, which OS version/image are you using? The latest 64-bit ones?
Hi. That is indeed weird. If you enable support access and send us the UUID of a device exhibiting this behavior we can take a look. I believe /sbin/init is a symlink to systemd, so it’s hard to pin the exact cause without at least looking at the logs.
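To confirm what PID 1 actually resolves to and whether it is the process consuming the CPU, a quick check from the host OS shell (generic Linux; paths may differ on balenaOS):

```shell
# Resolve the /sbin/init symlink, e.g. to systemd.
readlink -f /sbin/init 2>/dev/null || echo "no /sbin/init"

# Show PID 1's name and current CPU usage.
ps -p 1 -o pid,comm,%cpu
```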
Support access has been enabled and the UUID is 2d38c1256d85a0340c4dc62acfbb9305
The above box is exhibiting the behaviour (2 are and 2 are not since the last push). The only way to get it out of this state is to perform a full reboot.
Hi, you could enable delta updates; it is indeed an awesome feature and recommended for production deployments (and will be enabled by default for all new apps soon). I’m unsure from the message above whether delta updates were being suggested as a fix or as a potential source of the problem, but either way, enabling them and seeing whether the problem persists would eliminate them as a variable.
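For reference, delta updates are enabled per application via a configuration variable. A sketch of one way to set it with the balena CLI, treating the exact invocation as an assumption and `myApp` as a placeholder application name:

```shell
# Set the delta-updates configuration variable for the whole application.
# RESIN_SUPERVISOR_DELTA is the variable name; myApp is a placeholder.
balena env add RESIN_SUPERVISOR_DELTA 1 --application myApp
```

The same variable can also be set from the dashboard's fleet configuration page rather than the CLI.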