Hello, I’m having issues with my openBalena server. CPU usage is maxing out due to too many postgres processes running at the same time. I disabled OpenVPN, which helped a lot with the CPU load, but it seems like there’s more to it. I have about 1100 devices running balenaOS, and the more we add, the worse the problem gets.
What causes the excessive number of postgres processes, and is there a way to fix this or throttle it?
I think this post is in the wrong category, but nonetheless, I had the same problems.
Just to get more of an idea of how other people run their openBalena instance: on what kind of server(s) do you run it? How many CPUs, how much RAM, etc.?
And which version of openBalena do you run? Or which individual versions of the containers?
I’ve noticed some changes in the newer API and upgraded the database to version 5 with Postgres 13, which showed improvements. Therefore, I created a PR with the upgraded versions, which has since been merged.
Still, 1100 devices is quite something (we’re running about the same), so I’m curious about what kind of servers you’re running it on.
Dear @ajalal and @bversluijs - I moved the post to the openBalena category and hope that one of our openBalena experts will be in touch with you shortly through this post :). Maybe @fisehara can help?
Thanks for your request.
First of all, we may need to understand the server setup you are running the open-balena stack on. Do you use a dedicated database server, or do you run everything on one machine?
A general (admittedly vague) answer would be that the load is simply too high for the server to process.
One starting point, from my understanding, would be to limit the number of parallel database connections by reducing DB_POOL_SIZE to e.g. 20 in the service deployment. This changes the maximum number of parallel clients connected to a single database pool. open-balena-api uses Pine.js for its database connections, which in turn uses pg.Pool from node-postgres to establish the postgres connection pool.
In Pine.js this line will change the behavior: pinejs/db.ts at 2b941057f4ee901464e6212439f5477b9aaacc49 · balena-io/pinejs · GitHub
And in open-balena-api, here is where the environment variable is evaluated:
open-balena-api/index.ts at 276e19c0c805b1e7255d98d42652871f4e6fbda1 · balena-io/open-balena-api · GitHub
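As a sketch of the mechanism (not the actual open-balena-api code; the fallback value of 50 and the helper name are illustrative assumptions), an environment-driven pool size cap is typically read and then handed to pg.Pool like this:

```typescript
// Hypothetical sketch of evaluating an env var such as DB_POOL_SIZE.
// The default of 50 is an assumption for illustration, not the real default.
const DEFAULT_POOL_SIZE = 50;

export function getPoolSize(
	env: Record<string, string | undefined> = process.env,
): number {
	const parsed = parseInt(env.DB_POOL_SIZE ?? '', 10);
	// Fall back to the default on missing, non-numeric, or non-positive values.
	return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_POOL_SIZE;
}

// The resulting value would then be passed as the pool's `max` option, e.g.:
// new Pool({ max: getPoolSize(), /* ...connection options... */ });
```

Lowering that `max` value is what caps how many postgres backends the API can keep busy at once.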
This will reduce the database load, as fewer requests will be processed on the DB in parallel, but it will cause more 503 responses to API clients, as not all requests may be served in time.
Please let us know your server setup and your findings when running the latest open-balena version. In addition, you could try to lower the pool size and check the load and responsiveness of the deployment.
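For example, if your deployment is based on a standard open-balena docker-compose file, lowering the pool size would be a matter of setting the variable on the API service. This is a hypothetical excerpt (the service name `api` is an assumption based on a typical setup; only the DB_POOL_SIZE line is the suggested change):

```yaml
services:
  api:
    environment:
      DB_POOL_SIZE: "20"  # cap parallel clients per database pool
```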
Thanks @nmaas87 for sorting the thread into the right category.
Thanks @bversluijs for adding your experience and findings about updated postgres and the server load.
We are running everything on one machine. I was wondering what causes these postgres processes? Is it related to communication with the devices running balenaOS?
Can you share some server specs, like CPU and memory?
The cause of the postgres processes is database accesses coming from the open-balena-api. Every communication from a balenaOS device to the open-balena-api ultimately requires a database read. In addition, by default, balenaOS devices report their device state changes (metrics) to the API, and the API writes these metric updates to the database.
Are there any other clients that use and put load on the open-balena-api?
You could also check the device configuration and see if you can apply some settings from the bandwidth reduction documentation: Reduce bandwidth usage - Balena Documentation
You could switch off the metrics report and increase the target state poll interval.
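Concretely, that would mean setting something like the following as fleet-wide configuration variables (the 15-minute poll interval is just an example value; check the linked documentation for the exact variable names supported by your supervisor version):

```
BALENA_SUPERVISOR_HARDWARE_METRICS=false   # stop periodic hardware metrics reports
BALENA_SUPERVISOR_POLL_INTERVAL=900000     # poll target state every 15 min (value in ms)
```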
Thank you for the support. I was able to fix it by updating to release 3.6.0.
Apparently, older versions of openBalena aren’t very optimized as far as SQL queries go.
Thanks for your confirmation that 3.6.0 fixes this issue.
Now I know I wasn’t the only one with the problem and that the update helped.
The openBalena API changed how it handles device status reports: it now only writes to the database when something has actually changed, instead of forcing an update on every report. With many devices, this is probably why the postgres process count improved so dramatically in the latest version.