Extreme loads and High CPU

Hi all,

We’re running an openBalena instance with hundreds of devices. Not every device is online; let’s say about 30% of them are.

Since last Saturday, we’ve been experiencing extreme load and high CPU usage. The CPU is at 100% all of the time, with load averages of 50+. I can’t really get all the logs because of the load. Nothing has changed on this instance in the last few months, so it really came out of the blue.

When I stop the API instance, the load (and thus the CPU usage) drops to a normal level. But, looking at the logs, it looks like the database is having trouble handling all the requests. Running top in the database container (when possible) shows many postgres processes, indicating that many queries are being run.
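To see what those postgres processes are actually doing, something like the following could help. It is only a sketch: `pg_stat_activity` is the standard PostgreSQL statistics view, but `summarize_activity` is a hypothetical helper, and the way the rows are fetched (driver, host, credentials) is an assumption that depends on the setup.

```python
from collections import Counter

# Standard PostgreSQL view with one row per server process.
# Run this with any PostgreSQL client or driver; how you connect is up to you.
ACTIVITY_SQL = """
SELECT state, left(query, 80) AS query_prefix
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
"""


def summarize_activity(rows):
    """Summarize (state, query_prefix) rows fetched with ACTIVITY_SQL.

    Returns a Counter of processes per state and the five most common
    query prefixes among the processes that are currently active.
    """
    rows = list(rows)
    # state is NULL for processes that are not regular client sessions.
    by_state = Counter(state or "no state" for state, _ in rows)
    active_queries = Counter(q for state, q in rows if state == "active")
    return by_state, active_queries.most_common(5)


# Example with a DB-API cursor (assumed, not shown here):
#   cursor.execute(ACTIVITY_SQL)
#   by_state, top_queries = summarize_activity(cursor.fetchall())
```

Many `active` rows sharing the same query prefix would point at one specific query piling up, while mostly `idle` rows would point more towards connection handling on the API side.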

I have some logs from the API and the database that I’d like to share with the openBalena team, hoping they can give me some feedback about what to do. I see many errors, but they all indicate that something is going wrong in the database itself.


Some information about the running instance:

API version: 0.139.0
DB version: 4.1.0
VPN version: v9.17.11

The server has 4 GB of RAM and 2 CPUs. It’s set up to scale, but we don’t know whether we should scale vertically, horizontally, or both, or whether something else is screwing everything up.


Thanks in advance to everyone looking into it!


Update
I’ve moved the database to a new server, which is dedicated to the database only. However, it seems like there are bursts of queries run by the API instance: it runs just fine for a while, and then a burst of queries shows up. The API logs don’t really show which queries are being executed (other than device patches on /state, but those happen continuously).

I see this error popping up sometimes, and it seems to cause some problems:

Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get device type build data for via-vab802-quad/2.0.0-beta.1 The specified key does not exist. NoSuchKey: The specified key does not exist.
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.extractError (/usr/src/app/node_modules/aws-sdk/lib/services/s3.js:718:35)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/request.js:688:14)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.transition (/usr/src/app/node_modules/aws-sdk/lib/request.js:22:10)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at AcceptorStateMachine.runTo (/usr/src/app/node_modules/aws-sdk/lib/state_machine.js:14:12)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/aws-sdk/lib/state_machine.js:26:10
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:38:9)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:690:12)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.emit (/usr/src/app/node_modules/aws-sdk/lib/request.js:688:14)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.transition (/usr/src/app/node_modules/aws-sdk/lib/request.js:22:10)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at AcceptorStateMachine.runTo (/usr/src/app/node_modules/aws-sdk/lib/state_machine.js:14:12)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/aws-sdk/lib/state_machine.js:26:10
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:38:9)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.<anonymous> (/usr/src/app/node_modules/aws-sdk/lib/request.js:690:12)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at Request.callListeners (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at callNextListener (/usr/src/app/node_modules/aws-sdk/lib/sequential_executor.js:96:12)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at IncomingMessage.onEnd (/usr/src/app/node_modules/aws-sdk/lib/event_listeners.js:313:13)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at IncomingMessage.emit (events.js:327:22)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at IncomingMessage.EventEmitter.emit (domain.js:467:12)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at endReadableNT (internal/streams/readable.js:1327:12)
Jan 17 16:49:40 openbalena-api-78fc9b765-jxg8l api[1062]:     at processTicksAndRejections (internal/process/task_queues.js:80:21)

And the following errors:

Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Failed to get contract refer id for field is_of__cpu_architecture of resource cpu_architecture.
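Since the same message repeats many times within one second, a small script can make the API logs easier to read by collapsing identical lines into counts. This is a rough sketch that assumes the log prefix format shown above (`<date> <pod> api[<pid>]: <message>`); `collapse` is a hypothetical helper, not part of openBalena.

```python
import re
from collections import Counter

# Matches the prefix seen in the API logs above, e.g.
# "Jan 17 16:50:02 <pod-name> api[1062]: <message>"
PREFIX = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ \S+\[\d+\]: (.*)$")


def collapse(lines):
    """Count identical (timestamp, message) pairs in API log lines.

    Stack-trace frames (messages starting with "at ") and lines that do
    not match the expected prefix are skipped.
    """
    counts = Counter()
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if not match:
            continue
        timestamp, message = match.groups()
        if message.lstrip().startswith("at "):
            continue
        counts[(timestamp, message)] += 1
    return counts


# Example (the file name is an assumption):
#   with open("api.log") as f:
#       for (timestamp, message), n in collapse(f).most_common(10):
#           print(n, timestamp, message)
```

Sorting by count shows which messages dominate a burst and at which second they occurred.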

I also see these errors, when the database is under high load:

Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Error loading api key permissions InternalRequestError:
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at convertToHttpError (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:1180:10)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:1069:13
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Array.map (<anonymous>)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:1067:30
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at runMicrotasks (<anonymous>)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at processTicksAndRejections (internal/process/task_queues.js:93:5)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.runURI (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:850:21)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at PinejsClient._request (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:786:11)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/pinejs-client-core/src/index.ts:1181:19
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.getApiKeyPermissions (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/permissions.ts:1240:24)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at checkApiKey (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/permissions.ts:1290:17)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.resolveAuthHeader (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/permissions.ts:1329:9)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.getAPIKey (/usr/src/app/src/infra/auth/api-keys.ts:27:10)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at apiKeyMiddleware (/usr/src/app/src/infra/auth/middleware.ts:58:17) {
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:   status: 500,
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:   body: undefined
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: }
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: Error with API key: InternalRequestError:
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at convertToHttpError (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:1180:10)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:1069:13
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Array.map (<anonymous>)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:1067:30
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at runMicrotasks (<anonymous>)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at processTicksAndRejections (internal/process/task_queues.js:93:5)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.runURI (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:850:21)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at PinejsClient._request (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/sbvr-utils.ts:786:11)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at /usr/src/app/node_modules/pinejs-client-core/src/index.ts:1181:19
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.getApiKeyPermissions (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/permissions.ts:1240:24)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at checkApiKey (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/permissions.ts:1290:17)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.resolveAuthHeader (/usr/src/app/node_modules/@balena/pinejs/src/sbvr-api/permissions.ts:1329:9)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at Object.getAPIKey (/usr/src/app/src/infra/auth/api-keys.ts:27:10)
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:     at apiKeyMiddleware (/usr/src/app/src/infra/auth/middleware.ts:58:17) {
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:   status: 500,
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]:   body: undefined
Jan 17 16:50:02 openbalena-api-78fc9b765-jxg8l api[1062]: }

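It happened again. The database first refuses new connections ("too many clients already"), and then one of its processes is killed with signal 9, so it looks like the database is running out of memory every time, as shown in the DB logs: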

2022-02-21 20:04:17.312 UTC [1098] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.312 UTC [1099] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.313 UTC [1096] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.314 UTC [1097] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.314 UTC [1100] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.324 UTC [1101] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.332 UTC [1102] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.340 UTC [1103] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.344 UTC [1104] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.368 UTC [1105] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.401 UTC [1095] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.547 UTC [1107] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.612 UTC [1106] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.636 UTC [1109] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.636 UTC [1108] FATAL:  sorry, too many clients already
2022-02-21 20:04:17.637 UTC [1110] FATAL:  sorry, too many clients already
2022-02-21 20:04:21.622 UTC [1] LOG:  server process (PID 880) was terminated by signal 9: Killed
2022-02-21 20:04:21.622 UTC [1] DETAIL:  Failed process was running: SELECT (
                SELECT coalesce(array_to_json(array_agg("device.device config variable".*)), '[]') AS "device_config_variable"
                FROM (
                        SELECT "device.device config variable"."name", "device.device config variable"."value"
                        FROM "device config variable" AS "device.device config variable"
                        WHERE "device"."id" = "device.device config variable"."device"
                        ORDER BY "device.device config variable"."name" ASC
                ) AS "device.device config variable"
        ) AS "device_config_variable", (
                SELECT coalesce(array_to_json(array_agg("device.device environment variable".*)), '[]') AS "device_environment_variable"
                FROM (
                        SELECT "device.device environment variable"."name", "device.device environment variable"."value"
                        FROM "device environment variable" AS "device.device environment variable"
                        WHERE "device"."id" = "device.device environment variable"."device"
                ) AS "device.device environment variable"
        ) AS "device_environment_variable", (
                SELECT coalesce(array_to_json(array_agg("device.should be running-release".*)), '[]') AS "
2022-02-21 20:04:21.624 UTC [1] LOG:  terminating any other active server processes
2022-02-21 20:04:21.624 UTC [1092] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.631 UTC [1073] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.631 UTC [1073] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.631 UTC [1073] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.632 UTC [1059] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.632 UTC [1059] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.632 UTC [1059] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.635 UTC [1050] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.635 UTC [1050] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.635 UTC [1050] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.641 UTC [1056] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.641 UTC [1056] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.641 UTC [1056] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.642 UTC [1069] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.642 UTC [1069] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.642 UTC [1069] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.675 UTC [866] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.675 UTC [866] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.675 UTC [866] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.678 UTC [1091] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.678 UTC [1091] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.678 UTC [1091] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.684 UTC [1083] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.684 UTC [1083] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.684 UTC [1083] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.687 UTC [1064] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.687 UTC [1064] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.687 UTC [1064] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.687 UTC [1090] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.687 UTC [1090] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.687 UTC [1090] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.695 UTC [1051] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.695 UTC [1051] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.695 UTC [1051] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.696 UTC [1088] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.696 UTC [1088] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.696 UTC [1088] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.696 UTC [1070] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.696 UTC [1070] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.696 UTC [1070] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.697 UTC [1034] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.697 UTC [1034] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.697 UTC [1034] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.696 UTC [1054] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.696 UTC [1054] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.696 UTC [1054] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.709 UTC [1040] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.709 UTC [1040] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.709 UTC [1040] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.709 UTC [1084] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.709 UTC [1084] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.709 UTC [1084] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.709 UTC [1093] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.709 UTC [1093] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.709 UTC [1093] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.710 UTC [1031] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.710 UTC [1031] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.710 UTC [1031] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.710 UTC [1080] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.710 UTC [1080] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.710 UTC [1080] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.711 UTC [1025] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.711 UTC [1025] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.711 UTC [1025] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.711 UTC [1048] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.711 UTC [1048] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.711 UTC [1048] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.716 UTC [1037] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.716 UTC [1037] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.716 UTC [1037] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.719 UTC [1045] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.719 UTC [1045] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.719 UTC [1045] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.722 UTC [1026] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.722 UTC [1026] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.722 UTC [1026] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.726 UTC [1024] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.726 UTC [1024] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.726 UTC [1024] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.726 UTC [1047] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.726 UTC [1047] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.726 UTC [1047] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.736 UTC [1035] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.736 UTC [1035] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.736 UTC [1035] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.736 UTC [1094] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.736 UTC [1094] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.736 UTC [1094] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.736 UTC [1023] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.736 UTC [1023] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.736 UTC [1023] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.752 UTC [1068] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.752 UTC [1068] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.752 UTC [1068] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.755 UTC [1008] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.755 UTC [1008] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.755 UTC [1008] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
postgres: docker resin 10.244.0.182(45418) BIND: malloc.c:2385: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
2022-02-21 20:04:21.742 UTC [1063] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.742 UTC [1063] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.742 UTC [1063] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.754 UTC [1033] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.754 UTC [1033] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.754 UTC [1033] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.760 UTC [1076] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.760 UTC [1076] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.760 UTC [1076] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.762 UTC [1089] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.762 UTC [1089] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.762 UTC [1089] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.909 UTC [1018] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.909 UTC [1018] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.909 UTC [1018] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.913 UTC [1003] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.913 UTC [1003] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.913 UTC [1003] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.921 UTC [1013] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.921 UTC [1013] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.921 UTC [1013] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.922 UTC [1010] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.922 UTC [1010] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.922 UTC [1010] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.926 UTC [1060] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.926 UTC [1060] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.926 UTC [1060] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.935 UTC [1006] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.935 UTC [1006] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.935 UTC [1006] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.947 UTC [1028] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.947 UTC [1028] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.947 UTC [1028] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.947 UTC [993] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.947 UTC [993] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.947 UTC [993] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.956 UTC [1032] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.956 UTC [1032] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.956 UTC [1032] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.963 UTC [1004] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.963 UTC [1004] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.963 UTC [1004] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.969 UTC [953] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.969 UTC [953] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.969 UTC [953] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.969 UTC [1014] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.969 UTC [1014] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.969 UTC [1014] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.973 UTC [1012] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.973 UTC [1012] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.973 UTC [1012] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.974 UTC [1029] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.974 UTC [1029] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.974 UTC [1029] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.982 UTC [1011] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.982 UTC [1011] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.982 UTC [1011] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.993 UTC [999] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.993 UTC [999] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.993 UTC [999] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:21.993 UTC [1015] WARNING:  terminating connection because of crash of another server process
2022-02-21 20:04:21.993 UTC [1015] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-21 20:04:21.993 UTC [1015] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-21 20:04:22.853 UTC [1] LOG:  all server processes terminated; reinitializing
2022-02-21 20:04:23.006 UTC [1111] LOG:  database system was interrupted; last known up at 2022-02-21 20:03:45 UTC
2022-02-21 20:04:23.156 UTC [1111] LOG:  database system was not properly shut down; automatic recovery in progress
2022-02-21 20:04:23.162 UTC [1111] LOG:  redo starts at C/F87D6930
2022-02-21 20:04:23.180 UTC [1111] LOG:  invalid record length at C/F88F9E40: wanted 24, got 0
2022-02-21 20:04:23.181 UTC [1111] LOG:  redo done at C/F88F9E08
2022-02-21 20:04:23.923 UTC [1] LOG:  database system is ready to accept connections
2022-02-21 20:04:28.407 UTC [1118] ERROR:  relation "uniq_model_model_type_vocab" already exists
2022-02-21 20:04:28.407 UTC [1118] STATEMENT:  CREATE UNIQUE INDEX "uniq_model_model_type_vocab" ON "model" ("is of-vocabulary", "model type");
2022-02-21 20:04:28.635 UTC [1119] ERROR:  relation "uniq_model_model_type_vocab" already exists
2022-02-21 20:04:28.635 UTC [1119] STATEMENT:  CREATE UNIQUE INDEX "uniq_model_model_type_vocab" ON "model" ("is of-vocabulary", "model type");
2022-02-21 20:07:54.987 UTC [1343] FATAL:  sorry, too many clients already
2022-02-21 20:07:54.994 UTC [1344] FATAL:  sorry, too many clients already
2022-02-21 20:07:55.009 UTC [1345] FATAL:  sorry, too many clients already
2022-02-21 20:07:55.011 UTC [1353] FATAL:  sorry, too many clients already
2022-02-21 20:07:55.017 UTC [1354] FATAL:  sorry, too many clients already
2022-02-21 20:07:55.018 UTC [1355] FATAL:  sorry, too many clients already
2022-02-21 20:07:55.020 UTC [1356] FATAL:  sorry, too many clients already
2022-02-21 20:07:55.041 UTC [1346] FATAL:  sorry, too many clients already
2022-02-21 20:07:55.052 UTC [1350] FATAL:  sorry, too many clients already

Maybe someone can point out whether there are Postgres settings that should be changed to take full advantage of the hardware. I’ve seen some posts about shared_buffers on the internet, but I’m not that familiar with Postgres.

Or maybe someone from the balena team can confirm that updating the stack to the newest version (like in PR #137) will fix these problems?
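
For anyone finding this later with the same shared_buffers question, here is a rough sketch of the usual starting points. It assumes a database host with 4GB of RAM (the size of the new dedicated server isn’t stated above, so that number is only a guess). The PostgreSQL docs suggest about 25% of RAM for shared_buffers on a dedicated server; the 75% for effective_cache_size is just a commonly quoted rule of thumb, and the function name and the "half the RAM if shared" fallback are made up for illustration. The "too many clients already" errors are a separate limit (max_connections, normally 100 by default), which this does not try to size.

```python
def suggest_postgres_settings(total_ram_mb: int, dedicated: bool = True) -> dict:
    """Return rough starting values for postgresql.conf from rules of thumb.

    Hypothetical helper for illustration only; measure before and after tuning.
    """
    # Assumption: if the host is shared with other services (API, VPN, ...),
    # pretend only half of the RAM is available to Postgres.
    usable_mb = total_ram_mb if dedicated else total_ram_mb // 2
    return {
        # ~25% of RAM: starting point suggested by the PostgreSQL docs
        # for a dedicated database server.
        "shared_buffers": f"{usable_mb // 4}MB",
        # ~75% of RAM: commonly quoted rule of thumb, only a planner hint,
        # it does not allocate memory.
        "effective_cache_size": f"{usable_mb * 3 // 4}MB",
    }


if __name__ == "__main__":
    # Assumed 4GB dedicated database server.
    for name, value in suggest_postgres_settings(4096).items():
        print(f"{name} = {value}")
```

These are only starting values to put in postgresql.conf and then adjust based on what the server actually does under load.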

Thanks in advance!

Hello @bversluijs,

Thanks a lot for contributing the PR "Dependency upgrades" by bartversluijs (Pull Request #137, balena-io/open-balena on GitHub).
Now that it’s merged, have you run into the CPU load issues again?

Best Regards
Harald

Hi @fisehara,

Thanks for reminding me of this topic.
It seems that the upgrade has dramatically improved the CPU usage so far. The weird thing about those spikes was that they started occurring randomly after a few years of usage. So I can’t say for sure that it fixes them, but I haven’t seen them since, and I’ve been running the new versions for some months now.