API container crashing

We are experiencing an error where every few days the openbalena_api_1 container is crashing, which results in the following message when trying to access it via balena-cli:

BalenaRequestError: Request error: <html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>


Additional information may be available with the `--debug` flag.

For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting

Below are the logs we see in the openbalena_api_1 container when running journalctl -u open-balena-api -fn100. Any ideas on what could be causing this?

Mar 12 18:22:01 7a5fd06078fe api[726]: ORDER BY "permission"."name" ASC [ 'C2rvt4O3UXmZjComhG48RXOpiUovJLHs' ]
Mar 12 18:22:01 7a5fd06078fe api[726]: Parsing GET /Auth/api_key(key=@apiKey)?$select=is_of__actor&@apiKey='C2rvt4O3UXmZjComhG48RXOpiUovJLHs'
Mar 12 18:22:01 7a5fd06078fe api[726]: Running GET /Auth/api_key(key=@apiKey)?$select=is_of__actor&@apiKey='C2rvt4O3UXmZjComhG48RXOpiUovJLHs'
Mar 12 18:22:01 7a5fd06078fe api[726]: SELECT "api key"."is of-actor" AS "is_of__actor"
Mar 12 18:22:01 7a5fd06078fe api[726]: FROM "api key"
Mar 12 18:22:01 7a5fd06078fe api[726]: WHERE ("api key"."key") IS NOT NULL AND ("api key"."key") = ($1) [ 'C2rvt4O3UXmZjComhG48RXOpiUovJLHs' ]
Mar 12 18:22:01 7a5fd06078fe api[726]: Running PATCH /resin/service_instance(3)
Mar 12 18:22:01 7a5fd06078fe api[726]: UPDATE "service instance"
Mar 12 18:22:01 7a5fd06078fe api[726]: SET "last heartbeat" = $1
Mar 12 18:22:01 7a5fd06078fe api[726]: WHERE ("service instance"."id") IS NOT NULL AND ("service instance"."id") = ($2)
Mar 12 18:22:01 7a5fd06078fe api[726]: AND "service instance"."id" IN ((
Mar 12 18:22:01 7a5fd06078fe api[726]:         SELECT "service instance"."id"
Mar 12 18:22:01 7a5fd06078fe api[726]:         FROM (
Mar 12 18:22:01 7a5fd06078fe api[726]:                 SELECT "service instance"."created at", "service instance"."modified at", "service instance"."id", "service instance"."service type", "service instance"."ip address", "service instance"."last heartbeat"
Mar 12 18:22:01 7a5fd06078fe api[726]:                 FROM "service instance"
Mar 12 18:22:01 7a5fd06078fe api[726]:         ) AS "service instance"
Mar 12 18:22:01 7a5fd06078fe api[726]: )) [ 2021-03-12T18:22:01.934Z, 3 ]
Mar 12 18:22:01 7a5fd06078fe api[726]: 2021-03-12T18:22:01.944Z 172.20.0.8 s/vpn PATCH /resin/service_instance(3) 200 24.843ms -
Mar 12 18:22:04 7a5fd06078fe api[726]: /usr/src/app/src/features/contracts/contracts-directory.ts:123
Mar 12 18:22:04 7a5fd06078fe api[726]:                 new Error(
Mar 12 18:22:04 7a5fd06078fe api[726]:   ^
Mar 12 18:22:04 7a5fd06078fe api[726]: Error: Invalid response while fetching contracts: Internal Server Error
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.handleResponse [as _callback] (/usr/src/app/src/features/contracts/contracts-directory.ts:123:3)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.self.callback (/usr/src/app/node_modules/request/request.js:185:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.emit (events.js:327:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.EventEmitter.emit (domain.js:467:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.<anonymous> (/usr/src/app/node_modules/request/request.js:1154:10)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.emit (events.js:327:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.EventEmitter.emit (domain.js:467:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at IncomingMessage.<anonymous> (/usr/src/app/node_modules/request/request.js:1076:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Object.onceWrapper (events.js:421:28)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at IncomingMessage.emit (events.js:327:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at IncomingMessage.EventEmitter.emit (domain.js:467:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at endReadableNT (internal/streams/readable.js:1327:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at processTicksAndRejections (internal/process/task_queues.js:80:21)
Mar 12 18:22:04 7a5fd06078fe api[726]: Program node index.js exited with code 1

We are setting PRODUCTION_MODE: "true" in the meantime to keep things running.

Hi,

I’m having this problem too. Which version of open-balena are you running? I’ve created an issue here, but v3.2.0 will probably fix this issue. I haven’t had the time yet to update my open-balena cluster.


An additional question: what exactly does PRODUCTION_MODE do?

@drcnyc, what version of OpenBalena are you running? It looks like maybe what @bversluijs mentions is the issue (Crash on fetching contracts · Issue #573 · balena-io/open-balena-api · GitHub), which should be resolved by an update.

@bversluijs PRODUCTION_MODE will cause the logs to have less information but will cause a crashed container to restart.

Hi Alan,

So production mode will restart the process if this (or any other) error occurs?

Thanks!

@bversluijs Yes, as long as the process the container is running exits, the container will be restarted. Let us know if this is not the behavior you observe!

I think the problem with this particular bug is that it doesn’t actually crash the container; instead, the container keeps running but stops responding, so production mode doesn’t fix it. It looks like the bug was fixed in a more recent commit, so we are just waiting for that to make its way into the master openbalena build.

Thanks for getting back. Can you point us to the commit that seems to fix the underlying issue of the non-responding container?

If nothing else, we can check on the feasibility of getting it out to OpenBalena.

Here is the issue:

And here is the patch:

Isn’t the default restart policy ‘none’? So enabling the production mode and the container exiting would just leave a container down? Would need at minimum a restart-on-fail.
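For anyone who wants to check which restart policy their container actually ended up with, here is a minimal sketch that asks the Docker CLI. It assumes `docker` is on the PATH and uses the container name from this thread; the helper names (`docker_output`, `restart_policy`, `restarts_after_crash`) are made up for illustration. Note that Docker itself reports the default policy as `no`, and the failure-based policy is named `on-failure`.

```python
import subprocess
from typing import Callable, Sequence


def docker_output(args: Sequence[str]) -> str:
    """Run a docker CLI command and return its stdout with whitespace stripped."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def restart_policy(
    container: str, run: Callable[[Sequence[str]], str] = docker_output
) -> str:
    """Return the restart policy name Docker has recorded for the container."""
    return run(
        ["inspect", "--format", "{{.HostConfig.RestartPolicy.Name}}", container]
    )


def restarts_after_crash(policy: str) -> bool:
    """True if Docker would restart the container after a non-zero exit.

    "no" (or an empty name) means the container stays down after it exits.
    """
    return policy in {"always", "unless-stopped", "on-failure"}
```

Calling `restart_policy("openbalena_api_1")` on the host and passing the result to `restarts_after_crash` would show whether an exiting API process gets restarted by Docker at all. This only covers the case where the process exits; it says nothing about a container that keeps running without responding.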

The problem is that this particular bug doesn’t crash the container, it just renders it unusable - so without the fix it needs to be manually restarted.

One feature suggestion might be restart-on-service-down which actually checks that the service responds to ping requests and restarts it if not.
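Until something like that exists, the idea can be approximated with a small external watchdog. The following is only a sketch under assumptions, not an openBalena feature: the health URL is a placeholder that must be replaced with an endpoint your API actually answers, the thresholds are arbitrary, and the restart simply shells out to `docker restart`.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumptions, not taken from the thread: adjust both values to your setup.
HEALTH_URL = "https://api.openbalena.example.com/ping"  # hypothetical URL
CONTAINER = "openbalena_api_1"  # container name as mentioned in this thread


def is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors such as 503, refused connections and timeouts.
        return False


def restart_container(name: str = CONTAINER) -> None:
    """Restart the container through the Docker CLI."""
    subprocess.run(["docker", "restart", name], check=True)


def watch(
    probe: Callable[[], bool],
    restart: Callable[[], None],
    max_failures: int = 3,
    interval: float = 30.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Restart after max_failures consecutive failed probes.

    Runs forever when max_checks is None; returns the number of restarts
    triggered otherwise.
    """
    failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any success resets the failure streak
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts
```

Running `watch(is_healthy, restart_container)` on the host would poll every 30 seconds and restart the container after three failed checks in a row. Requiring several consecutive failures avoids restarting on a single slow response. A failed TLS verification (for example with a self-signed certificate) also counts as unhealthy in this sketch.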

Hey Maggie, I think PRODUCTION_MODE will take precedence over that, but to be honest I’m not really sure whether it will behave as always or unless-stopped.

Hello David, what version of open-balena are you running right now? The commit you linked was merged into 0.115.1 (open-balena-api/CHANGELOG.md at master · balena-io/open-balena-api · GitHub), which should be available in the latest open-balena (open-balena/versions at master · balena-io/open-balena · GitHub).