API container crashing

Every few days the openbalena_api_1 container crashes, which results in the following message when we try to access it via balena-cli:

BalenaRequestError: Request error: <html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>


Additional information may be available with the `--debug` flag.

For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting

Below are the logs we see in the openbalena_api_1 container when running `journalctl -u open-balena-api -fn100`. Any ideas on what could be causing this?

Mar 12 18:22:01 7a5fd06078fe api[726]: ORDER BY "permission"."name" ASC [ 'C2rvt4O3UXmZjComhG48RXOpiUovJLHs' ]
Mar 12 18:22:01 7a5fd06078fe api[726]: Parsing GET /Auth/api_key(key=@apiKey)?$select=is_of__actor&@apiKey='C2rvt4O3UXmZjComhG48RXOpiUovJLHs'
Mar 12 18:22:01 7a5fd06078fe api[726]: Running GET /Auth/api_key(key=@apiKey)?$select=is_of__actor&@apiKey='C2rvt4O3UXmZjComhG48RXOpiUovJLHs'
Mar 12 18:22:01 7a5fd06078fe api[726]: SELECT "api key"."is of-actor" AS "is_of__actor"
Mar 12 18:22:01 7a5fd06078fe api[726]: FROM "api key"
Mar 12 18:22:01 7a5fd06078fe api[726]: WHERE ("api key"."key") IS NOT NULL AND ("api key"."key") = ($1) [ 'C2rvt4O3UXmZjComhG48RXOpiUovJLHs' ]
Mar 12 18:22:01 7a5fd06078fe api[726]: Running PATCH /resin/service_instance(3)
Mar 12 18:22:01 7a5fd06078fe api[726]: UPDATE "service instance"
Mar 12 18:22:01 7a5fd06078fe api[726]: SET "last heartbeat" = $1
Mar 12 18:22:01 7a5fd06078fe api[726]: WHERE ("service instance"."id") IS NOT NULL AND ("service instance"."id") = ($2)
Mar 12 18:22:01 7a5fd06078fe api[726]: AND "service instance"."id" IN ((
Mar 12 18:22:01 7a5fd06078fe api[726]:         SELECT "service instance"."id"
Mar 12 18:22:01 7a5fd06078fe api[726]:         FROM (
Mar 12 18:22:01 7a5fd06078fe api[726]:                 SELECT "service instance"."created at", "service instance"."modified at", "service instance"."id", "service instance"."service type", "service instance"."ip address", "service instance"."last heartbeat"
Mar 12 18:22:01 7a5fd06078fe api[726]:                 FROM "service instance"
Mar 12 18:22:01 7a5fd06078fe api[726]:         ) AS "service instance"
Mar 12 18:22:01 7a5fd06078fe api[726]: )) [ 2021-03-12T18:22:01.934Z, 3 ]
Mar 12 18:22:01 7a5fd06078fe api[726]: 2021-03-12T18:22:01.944Z 172.20.0.8 s/vpn PATCH /resin/service_instance(3) 200 24.843ms -
Mar 12 18:22:04 7a5fd06078fe api[726]: /usr/src/app/src/features/contracts/contracts-directory.ts:123
Mar 12 18:22:04 7a5fd06078fe api[726]:                 new Error(
Mar 12 18:22:04 7a5fd06078fe api[726]:   ^
Mar 12 18:22:04 7a5fd06078fe api[726]: Error: Invalid response while fetching contracts: Internal Server Error
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.handleResponse [as _callback] (/usr/src/app/src/features/contracts/contracts-directory.ts:123:3)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.self.callback (/usr/src/app/node_modules/request/request.js:185:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.emit (events.js:327:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.EventEmitter.emit (domain.js:467:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.<anonymous> (/usr/src/app/node_modules/request/request.js:1154:10)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.emit (events.js:327:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Request.EventEmitter.emit (domain.js:467:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at IncomingMessage.<anonymous> (/usr/src/app/node_modules/request/request.js:1076:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at Object.onceWrapper (events.js:421:28)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at IncomingMessage.emit (events.js:327:22)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at IncomingMessage.EventEmitter.emit (domain.js:467:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at endReadableNT (internal/streams/readable.js:1327:12)
Mar 12 18:22:04 7a5fd06078fe api[726]:     at processTicksAndRejections (internal/process/task_queues.js:80:21)
Mar 12 18:22:04 7a5fd06078fe api[726]: Program node index.js exited with code 1

In the meantime, we are setting PRODUCTION_MODE: "true" to keep things running.

Hi,

I'm having this problem too. Which version of open-balena are you running? I've created an issue here, but v3.2.0 will probably fix this issue. I haven't had the time yet to update my open-balena cluster.


One added question: what exactly does PRODUCTION_MODE do?

@drcnyc, what version of OpenBalena are you running? It looks like this may be the issue @bversluijs mentions (Crash on fetching contracts · Issue #573 · balena-io/open-balena-api · GitHub), which should be resolved by an update.

@bversluijs PRODUCTION_MODE reduces the verbosity of the logs, but it also causes a crashed container to be restarted.


Hi Alan,

So production mode will restart the process if this (or any other) error occurs?

Thanks!

@bversluijs Yes, as long as the process the container is running exits, it will be restarted. Let us know if this is not the behavior you observe!
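Conceptually, that restart-on-exit behavior amounts to a supervisor loop along these lines (a generic sketch, not the actual open-balena implementation; the backoff and retry limit are made-up parameters):

```python
import subprocess
import time

def supervise(cmd, max_restarts=5, backoff=2.0):
    """Rerun cmd whenever it exits non-zero, up to max_restarts attempts.

    Returns 0 on a clean exit, or the last non-zero exit code if the
    restart budget is exhausted.
    """
    code = 0
    for _ in range(max_restarts):
        code = subprocess.call(cmd)
        if code == 0:
            return 0          # clean exit: stop supervising
        time.sleep(backoff)   # crashed: wait, then restart
    return code
```

Note that a supervisor like this only helps when the process actually exits; it cannot detect a process that stays alive but stops serving requests.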

I think the problem with this particular bug is that it doesn't actually crash the container; instead the container keeps running but stops responding, so production mode doesn't fix it. It looks like the bug was fixed in a more recent commit, so we are just waiting for that to make its way into the master openbalena build.

Thanks for getting back to us. Can you point us to the commit that seems to fix the underlying issue of the non-responsive container?

If nothing else, we can check the feasibility of getting it out to OpenBalena.

Here is the issue:

And here is the patch:

Isn't the default restart policy 'none'? So enabling production mode and having the container exit would just leave the container down? It would need at minimum a restart-on-failure policy.

The problem is that this particular bug doesn't crash the container, it just renders it unusable - so without the fix it needs to be manually restarted.

One feature suggestion might be a restart-on-service-down mode, which actually checks that the service responds to ping requests and restarts it if not.
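A one-shot liveness probe for that could look something like this sketch (the /ping path, URL, and container name here are assumptions, not confirmed open-balena details; a cron job could run it and restart the container on failure):

```python
from urllib.request import urlopen
from urllib.error import URLError

def api_is_up(url="http://localhost/ping", timeout=5):
    """Return True if the API endpoint answers with HTTP 200, else False."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

# A wrapper script could then do something like:
#   if not api_is_up():
#       subprocess.call(["docker", "restart", "openbalena_api_1"])
```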

Hey Maggie, I think PRODUCTION_MODE will take precedence over that, but to be honest I'm not really sure if it will behave as `always` or `unless-stopped`.

Hello David, what version of open-balena are you running right now? The commit you linked was merged into 0.115.1 (open-balena-api/CHANGELOG.md at master · balena-io/open-balena-api · GitHub), which should be available in the latest open-balena release (open-balena/versions at master · balena-io/open-balena · GitHub).