Upgrade to 3.x: Instability and what it means for older devices

bidikov · November 16, 2020, 10:55am

Hi,
We upgraded to the latest openBalena (3.x) in a bit of a rush. IT did not back up the volumes, so there is probably no way to revert to 2.x.

We have 2 ongoing issues…

We see instability in the system (API crashes).
Here are the logs from HAProxy:
[WARNING] 319/132651 (17) : Server backend_vpn/balena_vpn_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 319/132653 (17) : Server backend_registry/balena_registry_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 319/132654 (17) : Server backend_s3/balena_s3_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 319/132721 (17) : Server backend_api/balena_api_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 319/132738 (17) : Server vpn-tunnel/balena_vpn is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 320/015000 (17) : Server backend_api/balena_api_1 is DOWN, reason: Layer4 connection problem, info: “Connection refused”, check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 320/015000 (17) : backend ‘backend_api’ has no server available!

We now get a 503 error, after everything was running fine overnight…
We need to debug this very urgently…
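
For output like the HAProxy lines above, it can help to strip the routine "is UP" messages and keep only the outages. A minimal sketch, assuming the log text arrives on stdin; the function name `haproxy_outages` and the container name in the usage comment are hypothetical and not taken from this thread:

```shell
#!/usr/bin/env bash
# Keep only HAProxy lines that report a server going DOWN or a backend
# left with no servers. Reads log text on stdin, prints matches on stdout.
haproxy_outages() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E 'is DOWN|has no server available' || true
}

# Example (the container name is an assumption, check `docker ps` for yours):
#   docker logs openbalena_haproxy_1 2>&1 | haproxy_outages
```

The timestamps on the surviving lines show when the API backend dropped out, which narrows the window to look at in the API container's own logs.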

We have to build new devices with a newer balenaOS, like this:
http://prntscr.com/vjz2kg

You can see the new devices… but what about the old devices? One does not report at all, and the others are deployed and running. What are our options?
Also, is there a way to deploy an SSH key to these devices in order to have SSH access?

Thanks for all your great support…

dfunckt · November 16, 2020, 3:18pm

Make sure you update to the latest openBalena version – it is v3.1.1 now – which fixes a couple of initial issues. I’d then check the API service logs by SSH’ing into the container and querying journald.

You’ll also want to take a look at this discussion over here: Upgrading from v2.x.x to v3.x.x

bidikov · November 16, 2020, 7:13pm

Hi,
The update was done yesterday… I believe that makes it 3.1.

On the other hand, today I fixed the issue just by restarting the Docker container:
docker restart 2f5d63b9a213

The logs on the API instance are quite empty…
Also, can you provide info on updating existing nodes? In particular, we need to allow SSH access and the nodes are not available on site, so how do we insert the SSH keys? Is there a balena-cli command for this?

Thanks,

sradevski · November 17, 2020, 12:31pm

Hey, by default in production you won’t get any debug logs, see https://github.com/balena-io/open-balena-api/blob/1cf2fe6ddd8e651ba082f639b12cc38c169aa0e4/config/confd/templates/env.tmpl#L15. How did you access your devices before, don’t they already have SSH keys on them? You can find more info here: HowTo: SSH into host device

bidikov · November 17, 2020, 12:45pm

We never set up keys for SSH… a bit of a mistake on our side…
Any chance to do this remotely?

Also, how do we debug the API if this happens again?

P.S:
Stevce Radevski seems very Macedonian to me

sradevski · November 17, 2020, 12:57pm

If you flashed a production image then you’re out of luck unfortunately and you’ll have to get physical access to the devices.

There should be some logs to debug issues like the API failing to start up, you mean there wasn’t anything like that in the logs? Cause even in production mode, the API should still log errors, etc, but much less than in development mode. Normally that should be enough, which is why we don’t use a DEBUG variable from the compose file.

P.S., it is Macedonian

bidikov · November 17, 2020, 1:11pm

Yes, production images… so we need to fix this for all other new devices…

The logs on the API container are empty… and the API container died at least once yesterday. We will see today…

P.S:
How about a Skopsko at a kafana

sradevski · November 17, 2020, 1:23pm

The logs should not be empty for sure, you can try restarting the services and see if that fixes the issue. If it still persists, you can open an issue with some more info here or here, and we can take it from there.

bidikov · November 17, 2020, 5:29pm

Here is what I see currently (after yesterday's restart) for the API container…

bversluijs · November 17, 2020, 5:56pm

You should log into the container and then execute journalctl.
Or as one command, execute this:

docker exec -it openbalena_api_1 "journalctl -u open-balena-api.service -xe"

You should see some more logs there.
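
One caveat with the one-liner above: `docker exec` takes the command and its arguments as separate words, so wrapping the whole `journalctl …` string in quotes makes Docker look for a single executable with that entire string as its name, which typically fails with an "executable file not found" error. A minimal sketch with the arguments unquoted; the wrapper name `api_logs` and the `DOCKER` override variable are hypothetical conveniences, and the container and unit names are simply the ones used in this thread:

```shell
#!/usr/bin/env bash
# Print the most recent API service log lines from inside the container.
# Set DOCKER=echo to preview the command instead of running it.
api_logs() {
  local container="${1:-openbalena_api_1}"  # name as used in this thread
  local lines="${2:-200}"                   # how many recent lines to show
  # Each argument is its own word; no quotes around the whole command.
  # --no-pager avoids needing an interactive TTY (-it).
  "${DOCKER:-docker}" exec "$container" \
    journalctl -u open-balena-api.service --no-pager -n "$lines"
}

# Usage:
#   api_logs                        # last 200 lines from openbalena_api_1
#   api_logs openbalena_api_1 50    # last 50 lines
```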

bidikov · November 17, 2020, 7:51pm

That gives an error…
A regular bash shell and journalctl show logs starting only tonight, which probably means the earlier info was lost…

We can wait to see what the problem is the next time it fails…
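
If the journal inside the container really does not survive a restart (the post above suggests this but does not confirm it), one option is to copy it to the host before restarting anything. A rough sketch under that assumption; the function name `snapshot_api_logs`, the `DOCKER` override, and the output file naming are all hypothetical:

```shell
#!/usr/bin/env bash
# Dump the API service journal from the container into a timestamped file
# on the host, then print the file path.
# Set DOCKER=echo to dry-run without a Docker daemon.
snapshot_api_logs() {
  local container="${1:-openbalena_api_1}"  # name as used in this thread
  local outdir="${2:-.}"                    # host directory for the dump
  local outfile
  outfile="$outdir/api-journal-$(date +%Y%m%dT%H%M%S).log"
  "${DOCKER:-docker}" exec "$container" \
    journalctl -u open-balena-api.service --no-pager > "$outfile"
  echo "$outfile"
}

# Usage: capture first, restart second.
#   snapshot_api_logs openbalena_api_1 /var/tmp && docker restart openbalena_api_1
```

This only helps while the container still accepts `docker exec`; if it has exited entirely, `docker logs` on the stopped container is the remaining place to look.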

Thanks for the help so far…

xginn8 · November 17, 2020, 10:42pm

Please let us know if you observe that behavior again so we can take a closer look.
