Upgrade to 3.x: Instability and what it means for older devices

Hi,
We upgraded to the latest openBalena (3.x) in a bit of a rush (IT did not back up the volumes, so there is probably no way to revert to 2.x).

We have 2 ongoing issues…

  1. The system is unstable (the API crashes).
    Here are the logs from HAProxy:
    [WARNING] 319/132651 (17) : Server backend_vpn/balena_vpn_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
    [WARNING] 319/132653 (17) : Server backend_registry/balena_registry_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
    [WARNING] 319/132654 (17) : Server backend_s3/balena_s3_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
    [WARNING] 319/132721 (17) : Server backend_api/balena_api_1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
    [WARNING] 319/132738 (17) : Server vpn-tunnel/balena_vpn is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
    [WARNING] 320/015000 (17) : Server backend_api/balena_api_1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    [ALERT] 320/015000 (17) : backend 'backend_api' has no server available!

We now get a 503 error, after everything had been running fine overnight…
We need to debug this very urgently…

  2. We have to build new devices with a newer balenaOS, like this:
    http://prntscr.com/vjz2kg

You can see the new devices there, but what about the old ones? One does not report at all, and the others are deployed and running. What are our options?
Also, is there a way to deploy an SSH key to these devices in order to get SSH access?

Thanks for all your great support…

Make sure you update to the latest openBalena version – it is v3.1.1 now – which fixes a couple of initial issues. I’d then check the API service logs by SSH’ing into the container and querying journald.
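To narrow down where to look in those logs, it may help to first pull the state changes out of the HAProxy output quoted above, since the timestamp of the DOWN event tells you roughly when the API stopped answering. A minimal sketch, assuming the HAProxy output is available on stdin or in a file; the `down_events` name is only illustrative and not part of openBalena:

```shell
# Print only the lines where HAProxy marked a server DOWN or raised an ALERT.
# Reads the files given as arguments, or stdin when called without arguments.
down_events() {
  # -h suppresses file name prefixes, -E enables the alternation pattern.
  grep -hE ' is DOWN,|\[ALERT\]' "$@"
}
```

It could be fed from `docker logs <haproxy-container> 2>&1 | down_events`, where `<haproxy-container>` is a placeholder for whatever name or ID your HAProxy container has.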

You’ll also want to take a look at this discussion over here: Upgrading from v2.x.x to v3.x.x

Hi,
The update was done yesterday… I believe that makes it 3.1 :slight_smile:

On the other hand, today I fixed the issue just by restarting the Docker container:
docker restart 2f5d63b9a213

The logs on the API instance are pretty much empty…
Also, can you provide info on updating existing nodes? In particular, we need to allow SSH access and the nodes are not available on site, so how do we insert the SSH keys? Is there any cool balena-cli command for this?

Thanks,

Hey, by default in production you won’t get any debug logs; see https://github.com/balena-io/open-balena-api/blob/1cf2fe6ddd8e651ba082f639b12cc38c169aa0e4/config/confd/templates/env.tmpl#L15. How did you access your devices before? Don’t they already have SSH keys on them? You can find more info here: HowTo: SSH into host device

We never set up keys for SSH… a bit of a mistake on our side…
Any chance to do this remotely?

Also, how do we debug this for the API if it happens again?

P.S:
Stevce Radevski seems very Macedonian to me :slight_smile:

If you flashed a production image then you’re out of luck unfortunately and you’ll have to get physical access to the devices.
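For devices that have not been flashed yet, one way to avoid the same situation is to put a public key into config.json before provisioning. A minimal sketch, assuming a balenaOS release that supports the `os.sshKeys` field; the key string is a placeholder, and the rest of your existing config.json has to stay in place around it:

```json
{
  "os": {
    "sshKeys": [
      "ssh-ed25519 AAAA...your-public-key... user@host"
    ]
  }
}
```

With a key in place, the host OS should be reachable with something like `ssh -p 22222 root@<device-ip>`, where `<device-ip>` is a placeholder. For devices already in the field, the same field can in principle be edited in config.json on the boot partition once you have the physical access mentioned above.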

There should be some logs to debug issues like the API failing to start up. Do you mean there wasn’t anything like that in the logs? Even in production mode, the API should still log errors etc., just much less than in development mode. Normally that should be enough, which is why we don’t use a DEBUG variable from the compose file.

P.S., it is Macedonian :smiley:

Yes, production images… so we need to fix this for all other new devices…

The logs on the API container are empty… and the API container died at least once yesterday; we will see how it goes today…

P.S:
How about a Skopsko at a kafana :slight_smile:

The logs should definitely not be empty. You can try restarting the services and see if that fixes the issue. If it still persists, you can open an issue with some more info here or here, and we can take it from there.

Here is what I currently see (after yesterday’s restart) for the API container…

You should log into the container and then execute journalctl.
Or as one command, execute this:

docker exec -it openbalena_api_1 journalctl -u open-balena-api.service -xe

You should see some more logs there.
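Two things may be worth keeping in mind here. `docker exec` expects the command and its arguments as separate words; if the whole command is wrapped in one quoted string, Docker looks for a single executable with that entire name and fails. Also, the journal lives inside the container and may not survive a restart, so it can be useful to copy it out to the host before restarting anything. A minimal sketch, assuming the container and unit names used in this thread; the `save_api_journal` helper and the output file name are made up for illustration:

```shell
# Dump the API unit's journal from the container into a timestamped file
# on the host, then print the name of that file.
# Usage: save_api_journal [container-name-or-id]
save_api_journal() {
  container="${1:-openbalena_api_1}"
  outfile="api-journal-$(date +%Y%m%d-%H%M%S).log"
  # Every word is a separate argument to docker exec (no quotes around the
  # whole command). No -it, since the output is redirected to a file rather
  # than read interactively.
  docker exec "$container" journalctl -u open-balena-api.service --no-pager > "$outfile"
  echo "$outfile"
}
```

Running `save_api_journal 2f5d63b9a213` right after a failure, and before any `docker restart`, would keep a copy of whatever the API logged up to that point.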

That gives an error…
A plain bash session plus journalctl shows logs starting only from tonight, which probably means the earlier info was lost…

We can wait and see what the problem is the next time it fails…

Thanks for the help so far…

Please let us know if you observe that behavior again so we can take a closer look.