Updating OpenBalena certificates failed

Hi,
I have to update an OpenBalena server that is currently running version 3.4.1 and has expired certificates.
First, I tried to renew the certificates with easyrsa, as described in a post on this forum, but it failed (I updated both the VPN and the ROOT certificates).

Since it’s not a production server, I also tried running quickstart again to reconfigure the server.
At first I ran quickstart -c and got the ACME certificates installed, but I want the self-signed certificates, so I backed up the config folder and ran quickstart again without the -c option.
The server is up and running, but I cannot get rid of the ACME certificates.
When I test the sites with /ping, I always get the two-month ACME certificates.
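To see exactly which certificate the proxy is handing out, rather than inferring it from the /ping test, a small helper like the one below could be used. It is a sketch: the function names are hypothetical, it assumes openssl is installed on the machine you run it from, and api.example.com stands in for your own API host name.

```shell
#!/usr/bin/env bash
# Sketch: show which certificate a host actually serves (issuer, subject, validity dates).
# Assumption: openssl is installed locally; the host name passed in is a placeholder.
show_served_cert() {
  local host="$1" port="${2:-443}"
  # -servername sends SNI so the proxy selects the certificate a client would get for this name
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -subject -dates
}

# Succeeds if the issuer line read on stdin names Let's Encrypt (i.e. an ACME certificate).
issuer_is_letsencrypt() {
  grep -q "Let's Encrypt"
}
```

For example, show_served_cert api.example.com | grep '^issuer' | issuer_is_letsencrypt && echo "still serving ACME" would tell you whether the old certificate is still in place; a self-signed setup should instead show your own root CA as the issuer.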

I also tried ./compose up --force-recreate --no-deps

How can I do a clean install so that I get the self-signed certificates?
I want to get it running with the correct certificates first, and only after that upgrade to the latest version of OpenBalena.

Thank you.

Somehow I got it working with

docker system prune -a 
docker volume prune
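Both commands are destructive: docker system prune -a removes stopped containers, unused networks, every image not used by a container and the build cache, while docker volume prune removes unused volumes (on recent Docker releases only anonymous ones unless --all is given; on older releases named volumes such as a database volume can be removed too if the stack is down). A read-only preview to run first could look like this; the function name is hypothetical and nothing is deleted.

```shell
#!/usr/bin/env bash
# Sketch: list what the two prune commands are likely to remove, without removing anything.
preview_prune() {
  echo "== Containers that are not running (removed by 'docker system prune') =="
  # Repeating --filter with the same key ORs the conditions
  docker ps -a --filter status=exited --filter status=created
  echo "== Volumes not referenced by any container (candidates for 'docker volume prune') =="
  docker volume ls --filter dangling=true
}
```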

So I got the system up and running on the previous version (3.4.1) with working certificates.

I registered a device and it worked: when I run balena devices, I can see it.
So far so good…
Then I tried to upgrade to the latest version.

I followed this procedure:

./scripts/compose down
git pull
./scripts/compose build
./scripts/compose up -d
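A git pull on the default branch brings in whatever is currently there, which may not be a tagged release. A sketch of the same procedure pinned to a specific release tag is shown below; upgrade_to_tag is a hypothetical helper, it must be run from the root of the open-balena checkout, and the tag you pass (for example one of the releases listed on GitHub) is your choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same steps as above, but checks out a given release tag
# instead of pulling the current branch. Run from the root of the open-balena repo.
upgrade_to_tag() {
  local tag="$1"
  [ -n "$tag" ] || { echo "usage: upgrade_to_tag <tag>" >&2; return 2; }
  # && chain: stop at the first failing step instead of starting a half-upgraded stack
  ./scripts/compose down \
    && git fetch --tags \
    && git checkout "$tag" \
    && ./scripts/compose build \
    && ./scripts/compose up -d
}
```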

Now the server is running, but the balena CLI refuses to connect, saying:
BalenaRequestError: Request error:

503 Service Unavailable


I get the same error when testing https://domain/ping.
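When checking several endpoints repeatedly, printing only the HTTP status code makes it easier to see whether a 503 is coming from the proxy for every host or just some. This is a sketch with a hypothetical function name; -k is used because a self-signed certificate would otherwise be rejected, and the URLs are placeholders for your own hosts.

```shell
#!/usr/bin/env bash
# Sketch: print "<status> <url>" for each URL given; 000 means no HTTP response at all.
ping_status() {
  local url
  for url in "$@"; do
    # -s silent, -k accept self-signed certs, -o discard the body, -w print only the status code
    printf '%s %s\n' "$(curl -sk -o /dev/null -w '%{http_code}' "$url")" "$url"
  done
}
```

For example: ping_status https://api.example.com/ping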

Looking at the docker logs on the OpenBalena server, I didn’t see anything unusual.
Any hints about what I can check?
Thank you

EDIT:
Inspecting the ha_proxy container, I noticed that not all services go UP, even though all containers are up and running.
Inspecting the logs of those containers isn’t helpful, since I only get “Systemd init system enabled.”
After restarting the containers several times, some services go UP, but never all of them at once, and never on the first try.
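Since the containers report as running while the proxy marks backends as down, it can help to look at container state and the proxy's own log side by side; HAProxy normally logs a line when it marks a backend server DOWN or UP. A sketch, assuming the proxy service is named haproxy in the compose file (pass another name if yours differs); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: container state plus the last 100 log lines of the proxy service.
# Assumption: the proxy service is called "haproxy"; override with the first argument.
inspect_proxy() {
  local service="${1:-haproxy}"
  ./scripts/compose ps
  ./scripts/compose logs --tail=100 "$service"
}
```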

EDIT 2:
Sorry for the edits… I’ve got some more insight into the error. I’m stuck here, and I hope someone can point me in the right direction.
Calling:
./scripts/compose exec api journalctl -fn100

I finally got to see the errors on api service:

May 22 14:23:39 58fac93ff3cc configure-balena.sh[63]: Installing custom CA bundle...
May 22 14:23:40 58fac93ff3cc configure-balena.sh[489]: Updating certificates in /etc/ssl/certs...
May 22 14:23:42 58fac93ff3cc configure-balena.sh[489]: 0 added, 0 removed; done.
May 22 14:23:42 58fac93ff3cc configure-balena.sh[489]: Running hooks in /etc/ca-certificates/update.d...
May 22 14:23:42 58fac93ff3cc configure-balena.sh[489]: done.
May 22 14:23:42 58fac93ff3cc configure-balena.sh[1214]: 2024-05-22T14:23:42Z 58fac93ff3cc /usr/local/bin/confd[1214]: INFO Backend set to env
May 22 14:23:42 58fac93ff3cc configure-balena.sh[1214]: 2024-05-22T14:23:42Z 58fac93ff3cc /usr/local/bin/confd[1214]: INFO Starting confd
May 22 14:23:42 58fac93ff3cc configure-balena.sh[1214]: 2024-05-22T14:23:42Z 58fac93ff3cc /usr/local/bin/confd[1214]: INFO Backend source(s) set to
May 22 14:23:42 58fac93ff3cc systemd[1]: Reloading.
May 22 14:23:42 58fac93ff3cc systemd[1]: /etc/systemd/system/open-balena-api.service:14: Unknown key name 'StartLimitIntervalSec' in section 'Service', ignoring.
May 22 14:23:42 58fac93ff3cc systemd[1]: confd.service: Deactivated successfully.
May 22 14:23:42 58fac93ff3cc systemd[1]: Finished confd.service - Confd.
May 22 14:23:42 58fac93ff3cc systemd[1]: Started open-balena-api.service - open-balena-api.
May 22 14:23:42 58fac93ff3cc systemd[1]: Starting rsyslog.service - System Logging Service...
May 22 14:23:42 58fac93ff3cc systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE
May 22 14:23:42 58fac93ff3cc systemd[1]: rsyslog.service: Failed with result 'exit-code'.
May 22 14:23:42 58fac93ff3cc systemd[1]: Failed to start rsyslog.service - System Logging Service.
May 22 14:23:42 58fac93ff3cc systemd[1]: Reached target multi-user.target - Multi-User System.
May 22 14:23:42 58fac93ff3cc systemd[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
May 22 14:23:42 58fac93ff3cc systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
May 22 14:23:42 58fac93ff3cc systemd[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
May 22 14:23:42 58fac93ff3cc systemd[1]: Startup finished in 8.245s.
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 1.
May 22 14:23:43 58fac93ff3cc systemd[1]: Stopped rsyslog.service - System Logging Service.
May 22 14:23:43 58fac93ff3cc systemd[1]: Starting rsyslog.service - System Logging Service...
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Failed with result 'exit-code'.
May 22 14:23:43 58fac93ff3cc systemd[1]: Failed to start rsyslog.service - System Logging Service.
May 22 14:23:43 58fac93ff3cc api[1237]: Running node-supervisor with
May 22 14:23:43 58fac93ff3cc api[1237]:   program 'index.js'
May 22 14:23:43 58fac93ff3cc api[1237]:   --watch 'src'
May 22 14:23:43 58fac93ff3cc api[1237]:   --extensions 'js,node,coffee,sbvr,json,sql,pegjs,ts'
May 22 14:23:43 58fac93ff3cc api[1237]:   --exec 'node'
May 22 14:23:43 58fac93ff3cc api[1237]: Starting child process with 'node index.js'
May 22 14:23:43 58fac93ff3cc api[1237]: Watching directory '/usr/src/app/src' for changes.
May 22 14:23:43 58fac93ff3cc api[1237]: Press rs for restarting the process.
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 2.
May 22 14:23:43 58fac93ff3cc systemd[1]: Stopped rsyslog.service - System Logging Service.
May 22 14:23:43 58fac93ff3cc systemd[1]: Starting rsyslog.service - System Logging Service...
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Failed with result 'exit-code'.
May 22 14:23:43 58fac93ff3cc systemd[1]: Failed to start rsyslog.service - System Logging Service.
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 3.
May 22 14:23:43 58fac93ff3cc systemd[1]: Stopped rsyslog.service - System Logging Service.
May 22 14:23:43 58fac93ff3cc systemd[1]: Starting rsyslog.service - System Logging Service...
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE
May 22 14:23:43 58fac93ff3cc systemd[1]: rsyslog.service: Failed with result 'exit-code'.
May 22 14:23:43 58fac93ff3cc systemd[1]: Failed to start rsyslog.service - System Logging Service.
May 22 14:23:44 58fac93ff3cc systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 4.
May 22 14:23:44 58fac93ff3cc systemd[1]: Stopped rsyslog.service - System Logging Service.
May 22 14:23:44 58fac93ff3cc systemd[1]: Starting rsyslog.service - System Logging Service...
May 22 14:23:44 58fac93ff3cc systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE
May 22 14:23:44 58fac93ff3cc systemd[1]: rsyslog.service: Failed with result 'exit-code'.
May 22 14:23:44 58fac93ff3cc systemd[1]: Failed to start rsyslog.service - System Logging Service.
May 22 14:23:44 58fac93ff3cc systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 5.
May 22 14:23:44 58fac93ff3cc systemd[1]: Stopped rsyslog.service - System Logging Service.
May 22 14:23:44 58fac93ff3cc systemd[1]: rsyslog.service: Start request repeated too quickly.
May 22 14:23:44 58fac93ff3cc systemd[1]: rsyslog.service: Failed with result 'exit-code'.
May 22 14:23:44 58fac93ff3cc systemd[1]: Failed to start rsyslog.service - System Logging Service.
May 22 14:23:44 58fac93ff3cc systemd[1]: syslog.socket: Failed with result 'service-start-limit-hit'.
May 22 14:23:45 58fac93ff3cc api[1247]: Error: Cannot find module '/usr/src/app/init.js' imported from /usr/src/app/index.js
May 22 14:23:45 58fac93ff3cc api[1247]: Did you mean to import "./init.ts"?
May 22 14:23:45 58fac93ff3cc api[1247]:     at finalizeResolution (node:internal/modules/esm/resolve:264:11)
May 22 14:23:45 58fac93ff3cc api[1247]:     at moduleResolve (node:internal/modules/esm/resolve:924:10)
May 22 14:23:45 58fac93ff3cc api[1247]:     at defaultResolve (node:internal/modules/esm/resolve:1148:11)
May 22 14:23:45 58fac93ff3cc api[1247]:     at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:390:12)
May 22 14:23:45 58fac93ff3cc api[1247]:     at ModuleLoader.resolve (node:internal/modules/esm/loader:359:25)
May 22 14:23:45 58fac93ff3cc api[1247]:     at ModuleLoader.getModuleJob (node:internal/modules/esm/loader:234:38)
May 22 14:23:45 58fac93ff3cc api[1247]:     at ModuleLoader.import (node:internal/modules/esm/loader:322:34)
May 22 14:23:45 58fac93ff3cc api[1247]:     at importModuleDynamically (node:internal/modules/esm/translators:160:35)
May 22 14:23:45 58fac93ff3cc api[1247]:     at importModuleDynamicallyCallback (node:internal/modules/esm/utils:225:14)
May 22 14:23:45 58fac93ff3cc api[1247]:     at start (file:///usr/src/app/index.js:8:2) {
May 22 14:23:45 58fac93ff3cc api[1247]:   code: 'ERR_MODULE_NOT_FOUND',
May 22 14:23:45 58fac93ff3cc api[1247]:   url: 'file:///usr/src/app/init.js'
May 22 14:23:45 58fac93ff3cc api[1247]: }
May 22 14:23:45 58fac93ff3cc api[1237]: Program node index.js exited with code 1
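Most of the journal above is rsyslog restart noise. A sketch that reads only the API unit, using the unit name open-balena-api.service that appears in the log; the function names are hypothetical, and -T disables TTY allocation so the output can be piped.

```shell
#!/usr/bin/env bash
# Sketch: journal of the API unit only, read inside the api container.
api_journal() {
  local lines="${1:-200}"
  ./scripts/compose exec -T api journalctl -u open-balena-api.service -n "$lines" --no-pager
}

# Same, reduced to lines that look like errors (Node error names or codes like ERR_MODULE_NOT_FOUND).
api_errors() {
  api_journal "${1:-500}" | grep -E 'Error|ERR_[A-Z_]+'
}
```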

Have you tried setting PRODUCTION_MODE to true for the API container?
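The log above shows node-supervisor watching src and Node suggesting "./init.ts", which may indicate the container is starting in a development-style mode. After changing the compose configuration, one way to confirm the variable actually reached the api container is sketched below (hypothetical function name; it only reads the environment of the running container).

```shell
#!/usr/bin/env bash
# Sketch: print PRODUCTION_MODE as seen inside the running api container.
# An empty result (and non-zero exit) means the variable is not set there.
check_production_mode() {
  ./scripts/compose exec -T api printenv PRODUCTION_MODE
}
```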

Thank you for your suggestion!
I missed that article.
I’m trying it right now; it’s not actually a production server, but I guess that’s not a problem here.

Having set the PRODUCTION_MODE flag to true, I now see a lot of DDL queries in the db service logs (several ALTER COLUMN statements).
There are also some errors, such as “ERROR: operator does not exist: boolean <> integer”
and “ERROR: relation “uniq_model_model_type_vocab” already exists”.

The queries have been repeating for a while now… and the api service is still not up…

Inspecting the api service again, it seems to be failing to perform the database migration.
Maybe 3.4.1 is too old a version to upgrade from?

I found this topic here that talks about my problem.
I inspected the db container and connected using psql, but in the release table I did not find the mentioned constraint, nor anything similar…

Ok… after another day of inspection, I found out that the database error is about an index called “uniq_model_model_type_vocab”.

Somewhere the script tries to create it, but fails because it already exists.
Using psql, I manually dropped the index, but after a few seconds it was recreated and the script kept failing.
It still seems to happen during migration 82:

2024-05-23 13:31:44.696 UTC [239] ERROR:  relation "uniq_model_model_type_vocab" already exists
2024-05-23 13:31:44.696 UTC [239] STATEMENT:  CREATE UNIQUE INDEX "uniq_model_model_type_vocab" ON "model" ("is of-vocabulary", "model type");
2024-05-23 13:31:46.357 UTC [240] ERROR:  relation "uniq_model_model_type_vocab" already exists
2024-05-23 13:31:46.357 UTC [240] STATEMENT:  CREATE UNIQUE INDEX "uniq_model_model_type_vocab" ON "model" ("is of-vocabulary", "model type");
2024-05-23 13:31:47.964 UTC [239] ERROR:  operator does not exist: boolean <> integer
2024-05-23 13:31:47.964 UTC [239] HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
2024-05-23 13:31:47.964 UTC [239] STATEMENT:  -- Boolean type conversions
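The errors above come from two different backend PIDs ([239] and [240]), and the index reappears after being dropped, which may mean more than one process is retrying the same migration. A sketch for checking this from the host: it assumes the database service is named db and uses the credentials docker / resin (check these against your own configuration and override via DB_USER and DB_NAME); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumptions: db service "db", user "docker", database "resin"; override via environment.
DB_USER="${DB_USER:-docker}"
DB_NAME="${DB_NAME:-resin}"

# Run one SQL statement in the db container.
db_sql() {
  ./scripts/compose exec -T db psql -U "$DB_USER" -d "$DB_NAME" -c "$1"
}

# Where the index lives and how it is defined.
show_index() {
  db_sql "SELECT schemaname, tablename, indexdef FROM pg_indexes WHERE indexname = 'uniq_model_model_type_vocab';"
}

# Other sessions currently touching the index or running ALTER statements.
show_ddl_sessions() {
  db_sql "SELECT pid, backend_start, state, left(query, 80) AS query FROM pg_stat_activity
          WHERE pid <> pg_backend_pid()
            AND (query ILIKE '%uniq_model_model_type_vocab%' OR query ILIKE 'ALTER%');"
}
```

If show_ddl_sessions lists several sessions from different backend start times, stopping the api service before dropping the index would at least rule out a concurrent retry.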

Any reason you’re upgrading to v3.8.5 specifically? Could you try restoring a backup and upgrading to v3.5.0? That was the last error-free version I found before v11.8.1, which introduced TRUST_PROXY: "true" to fix a redirect error.
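Restoring only works if a dump exists from before the failed migration. A sketch of taking and restoring a plain SQL dump through the compose wrapper; it assumes the database service is named db with user docker and database resin (override via DB_USER and DB_NAME), the function names are hypothetical, and a restore should go into an empty database (for example a fresh db volume), otherwise the CREATE statements in the dump will conflict with existing objects.

```shell
#!/usr/bin/env bash
# Assumptions: db service "db", user "docker", database "resin"; override via environment.
DB_USER="${DB_USER:-docker}"
DB_NAME="${DB_NAME:-resin}"

# Dump the database to a gzipped SQL file and print its name.
backup_db() {
  local out="${1:-openbalena-db-$(date +%Y%m%d-%H%M%S).sql.gz}"
  # pipefail in a subshell so a failing pg_dump is not masked by gzip succeeding
  ( set -o pipefail
    ./scripts/compose exec -T db pg_dump -U "$DB_USER" "$DB_NAME" | gzip > "$out" ) && echo "$out"
}

# Load a dump produced by backup_db into an (empty) database.
restore_db() {
  local in="$1"
  [ -f "$in" ] || { echo "usage: restore_db <dump.sql.gz>" >&2; return 2; }
  gunzip -c "$in" | ./scripts/compose exec -T db psql -U "$DB_USER" -d "$DB_NAME"
}
```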

Actually, no specific reason; I just wanted to update to the latest version.
On GitHub I only found version 3.8.3 and above (3 versions in total).

By the way, TRUST_PROXY isn’t what’s bothering me right now; the failing migration during the upgrade is.
Where can I find version 3.5.0?

Looks like we’re talking about two different things. I was talking about API releases, not the openBalena releases on GitHub. What’s your API version at the moment?

The API version is currently 0.139.0 (before the migrations).
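To read the API version directly from the running container, a sketch like the following could be used. It assumes the API code lives in /usr/src/app (as the stack trace above suggests) and that its package.json carries the version; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the "version" field of the API's package.json inside the api container.
# Assumption: the app is installed in /usr/src/app.
api_version() {
  ./scripts/compose exec -T api node -p "require('/usr/src/app/package.json').version"
}
```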

That’s pretty old. As you can see, the API is currently at v22.2.2 (Releases · balena-io/open-balena-api · GitHub). The main openbalena repo is in a bit of a rough state at the moment. They are working on a big PR to get everything up to date. (see openBalena 2024 by ab77 · Pull Request #141 · balena-io/open-balena · GitHub)

You could try upgrading (one major version at a time), but that’s going to take a while and you might run into issues. If possible, I would suggest setting up a new, up-to-date server and provisioning your devices again.
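To plan a one-major-at-a-time path, the newest release of each major version can be picked from the open-balena-api tags. A sketch with hypothetical function names; it assumes tags follow the vMAJOR.MINOR.PATCH pattern seen in this thread (pre-release tags are skipped), needs GNU sort for -V, and says nothing about whether each hop's migrations will actually succeed.

```shell
#!/usr/bin/env bash
# Read tags on stdin; print the newest tag of each major version, lowest major first.
latest_per_major() {
  grep -E '^v?[0-9]+\.[0-9]+\.[0-9]+$' \
    | sort -V \
    | awk -F. '{
        major = $1; sub(/^v/, "", major)
        last[major] = $0                      # input is sorted, so the last one wins
        if (!(major in seen)) { seen[major] = 1; order[++n] = major }
      }
      END { for (i = 1; i <= n; i++) print last[order[i]] }'
}

# List release tag names of the API repository without cloning it.
list_api_tags() {
  git ls-remote --tags --refs https://github.com/balena-io/open-balena-api.git \
    | sed 's#.*refs/tags/##'
}
```

For example: list_api_tags | latest_per_major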

That would be troublesome since I currently have more than 200 devices deployed and dispatched in remote areas…
Isn’t there a way to prepare a new server and migrate devices remotely?

I’ve personally upgraded several production servers all the way from v0.209.2 to v20.2.15, so it’s definitely possible, though it takes some time. A long time ago I also tried upgrading a v0.1xx.x server to v0.209.2, and that never worked out. I ended up migrating all devices to a new server; most of them were also running in remote locations, so I know the struggle.

You can always try, but in my experience all db migrations before v1.0.0 were pretty hit and miss.