Changing openBalena Domain

Hi All,

Wondering if anyone has had any experience changing the domain name of an openBalena server.

I think the process would be something like this:

  1. Stop openBalena containers.
  2. Generate certificates for new domain.
  3. Update config attributes:
  • OPENBALENA_ROOT_CA
  • OPENBALENA_ROOT_CRT
  • OPENBALENA_ROOT_KEY
  • OPENBALENA_VPN_CA
  • OPENBALENA_VPN_CA_CHAIN
  4. Update DNS to redirect old domains to new domains for:
    api.old-domain.com
    ssh.old-domain.com
    registry.old-domain.com
    s3.old-domain.com
    tunnel.old-domain.com
    vpn.old-domain.com
  5. Restart containers.
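
The DNS step in this plan could be sketched as a list of alias records generated from the subdomain list. This is only a minimal sketch: `cname_records` is a hypothetical helper, `new-domain.com` is a placeholder, and an alias by itself does nothing about TLS certificates, which may still fail hostname validation on the device.

```python
# Subdomains taken from the plan above (the repeated "ssh" entry listed once).
SUBDOMAINS = ["api", "ssh", "registry", "s3", "tunnel", "vpn"]


def cname_records(old_domain, new_domain, subdomains=SUBDOMAINS):
    """Return (alias, target) pairs mapping each old host to its new host."""
    return [(f"{sub}.{old_domain}", f"{sub}.{new_domain}") for sub in subdomains]


if __name__ == "__main__":
    for alias, target in cname_records("old-domain.com", "new-domain.com"):
        # Zone-file style line; adapt to whatever your DNS provider expects.
        print(f"{alias}. CNAME {target}.")
```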

My main concern is whether all of the old devices will still connect successfully using the old domain. I know I can update the domain in each device's config, but I am worried about devices that are offline, whose domain I cannot change before the changeover.

Is there anything else I am missing? I might set up a second openBalena server, connect a device, and trial this before migrating my main server.

Cheers,
Chris

Hi

I don’t think it’s possible, because there’s no way to tell the devices that the domain has changed.
What I did was set up another server with the new domain and then migrate the devices (see balena join).

Hi @Maurits

This makes sense.

I guess the concern I have is that there are some devices that are offline for quite some time and then connect every few months.

To move a device it must be connected to the server we are moving it from.

I am wondering if there is some way to put a redirect on the old domain to the new one and have the device follow that to the new server.

The next issue is that it won’t be able to connect to the new server, as it isn’t registered with it.

I am going to try preregistering an offline device, putting the redirect in place, and then booting the device to see if it connects.

Cheers,
Chris

@dash we have done this a few times, and in all cases we have had to keep both openBalena environments (source and target) running during the switch. The source needs to remain online until all devices connect and cut over.

In our case we did it with OS update hooks (we run our own version of balenaOS, which makes this a bit easier), but we have also cut devices over manually using automated scripts. The essence of the cutover process is just updating config.json to point to the new API server. There is a service that watches config.json and will apply the change, reconfigure the VPN and restart the supervisor.
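
As a rough illustration of the config.json change described here, the sketch below rewrites the domain in every endpoint field. It assumes the endpoint fields are strings whose keys end in `Endpoint` (for example `apiEndpoint`, `vpnEndpoint`, `registryEndpoint`). `retarget_config` is a hypothetical helper, not part of any balena tool, and the sample values are placeholders.

```python
import json


def retarget_config(config, old_domain, new_domain):
    """Return a copy of config with old_domain replaced in every *Endpoint value."""
    updated = dict(config)
    for key, value in config.items():
        # Assumption: endpoint fields are strings whose key ends in "Endpoint".
        if key.endswith("Endpoint") and isinstance(value, str):
            updated[key] = value.replace(old_domain, new_domain)
    return updated


if __name__ == "__main__":
    sample = {
        "apiEndpoint": "https://api.old-domain.com",
        "vpnEndpoint": "vpn.old-domain.com",
        "registryEndpoint": "registry.old-domain.com",
        "uuid": "hypothetical-uuid",
    }
    print(json.dumps(retarget_config(sample, "old-domain.com", "new-domain.com"), indent=2))
```

If the new server is signed by a different root CA, the CA material stored on the device would presumably need replacing as well; the sketch does not handle that.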

You’ll also want to make sure the new balena environment has a mirror of the old database, registry, etc. Otherwise you will have to manually register the device. If you prefer to start clean, you can automate that in a shell script as well, but you’ll want to make sure that if you get a new UUID or API key that you apply those to config.json in addition to the API server.
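
For the start-clean case, applying a new UUID or API key could look like the sketch below. The key names `uuid` and `deviceApiKey` are assumptions about the config.json layout, `apply_new_identity` is a hypothetical helper, and the values would come from however the device is registered on the new server.

```python
def apply_new_identity(config, uuid=None, device_api_key=None):
    """Return a copy of config with the new UUID and/or device API key applied."""
    updated = dict(config)
    if uuid is not None:
        updated["uuid"] = uuid  # assumed key name
    if device_api_key is not None:
        updated["deviceApiKey"] = device_api_key  # assumed key name
    return updated
```

Values that are not supplied are left untouched, so the same helper also covers the mirrored-database case, where the existing identity is kept.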

If you find a way to do this with domain redirection, I would be interested to know about it. I suspect you will run into issues with certificates if you do that, but perhaps there is a way to get around that.

Hi @drcnyc,

Could you help me remember the config.json update process?

The essence of the cutover process is just updating config.json to point to the new API server. There is a service that watches config.json and will apply the change, reconfigure the VPN and restart the supervisor.

If I’m not mistaken, the only way to update config.json (without using balena join from the CLI) is to SSH into the device and run a command I can’t recall, something like update_command "config.json content on a single line". For the config.json, I’d use the one I injected into the image before flashing the device (provisioning then adds further keys to it). My one doubt is about an apiKey that is present in this file but not in the provisioned one, and for which I can’t find any documentation.

Is the purpose of mirroring the old database and registry to avoid having to redeploy the application?

You’ll also want to make sure the new balena environment has a mirror of the old database, registry, etc. Otherwise you will have to manually register the device. If you prefer to start clean, you can automate that in a shell script as well, but you’ll want to make sure that if you get a new UUID or API key that you apply those to config.json in addition to the API server.

I’m also interested in this scenario because I’m currently dealing with a related problem: Build on Internet-Connected Machine and Deploying to Internet-Isolated Environment (in which I believe I’ve made some progress). The device I tried to move with the balena join CLI wizard isn’t downloading the application, and it seems to me that it hasn’t been moved properly even though it says “operational”, so I wanted to try moving it by injecting the config.json from the device.

Thanks

I found (I think) the right command:
os-config-join '{...my_json..}'

Fetching service configuration from https://api…/os/v1/config…
Service configuration retrieved
No configuration changes
Stopping balena-supervisor.service…
Awaiting balena-supervisor.service to exit…
Writing /mnt/boot/config.json
Starting balena-supervisor.service…
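
Since os-config-join takes the whole config.json as a single argument, one way to avoid quoting mistakes is to build the command line on a workstation. This is a minimal sketch: `os_config_join_command` is a hypothetical helper, and it only builds the string that you would then paste into the device shell.

```python
import json
import shlex


def os_config_join_command(config):
    """Build a shell command line that passes config as one JSON argument."""
    # Compact separators keep the JSON on a single line with no extra spaces.
    one_line = json.dumps(config, separators=(",", ":"))
    # shlex.quote wraps the JSON in single quotes so the shell passes it verbatim.
    return f"os-config-join {shlex.quote(one_line)}"


if __name__ == "__main__":
    print(os_config_join_command({"apiEndpoint": "https://api.new-domain.local"}))
```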

And now (finally) I have a different error from the supervisor:

journalctl --unit=balena-supervisor --unit=resin-supervisor

Applying target state
Jun 10 10:14:48 89a1389 balena-supervisor[3973]: [event] Event: Image removal {"image":{"name":"registry2.old-domain.local/v2/5905ba86aa8f7c5c69e45e1e93c8a46f@sha256:0d948123f54615c25eef4387148ca5f694adae03eef92aa4f663075ca5f3cdb9","appId":1,"serviceId":1,"serviceName":"node-red","imageId":2,"releaseId":2,"appUuid":"67ef801964b84959a94343ee076b178c","commit":"29dc355312db4c630dcc9e64e66946db"}}
Jun 10 10:14:48 89a1389 balena-supervisor[3973]: [event] Event: Image removal error {"error":{"message":"(HTTP code 409) conflict - conflict: unable to remove repository reference "registry2.old-domain.local/v2/5905ba86aa8f7c5c69e45e1e93c8a46f:latest" (must force) - container 7e053f4f51e8 is using its referenced image 7f4c19191fa1 ","stack":"Error: (HTTP code 409) conflict - conflict: unable to remove repository reference "registry2.old-domain.local/v2/5905ba86aa8f7c5c69e45e1e93c8a46f:latest" (must force) - container 7e053f4f51e8 is using its referenced image 7f4c19191fa1 \n at /usr/src/app/dist/app.js:2:673868\n at /usr/src/app/dist/app.js:2:673686\n at Modem.buildPayload (/usr/src/app/dist/app.js:2:673793)\n at IncomingMessage. (/usr/src/app/dist/app.js:2:672771)\n at IncomingMessage.emit (node:events:531:35)\n at endReadableNT (node:internal/streams/readable:1696:12)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"},"image":{"name":"registry2.old-domain.local/v2/5905ba86aa8f7c5c69e45e1e93c8a46f@sha256:0d948123f54615c25eef4387148ca5f694adae03eef92aa4f663075ca5f3cdb9","appId":1,"serviceId":1,"serviceName":"node-red","imageId":2,"releaseId":2,"appUuid":"67ef801964b84959a94343ee076b178c","commit":"29dc355312db4c630dcc9e64e66946db"}}

...conflict: unable to remove repository reference... to old-domain

How can I solve it?

Thanks
Andrea

I stopped the container, deleted it, and removed its corresponding image as follows:

root@89a1389:~# balena container list
CONTAINER ID   IMAGE                                                            COMMAND                  CREATED      STATUS                    PORTS     NAMES
6d8fd99dd758   registry2.balena-cloud.com/v2/66f0a291f1d4171bb94352cc9557f8bc   "/usr/src/app/entry.…"   3 days ago   Up 31 minutes (healthy)             balena_supervisor
7e053f4f51e8   7f4c19191fa1                                                     "/usr/bin/entry.sh b…"   3 days ago   Up 31 minutes                       node-red_2_2_29dc355312db4c630dcc9e64e66946db

root@89a1389:~# balena container rm 7e053f4f51e8
Error response from daemon: You cannot remove a running container 7e053f4f51e80e2cc06826e160b6a2339c132c167277c586f14ca57485fd9d49. Stop the container before attempting removal or force remove

root@89a1389:~# balena container stop 7e053f4f51e8
root@89a1389:~# balena container rm 7e053f4f51e8
root@89a1389:~# balena container list
CONTAINER ID   IMAGE                                                            COMMAND                  CREATED      STATUS                    PORTS     NAMES
6d8fd99dd758   registry2.balena-cloud.com/v2/66f0a291f1d4171bb94352cc9557f8bc   "/usr/src/app/entry.…"   3 days ago   Up 33 minutes (healthy)             balena_supervisor
root@89a1389:~# balena image list
REPOSITORY                                                           TAG        IMAGE ID       CREATED        SIZE
registry2.old-domain.local/v2/5905ba86aa8f7c5c69e45e1e93c8a46f   latest     7f4c19191fa1   6 days ago     844MB
balena_supervisor                                                    v16.10.1   97aee429b5df   6 months ago   112MB
registry2.balena-cloud.com/v2/66f0a291f1d4171bb94352cc9557f8bc       latest     97aee429b5df   6 months ago   112MB
root@89a1389:~# balena container rm 7e053f4f51e8
7e053f4f51e8
root@89a1389:~# balena image rm 7f4c19191fa1
[some layer deleted]
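
To check several devices for leftover images from the old registry, the image list output can be filtered by repository host. This is a minimal sketch that assumes the column layout shown in the session above (REPOSITORY, TAG, IMAGE ID, ...); `images_from_registry` is a hypothetical helper that only parses text you have already captured.

```python
def images_from_registry(image_list_output, registry_host):
    """Return (repository, image_id) for rows whose repository is on registry_host."""
    matches = []
    for line in image_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # fields[0] is the repository, fields[1] the tag, fields[2] the image ID.
        if len(fields) >= 3 and fields[0].startswith(registry_host + "/"):
            matches.append((fields[0], fields[2]))
    return matches


if __name__ == "__main__":
    captured = (
        "REPOSITORY TAG IMAGE ID CREATED SIZE\n"
        "registry2.old-domain.local/v2/5905ba86aa8f7c5c69e45e1e93c8a46f latest 7f4c19191fa1 6 days ago 844MB\n"
        "balena_supervisor v16.10.1 97aee429b5df 6 months ago 112MB\n"
    )
    print(images_from_registry(captured, "registry2.old-domain.local"))
```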

Now the error I’d seen in the past has returned, which is:

Jun 10 10:41:10 89a1389 balena-supervisor[3973]: [api]     GET /v1/healthy 200 - 6.162 ms
Jun 10 10:43:09 89a1389 balena-supervisor[3973]: [info]    Reported current state to the cloud
Jun 10 10:43:21 89a1389 balena-supervisor[3973]: [info]    Applying target state
Jun 10 10:43:21 89a1389 balena-supervisor[3973]: [event]   Event: Docker image download {"image":{"name":"registry2.new-domain.local/v2/5cd9eb7b19e96f6658726f26fe30f14c@sha256:6878587e36cfd28bc337c2fe9a19eff1ef9d9ee823cf81d351910a8b70695738","appId":1,"appUuid":"1f5dce674dbb47f49f987a60365a885e","serviceId":1,"serviceName":"main","imageId":11,"releaseId":11,"commit":"c09a84612c324396c2cb981f7beaf57d"}}
Jun 10 10:43:22 89a1389 balena-supervisor[3973]: [error]   Device state report failure! Status code: 401 - message:
Jun 10 10:43:22 89a1389 balena-supervisor[3973]: [info]    Retrying current state report in 15 seconds
Jun 10 10:43:37 89a1389 balena-supervisor[3973]: [error]   Device state report failure! Status code: 401 - message:
Jun 10 10:43:37 89a1389 balena-supervisor[3973]: [info]    Retrying current state report in 30 seconds
Jun 10 10:44:08 89a1389 balena-supervisor[3973]: [error]   Device state report failure! Status code: 401 - message:
Jun 10 10:44:08 89a1389 balena-supervisor[3973]: [info]    Retrying current state report in 60 seconds
Jun 10 10:44:23 89a1389 balena-supervisor[3973]: [event]   Event: Image downloaded {"image":{"name":"registry2.new-domain.local/v2/5cd9eb7b19e96f6658726f26fe30f14c@sha256:6878587e36cfd28bc337c2fe9a19eff1ef9d9ee823cf81d351910a8b70695738","appId":1,"appUuid":"1f5dce674dbb47f49f987a60365a885e","serviceId":1,"serviceName":"main","imageId":11,"releaseId":11,"commit":"c09a84612c324396c2cb981f7beaf57d"}}
Jun 10 10:44:23 89a1389 balena-supervisor[3973]: [event]   Event: Take update locks {"appId":"1","force":false,"services":["main"]}
Jun 10 10:44:24 89a1389 balena-supervisor[3973]: [event]   Event: Service install {"service":{"appId":1,"serviceId":1,"serviceName":"main","commit":"c09a84612c324396c2cb981f7beaf57d","releaseId":11}}
Jun 10 10:44:24 89a1389 balena-supervisor[3973]: [error]   Scheduling another update attempt in 600000ms due to failure:  Error: Failed to apply state transition steps. (HTTP code 400) bad parameter - No command specified  Steps:["start"]
Jun 10 10:44:24 89a1389 balena-supervisor[3973]: [error]         at /usr/src/app/dist/app.js:10:12191
Jun 10 10:44:24 89a1389 balena-supervisor[3973]: [error]       at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jun 10 10:44:24 89a1389 balena-supervisor[3973]: [error]   Device state apply error Error: Failed to apply state transition steps. (HTTP code 400) bad parameter - No command specified  Steps:["start"]
Jun 10 10:44:24 89a1389 balena-supervisor[3973]: [error]         at /usr/src/app/dist/app.js:10:12191
Jun 10 10:44:24 89a1389 balena-supervisor[3973]: [error]       at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

So I’m stuck again, not understanding how to get out of this. Did I do something wrong in the procedure?

The commit is correct, c09a84612c324396c2cb981f7beaf57d (and matches the one in the registry), but the “main” service is wrong. On the working device, “node-red” appears, which is the correct service name. Why is it trying to start the wrong service?

ID COMMIT                           CREATED AT               STATUS  SEMVER IS FINAL
11 c09a84612c324396c2cb981f7beaf57d 2025-06-06T15:20:40.142Z success 0.0.0  true
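
One way to compare what the supervisor was told on each server is to pull the registry host, service name and app UUID out of the `[event]` lines in the journal. This is a minimal sketch that assumes each event line ends with a valid JSON payload, as in the later log above; `image_events` is a hypothetical helper, and lines whose payload does not parse are skipped.

```python
import json
import re

# Matches "[event]   Event: <description> {json payload}" at the end of a line.
EVENT_RE = re.compile(r"\[event\]\s+Event: [^{]*(\{.*\})\s*$")


def image_events(journal_text):
    """Return (registry_host, service_name, app_uuid) for each image event."""
    rows = []
    for line in journal_text.splitlines():
        match = EVENT_RE.search(line)
        if not match:
            continue
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # payload is not valid JSON, so skip it
        image = payload.get("image")
        if isinstance(image, dict) and "name" in image:
            host = image["name"].split("/", 1)[0]
            rows.append((host, image.get("serviceName"), image.get("appUuid")))
    return rows
```

In the logs in this thread, the old-domain events carry serviceName node-red with appUuid 67ef801964b84959a94343ee076b178c, while the new-domain events carry serviceName main with appUuid 1f5dce674dbb47f49f987a60365a885e. It may therefore be worth checking which application the device is attached to on the new server.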

Thanks