Device state not updating after Upgrade and New SSL Cert

Hi guys!

First off: thanks for this awesome project!

Problem
I have an issue where the is_online flag is not updating. New devices can join the fleet but the state is always offline (or online if the device was online prior to the upgrade mentioned below).

The device logs feed is working fine, and I can open tunnels, but can’t communicate with the devices.

Backstory
I upgraded from 4.1.251 to 4.1.349 and created a new SSL certificate using certbot. When I run make up, docker ps often shows the vpn service as unhealthy.

The only time I can get the vpn service to become healthy is when the api fails to run migration 0074 and I run the following:

docker exec -it open-balena-api-1 /bin/bash
rm src/migrations/0074-normalize-release-contract.async.ts 
exit
docker restart open-balena-api-1
docker restart open-balena-vpn-1

And that’s when I encounter the issue of device state not being updated, e.g. is_online never updates. When I upgraded to 4.1.251, I only had to remove the 0074 migration file and restart the services to get up and running. I wonder if this is even related to my issue? :thinking:
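For anyone trying to reproduce this, here is a rough sketch of how the health state that docker ps reports could be polled after the restarts. The function name wait_healthy, the attempt count and the delay are made up for illustration; it only assumes the container defines a Docker healthcheck.

```shell
# Poll a container's Docker health status until it reports "healthy"
# or the attempts run out. wait_healthy is a hypothetical helper name;
# the attempt count and delay defaults are arbitrary.
wait_healthy() {
  container=$1
  attempts=${2:-30}
  delay=${3:-2}
  status=unknown
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # .State.Health.Status is only populated when a healthcheck is defined.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$container is still '$status' after $attempts checks" >&2
  return 1
}

# Usage (container name taken from the commands above):
#   wait_healthy open-balena-vpn-1 30 2
```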

Debugging
Digging further into the vpn service while it is in the unhealthy state, I noticed that it never manages to register as a service instance, so no workers are spawned.

vpn-1 | [81242.928760] vpn[1163]: debug: [master] registering as service instance...
vpn-1 | [81242.934680] vpn[1163]: (node:1163) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
vpn-1 | [81242.934937] vpn[1163]: (Use `node --trace-deprecation ...` to show where the warning was created)
vpn-1 | [82140.494113] systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
vpn-1 | [82140.524470] systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
vpn-1 | [82140.525031] systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
vpn-1 | [94550.533973] systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
vpn-1 | [94550.714038] systemd[1]: dpkg-db-backup.service: Deactivated successfully.
vpn-1 | [94550.714693] systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
end of logs

However, when I manage to get the vpn service healthy, I noticed that the API Authentication fails…

vpn-1 | [132593.424354] vpn[1323]: >BYTECOUNT_CLI:94,16718,18193
vpn-1 | [132593.673158] vpn[1323]: >BYTECOUNT_CLI:76,16759,18235
vpn-1 | [132593.940021] vpn[1317]: >BYTECOUNT_CLI:38,16892,18196
vpn-1 | [132594.060913] vpn[1311]: notice: [vpn-1651.2] TCP connection established with [AF_INET]127.0.0.1:50086
vpn-1 | [132594.061050] vpn[1311]: notice: [vpn-1651.2] Socket flags: TCP_NODELAY=1 succeeded
vpn-1 | [132594.061191] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 TLS: Initial packet from [AF_INET]127.0.0.1:50086, sid=78cdbf7f de2e4b41
vpn-1 | [132594.234766] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_VER=2.5.6
vpn-1 | [132594.234944] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_PLAT=linux
vpn-1 | [132594.235136] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_PROTO=6
vpn-1 | [132594.235301] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_NCP=2
vpn-1 | [132594.235529] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_CIPHERS=AES-256-GCM:AES-128-GCM
vpn-1 | [132594.235975] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_LZ4=1
vpn-1 | [132594.236280] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_LZ4v2=1
vpn-1 | [132594.236744] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_LZO=1
vpn-1 | [132594.237058] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_COMP_STUB=1
vpn-1 | [132594.237570] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_COMP_STUBv2=1
vpn-1 | [132594.237824] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 peer info: IV_TCPNL=1
vpn-1 | [132594.239909] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 PLUGIN_CALL: POST /etc/openvpn/plugins/openvpn-plugin-auth-script.so/PLUGIN_AUTH_USER_PASS_VERIFY status=2
vpn-1 | [132594.240743] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 TLS: Username/Password authentication deferred for username 'XXXXXXXXXXXXXXXXXXXXXX' [CN SET]
vpn-1 | [132594.240961] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 TLS: move_session: dest=TM_ACTIVE src=TM_INITIAL reinit_src=1
vpn-1 | [132594.241193] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 TLS: tls_multi_process: initial untrusted session promoted to semi-trusted
vpn-1 | [132594.243126] vpn[1158]: info: [vpn-1651] AUTH FAIL: API Authentication failed for XXXXXXXXXXXXXXXXXXXXXX
vpn-1 | [132594.376894] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384
vpn-1 | [132594.377135] vpn[1311]: notice: [vpn-1651.2] 127.0.0.1:50086 [XXXXXXXXXXXXXXXXXXXXXX] Peer Connection Initiated with [AF_INET]127.0.0.1:50086
vpn-1 | [132594.417524] vpn[1317]: >BYTECOUNT_CLI:84,16728,18239
vpn-1 | [132594.836696] vpn[1317]: >BYTECOUNT_CLI:66,16800,18278

In the latter scenario, the api logs show many POST requests to /services/vpn/client-connect and /services/vpn/client-disconnect, all ending in 401 Unauthorized.

api-1  | [128849.120134] api[1981]: 2025-02-19T09:32:02.493Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 332.442ms -
api-1  | [128850.976363] api[2528]: 2025-02-19T09:32:04.348Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 423.700ms -
api-1  | [128852.176382] api[2528]: 2025-02-19T09:32:05.549Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 141.835ms -
api-1  | [128852.619068] api[1440]: 2025-02-19T09:32:05.990Z ::ffff:172.18.0.6 - POST /services/vpn/client-disconnect 401 327.610ms -
api-1  | [128853.800244] api[2528]: 2025-02-19T09:32:07.173Z ::ffff:172.18.0.6 - POST /services/vpn/client-disconnect 401 33.343ms -
api-1  | [128853.958105] api[1433]: 2025-02-19T09:32:07.331Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 83.899ms -
api-1  | [128855.346156] api[2528]: 2025-02-19T09:32:08.719Z ::ffff:172.18.0.6 - POST /services/vpn/client-disconnect 401 162.849ms -
api-1  | [128855.683467] api[1440]: 2025-02-19T09:32:09.056Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 543.815ms -
api-1  | [128856.832970] api[1981]: 2025-02-19T09:32:10.206Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 39.134ms -
api-1  | [128857.274785] api[1433]: 2025-02-19T09:32:10.647Z ::ffff:172.18.0.6 - POST /services/vpn/client-disconnect 401 202.964ms -
api-1  | [128858.562855] api[2528]: 2025-02-19T09:32:11.936Z ::ffff:172.18.0.6 - POST /services/vpn/client-disconnect 401 69.792ms -
api-1  | [128858.631816] api[1440]: 2025-02-19T09:32:12.004Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 52.167ms -
api-1  | [128860.151599] api[1440]: 2025-02-19T09:32:13.523Z ::ffff:172.18.0.6 - POST /services/vpn/client-disconnect 401 198.500ms -
api-1  | [128860.332140] api[1981]: 2025-02-19T09:32:13.705Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 512.697ms -
api-1  | [128861.591813] api[1433]: 2025-02-19T09:32:14.964Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 145.125ms -
api-1  | [128862.984138] api[1981]: 2025-02-19T09:32:16.357Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 207.834ms -
api-1  | [128864.233235] api[1981]: 2025-02-19T09:32:17.605Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 38.267ms -
api-1  | [128865.641101] api[1440]: 2025-02-19T09:32:19.012Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 289.078ms -
api-1  | [128866.904290] api[1433]: 2025-02-19T09:32:20.277Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 134.937ms -
api-1  | [128868.330889] api[2528]: 2025-02-19T09:32:21.700Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 326.202ms -
api-1  | [128869.926424] api[1981]: 2025-02-19T09:32:23.299Z ::ffff:172.18.0.6 - POST /services/vpn/client-connect 401 565.074ms -
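To get an overview rather than scrolling through the log, the 401s can be counted per endpoint. This is only a sketch that assumes the log layout shown above (METHOD, PATH, STATUS as consecutive fields); count_401s is a hypothetical helper name.

```shell
# Summarize 401 responses per endpoint from api log lines read on stdin.
# Assumes each line contains "... POST <path> <status> ..." as in the
# excerpt above. count_401s is an illustrative name, not part of openBalena.
count_401s() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "POST" && $(i + 2) == "401") hits[$(i + 1)]++
  }
  END { for (p in hits) print hits[p], p }' | sort -rn
}

# Usage (container name taken from the commands earlier in this post):
#   docker logs open-balena-api-1 2>&1 | count_401s
```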

Here is how I created my cert:

sudo certbot certonly --manual --preferred-challenges dns -d domain.io -d *.domain.io
export HAPROXY_CRT=$(sudo cat /etc/letsencrypt/live/domain.io/fullchain.pem | openssl base64 -A) 
export HAPROXY_KEY=$(sudo cat /etc/letsencrypt/live/domain.io/privkey.pem | openssl base64 -A) 
export ROOT_CA=$(sudo awk 'BEGIN {c=0;} /BEGIN CERTIFICATE/{c++} { if(c>1) { print $0;} }' /etc/letsencrypt/live/domain.io/fullchain.pem | openssl base64 -A)
make pki-custom && make verify
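As a sanity check on what actually ends up in ROOT_CA, the awk filter above can be wrapped in a function and its output counted. This is a sketch only; chain_without_leaf is a name I made up, and the behaviour is the same as the awk one-liner: everything in the PEM file after the first certificate.

```shell
# Emit every certificate in a PEM chain file except the first one (the leaf).
# Same logic as the awk filter used for ROOT_CA above; chain_without_leaf is
# a hypothetical helper name.
chain_without_leaf() {
  awk '/BEGIN CERTIFICATE/ { c++ } c > 1' "$1"
}

# Usage: count how many certificates would go into ROOT_CA, then show the
# subject and issuer of the first of them (openssl x509 reads one cert only).
#   sudo cat /etc/letsencrypt/live/domain.io/fullchain.pem > /tmp/fullchain.pem
#   chain_without_leaf /tmp/fullchain.pem | grep -c 'BEGIN CERTIFICATE'
#   chain_without_leaf /tmp/fullchain.pem | openssl x509 -noout -subject -issuer
```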

Any clue as to why my vpn fails to authenticate in the api service?

Happy to provide any additional info!