After changing the certificates we were able to log in again, but now none of the devices change their status (online/offline), so it is impossible to tunnel into them or monitor their online status.
We took a snapshot of our machine before changing the configuration, so we should be able to roll back if needed.
If you can get into the devices locally over SSH (using a preloaded key), you can speed this up by running /usr/bin/os-config update manually. Otherwise, the devices should reconnect to the VPN within the next 24 hours, once they update their local OpenVPN configuration (essentially, /etc/openvpn/ca.crt is what changes).
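To check whether a given device has already picked up the new CA, one option is to compare its /etc/openvpn/ca.crt with the new CA you generated. The sketch below is a hypothetical helper (pem_fingerprint and device_has_new_ca are illustrative names, not part of os-config or openBalena). It hashes both PEM texts after normalizing line endings and surrounding whitespace, and assumes you have copied the device's ca.crt somewhere the script can read it.

```python
import hashlib
from pathlib import Path


def pem_fingerprint(pem_text: str) -> str:
    """SHA-256 of a PEM text, ignoring CRLF vs LF and surrounding whitespace."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def device_has_new_ca(device_ca_path: str, new_ca_pem: str) -> bool:
    """True if the CA file copied from the device matches the new CA (hypothetical helper)."""
    device_pem = Path(device_ca_path).read_text()
    return pem_fingerprint(device_pem) == pem_fingerprint(new_ca_pem)
```

For example, device_has_new_ca("device-ca.crt", Path("new-ca.crt").read_text()) (file names illustrative) returns False while the device is still on the old CA.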
How did you generate the new certs and how did you apply them to the instance? Are you sure you correctly updated DEVICE_CONFIG_OPENVPN_CA in the environment section of the api service of your compose YAML?
There’s nothing in that thread about updating DEVICE_CONFIG_OPENVPN_CA…
Just to be sure: should I take the ca.crt from the /open-balena/config/certs/vpn directory and use it to update the OPENBALENA_VPN_CA_CHAIN variable in the activate file?
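Since these variables hold base64-encoded values, the ca.crt contents have to be encoded as a single line before being pasted in. A minimal sketch, assuming the variable holds the base64 of the whole PEM file; encode_ca_for_env, decode_ca_from_env and activate_line are hypothetical helper names, and the `export NAME=value` form is an assumption about the activate file's layout. Note that the GNU base64 command wraps its output at 76 columns by default (-w 0 disables that), whereas Python's base64.b64encode never inserts newlines.

```python
import base64
from pathlib import Path


def encode_ca_for_env(pem_bytes: bytes) -> str:
    """Base64-encode a PEM file as one line (no wrapping), suitable for an env var."""
    return base64.b64encode(pem_bytes).decode("ascii")


def decode_ca_from_env(value: str) -> bytes:
    """Inverse of encode_ca_for_env; handy for checking what a variable really contains."""
    return base64.b64decode(value, validate=True)


def activate_line(var_name: str, ca_path: str) -> str:
    """Build an `export NAME=value` line (assumed activate-file format) from a ca.crt on disk."""
    return f"export {var_name}={encode_ca_for_env(Path(ca_path).read_bytes())}"
```

For instance, activate_line("OPENBALENA_VPN_CA_CHAIN", "/open-balena/config/certs/vpn/ca.crt") would print the line to compare against (or paste into) the activate file. Decoding the existing value with decode_ca_from_env is a quick way to see which CA is currently configured.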
Another odd thing: I can reach every device, even those flagged as ‘offline’, request their logs, and get them back without any error…
It seems that only the IS ONLINE flag is not being updated correctly.
These environment variables are passed to quite a few containers, so make sure to do a thorough search-and-replace across the entire composition before redeploying it.
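One way to make that search-and-replace less error-prone is to count how many times the old value appears in each file before replacing it, so you can confirm that every service picked up the new value. A rough sketch, assuming the composition lives in one or more plain-text YAML files and that the old and new values are the exact base64 strings (replace_value and replace_in_files are hypothetical names):

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple


def replace_value(text: str, old: str, new: str) -> Tuple[str, int]:
    """Replace every occurrence of `old` with `new`; return the new text and the count."""
    if not old:
        raise ValueError("old value must not be empty")
    return text.replace(old, new), text.count(old)


def replace_in_files(paths: Iterable[str], old: str, new: str) -> Dict[str, int]:
    """Rewrite each file in place and report how many replacements were made per file."""
    counts: Dict[str, int] = {}
    for p in paths:
        path = Path(p)
        updated, n = replace_value(path.read_text(), old, new)
        if n:
            path.write_text(updated)
        counts[str(path)] = n
    return counts
```

A file that reports 0 may simply not use that variable, but a file you expected to change that reports 0 is worth a second look.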
Once the devices and backend all have the correct base64-encoded environment variables and have been restarted to pick them up, make sure the VPN and API services in the backend start correctly. Then, on the device(s), check that curl https://api.<my.domain>/ping works (without supplying -k), and check the output of journalctl -u openvpn to make sure the VPN client was able to authenticate.
Note: if multiple CAs are used (e.g. root CA => server CA), ca.pem must be a concatenated bundle of the root CA certificate and the intermediate/server CA certificates.
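To script the same check from another machine, the key point is that certificate verification stays enabled, which is what omitting -k does for curl. A minimal sketch with hypothetical helpers (ping_api, make_verifying_context). The optional cafile argument is an assumption for the case where your CA is not in the system trust store; when it is given, only that CA is trusted. api.<my.domain> has to be replaced with your real domain.

```python
import ssl
import urllib.request
from typing import Optional


def make_verifying_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that checks the certificate chain and hostname (the opposite of curl -k)."""
    return ssl.create_default_context(cafile=cafile)


def ping_api(url: str, cafile: Optional[str] = None, timeout: float = 10.0) -> str:
    """GET the ping endpoint; raises urllib.error.URLError if the certificate does not verify."""
    context = make_verifying_context(cafile)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling ping_api("https://api.<my.domain>/ping") (with your domain filled in) should return the response body. A certificate problem shows up as an exception rather than being silently ignored.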
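Building such a bundle is plain text concatenation of the PEM blocks. A small sketch (concat_pem, write_bundle and the file names below are illustrative, not part of openBalena). It keeps the order you pass in and rejects inputs that do not look like PEM certificates.

```python
from pathlib import Path
from typing import Sequence

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def concat_pem(pems: Sequence[str]) -> str:
    """Join PEM certificates into one bundle, one block after another, newline-terminated."""
    parts = []
    for i, pem in enumerate(pems):
        block = pem.strip()
        if not (block.startswith(PEM_BEGIN) and block.endswith(PEM_END)):
            raise ValueError(f"input {i} is not a PEM certificate block")
        parts.append(block)
    return "\n".join(parts) + "\n"


def write_bundle(cert_paths: Sequence[str], out_path: str) -> None:
    """Read each certificate file and write the concatenated bundle to out_path."""
    Path(out_path).write_text(concat_pem([Path(p).read_text() for p in cert_paths]))
```

For example, write_bundle(["root-ca.crt", "server-ca.crt"], "ca.pem") (hypothetical file names) produces a ca.pem containing both certificates.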
At the moment, curl https://api.<my.domain>/ping doesn’t work, and the output of journalctl -u openvpn is:
NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
TCP/UDP: Preserving recently used remote address: [AF_INET]<myserverip>:443
Socket Buffers: R=[87380->87380] S=[16384->16384]
Attempting to establish TCP connection with [AF_INET]<myserverip>:443 [nonblock]
TCP connection established with [AF_INET]<myserverip>:443
TCP_CLIENT link local: (not bound)
TCP_CLIENT link remote: [AF_INET]<myserverip>.56:443
Connection reset, restarting [0]
SIGUSR1[soft,connection-reset] received, process restarting
Restart pause, 120 second(s)
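In this log the TCP connection to port 443 succeeds and is then immediately reset, after which OpenVPN waits and retries. That suggests basic reachability is fine and the failure happens afterwards. One plausible cause is the TLS/certificate exchange, although the log alone does not prove it. If you are collecting these logs from many devices (e.g. via journalctl -u openvpn --no-pager), a trivial filter like the following can flag the ones stuck in this loop. It is a hypothetical helper that matches only on the message strings shown above.

```python
def stuck_in_reset_loop(journal_text: str) -> bool:
    """True if OpenVPN connected over TCP but then hit 'Connection reset, restarting'."""
    lines = journal_text.splitlines()
    connected = any("TCP connection established" in line for line in lines)
    reset = any("Connection reset, restarting" in line for line in lines)
    return connected and reset
```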
We checked all the certificates as you suggested, and everything was correctly encoded/signed.
We then ran the quickstart script again locally and found that the JWT certificate had also expired, and that the quickstart script doesn’t update the CA certificates, only the expired ones.
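Since the quickstart script only regenerated the expired certificates, it can help to list the expiry date of every certificate before and after re-running it. The sketch below parses the notAfter line printed by `openssl x509 -enddate -noout -in <cert>` and uses Python's ssl.cert_time_to_seconds to turn it into a timestamp. seconds_until_expiry and is_expired are hypothetical helpers, and they do not read the certificate file themselves.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Parse 'notAfter=Jan  5 09:34:43 2018 GMT' and return seconds left (negative if expired)."""
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"expected a line starting with {prefix!r}")
    expires_at = ssl.cert_time_to_seconds(line[len(prefix):])
    if now is None:
        now = time.time()
    return expires_at - now


def is_expired(enddate_line: str, now: Optional[float] = None) -> bool:
    """A certificate is still valid at its notAfter instant, so only strictly past counts."""
    return seconds_until_expiry(enddate_line, now) < 0
```

Running openssl over each certificate in the config directory and feeding every output line to is_expired gives a quick overview of what still needs regenerating.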
We then copied all the new certificates/env vars to our production server and relaunched it, and now everything seems to be working again.
Good to hear you got it working. We don’t currently validate expiry on the JOSE/JWT cert, so you could leave it as is. Updating it is also fine, and probably good practice anyway: once we start checking for expiry, your config will still be valid.