open balena API doesn't seem to recover after cert modification

Hi!

Two days ago our VPN/root certificates on open balena expired. When running the
balena login command we got this error:

CERT_HAS_EXPIRED: request to https://api.<mydomain>/login_ failed, reason: certificate has expired

We followed this thread (VPN Certs seems to be expired - #4 by wolf_karl)
and changed the certificates in both the vpn and root directories.

After changing the certificates we were able to log in again, but now none of the devices change their status (online/offline), so it is impossible to tunnel to them or monitor their online status.

We took a snapshot of our machine before changing the configuration, so we should be able to roll back if needed.

Any idea on what we could try?

We are using open balena version 2.0.0

Thanks Matteo

Hi Matteo, balenaOS is configured to poll the backend for configuration changes every 24 hours, via the os-config.timer and os-config.service units:

# systemctl cat os-config.timer
# /lib/systemd/system/os-config.timer
[Unit]
Description=Periodic check for configuration changes

[Timer]
OnCalendar=daily

[Install]
WantedBy=multi-user.target

# systemctl cat os-config.service
# /lib/systemd/system/os-config.service
[Unit]
Description=OS configuration update service
Requires=resin-boot.service
Wants=os-config-devicekey.service NetworkManager.service
After=os-config-devicekey.service resin-boot.service NetworkManager.service

[Service]
Type=simple
Restart=on-failure
RestartSec=10
ExecStart=/usr/bin/os-config update

[Install]
WantedBy=multi-user.target

If you were able to get into the devices locally, via SSH (via preloaded key), you could expedite this process by running /usr/bin/os-config update manually. If this isn’t possible, your devices should reconnect to the VPN over the next 24 hours after they update their local OpenVPN configuration (it’s basically /etc/openvpn/ca.crt that is different).
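
To confirm that a device really picked up the backend's CA after running os-config update, one option is to compare certificate fingerprints. This is only a minimal sketch: it assumes openssl is available where you run it and that you have a copy of the backend's CA certificate to compare against the device's /etc/openvpn/ca.crt. The helper names cert_fingerprint and same_cert and the /tmp/backend-ca.crt path are made up for illustration.

```shell
# Print the SHA-256 fingerprint of a PEM certificate.
cert_fingerprint() {
    openssl x509 -in "$1" -noout -fingerprint -sha256
}

# Succeed (exit 0) only if both files hold the same certificate.
# Exit 1 if they differ, 2 if either file cannot be read as a certificate.
same_cert() {
    fp_a=$(cert_fingerprint "$1") || return 2
    fp_b=$(cert_fingerprint "$2") || return 2
    [ "$fp_a" = "$fp_b" ]
}

# Example (hypothetical path for the copy of the backend CA):
# same_cert /etc/openvpn/ca.crt /tmp/backend-ca.crt && echo "device CA matches backend CA"
```

If the fingerprints differ after os-config update reports "No configuration changes", the API is probably still serving the old CA.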

Hope this helps.

Hi!
Thanks for the answer, I’ve run the command:
/usr/bin/os-config update
and got this output:

Fetching service configuration from https://api.<my.domain>/os/v1/config...
Service configuration retrieved
No configuration changes

But the device still has the ‘is_online’ flag set to false…

It’s been more than 24 hours now, and none of the devices seem able to change this flag, so they remain unreachable.

Thanks

Matteo

How did you generate the new certs and how did you apply them to the instance? Are you sure you correctly updated DEVICE_CONFIG_OPENVPN_CA in the environment section of the api service of your compose YAML?

I followed the steps in this thread (VPN Certs seems to be expired - #10 by wolf_karl) and manually created certificates for the vpn and root.

There’s nothing in that thread about updating DEVICE_CONFIG_OPENVPN_CA

Just to be sure, should I take the ca.crt in the /open-balena/config/certs/vpn directory and update the OPENBALENA_VPN_CA_CHAIN variable in the activate file?

Thanks again

I’ve checked and we didn’t modify any of the CA certificates because those have not expired…

From the /os/v1/config endpoint we get the correct CA, but still none of the devices is updating its online/offline status…

Any other suggestions? Would upgrading open balena gradually, one version at a time, help?

Thanks

Matteo

Another odd thing is that I can reach each device, even when it is flagged as ‘offline’, and request its logs without any error…

It seems that only the IS ONLINE flag is not updated correctly.

Hello, the complete set of environment variables that need to be updated after certificates are generated (excl. the API JOSE/JWT auth.):

ROOT_CA base64 encoded ca.pem (or bundle)
DEVICE_CONFIG_OPENVPN_CA base64 encoded ca.pem (or bundle)
VPN_OPENVPN_SERVER_KEY base64 encoded vpn-server.key
VPN_OPENVPN_SERVER_CRT base64 encoded vpn-server.pem (signed by ca.pem)
VPN_OPENVPN_CA_CRT base64 encoded ca.pem (or bundle)

These environment variables are passed to quite a number of containers, so make sure to search and replace them consistently across the entire composition before redeploying it.

Once the devices and backend all have the correct base64 encoded environment variables and have been restarted to pick them up, ensure that the VPN and API services in the backend start correctly. Then, on the device(s), ensure that curl https://api.<my.domain>/ping works (without supplying -k), and check the output of journalctl -u openvpn to ensure the VPN client was able to authenticate.
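
When the curl check fails, the exit code gives a first hint as to whether the problem is certificate trust or plain connectivity. The helper below is a rough sketch, not part of balenaOS or open balena. The name explain_curl_exit is hypothetical and it maps only a handful of well-known curl exit codes.

```shell
# Translate a curl exit code into a hint about what to look at next.
explain_curl_exit() {
    case "$1" in
        0)  echo "ok: TLS verification and the request both succeeded" ;;
        6)  echo "dns: could not resolve the host name" ;;
        7)  echo "connect: could not connect to the server" ;;
        22) echo "http: server answered with an HTTP error status" ;;
        60) echo "tls: server certificate could not be verified against the trusted CAs" ;;
        *)  echo "other: curl exit code $1" ;;
    esac
}

# Example, run on the device (replace the placeholder domain first):
# curl --fail --silent --show-error --output /dev/null "https://api.<my.domain>/ping"
# explain_curl_exit $?
```

A "tls" result without -k would point at the device's trusted CA (ROOT_CA) or at the certificate served by the API, while "dns" or "connect" would point at networking instead.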

Note, if multiple CAs are used (e.g. root CA => server CA), ca.pem must be a concatenated bundle of both the root CA certificate and the intermediate/server CAs.
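
Building such a bundle is a plain concatenation of the PEM files. The sketch below assumes the individual CA certificates are already available as separate PEM files. The function name make_ca_bundle and the file names in the example are hypothetical.

```shell
# Concatenate PEM certificates into one bundle file and print how many
# certificates the bundle contains.
# Usage: make_ca_bundle OUTPUT CERT [CERT...]
make_ca_bundle() {
    bundle=$1
    shift
    : > "$bundle"
    for pem in "$@"; do
        cat "$pem" >> "$bundle"
        # Add a newline if the file did not end with one, so that the next
        # BEGIN line starts on its own line.
        if [ -n "$(tail -c 1 "$pem")" ]; then
            echo >> "$bundle"
        fi
    done
    grep -c -- '-----BEGIN CERTIFICATE-----' "$bundle"
}

# Example (hypothetical file names):
# make_ca_bundle ca.pem root-ca.pem server-ca.pem
```

The printed count should equal the number of CA certificates in the chain. The resulting ca.pem is what would then be base64 encoded for the CA variables.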

Hello,

We didn’t change the CA, so I think that these environment variables

ROOT_CA
DEVICE_CONFIG_OPENVPN_CA
VPN_OPENVPN_CA_CRT

should be the same

I’ve encoded VPN_OPENVPN_SERVER_KEY and VPN_OPENVPN_SERVER_CRT with these commands:

echo "$(cat ./vpn/private/vpn.<server-domain>.key)" | base64 --wrap=0 2>/dev/null
echo "$(cat ./vpn/issued/vpn.<server-domain>.crt)" | base64 --wrap=0 2>/dev/null

is it the correct way to do it?

At the moment curl https://api.<my.domain>/ping doesn’t work and the output of journalctl -u openvpn is:

 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
TCP/UDP: Preserving recently used remote address: [AF_INET]<myserverip>:443
Socket Buffers: R=[87380->87380] S=[16384->16384]
Attempting to establish TCP connection with [AF_INET]<myserverip>:443 [nonblock]
TCP connection established with [AF_INET]<myserverip>:443
TCP_CLIENT link local: (not bound)
TCP_CLIENT link remote: [AF_INET]<myserverip>.56:443
Connection reset, restarting [0]
SIGUSR1[soft,connection-reset] received, process restarting
Restart pause, 120 second(s)

In this thread (VPN Certs seems to be expired - #10 by wolf_karl) they suggest re-running the quickstart script somewhere else and then copying over all the new certs. Do you think that would work?

Do you think that slowly upgrading the open-balena version to a more recent one would help?

Thanks again for the help, if you’re coming to Milan a beer is on me! :grin:

We use openssl to encode into a single base64 string:

cat < {{file}} | openssl base64 -A
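
To rule out encoding mistakes, you can decode a value again and compare it byte for byte with the file it came from. This is a small sketch built on the same openssl base64 command. The function names encode_pem and decodes_to_file are made up for illustration.

```shell
# Encode a file as a single-line base64 string (same as the command above).
encode_pem() {
    openssl base64 -A < "$1"
}

# Succeed only if the base64 string in $1 decodes to exactly the bytes of file $2.
decodes_to_file() {
    printf '%s' "$1" | openssl base64 -d -A | cmp -s - "$2"
}

# Example, using the variable and path names from this thread:
# decodes_to_file "$VPN_OPENVPN_SERVER_CRT" "./vpn/issued/vpn.<server-domain>.crt" \
#     && echo "environment value matches the certificate file"
```

For a PEM file that ends in exactly one newline, the echo "$(cat ...)" | base64 --wrap=0 form should produce the same string, because the command substitution strips the trailing newline and echo adds it back.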

Can you drop all of the PKI assets somewhere on your system and run the verification, to make sure everything matches:

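    # compare public keys (the two outputs should be identical)
    openssl x509 -in ca.pem -noout -pubkey
    openssl pkey -in ca-key.pem -pubout

    # view certificate (check the validity dates and subject)
    openssl x509 -in ca.pem -noout -text

    # compare public keys (the two outputs should be identical)
    openssl x509 -in vpn-cert.pem -noout -pubkey
    openssl pkey -in vpn-key.pem -pubout

    # view certificate (check the validity dates and subject)
    openssl x509 -in vpn-cert.pem -noout -text

    # (this is the most important) verify that ca.pem signed vpn-cert.pem
    openssl verify -verbose -CAfile ca.pem vpn-cert.pem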
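
The public key comparison can also be scripted rather than compared by eye. The following is a minimal sketch that wraps the same openssl commands. The function names key_matches_cert and verify_pki are hypothetical, and the file names in the example are the ones used above.

```shell
# Succeed only if the private key in $2 belongs to the certificate in $1.
# Exit 1 on mismatch, 2 if either file cannot be parsed.
key_matches_cert() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Check that the key matches the certificate and that the CA signed it.
# Usage: verify_pki CA_FILE CERT_FILE KEY_FILE
verify_pki() {
    if ! key_matches_cert "$2" "$3"; then
        echo "key does not match certificate: $2"
        return 1
    fi
    openssl verify -CAfile "$1" "$2"
}

# Example:
# verify_pki ca.pem vpn-cert.pem vpn-key.pem
```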

If you can’t access the API, that most likely means the ROOT_CA isn’t present on the system(s).

Hi Anton,

We checked all the certificates as you suggested and everything was correctly encoded/signed.

Then we ran the quickstart script again locally and found out that the JWT certificates had also expired, and that the quickstart script doesn’t regenerate the CA certificates, only the expired ones.

We then copied all the new certificates/env vars to our production server, relaunched it, and now everything seems to have started working again.
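
For anyone hitting the same problem, a quick way to spot every expired certificate at once (instead of only the ones named in the first error) is to scan the whole certificate directory. This is a sketch under assumptions: it looks only at PEM files with a .crt extension, and the function name list_cert_expiry and the example path are illustrative.

```shell
# Print one line per *.crt file under a directory:
# status, expiry date (if readable), and path, separated by tabs.
list_cert_expiry() {
    find "$1" -type f -name '*.crt' | while IFS= read -r cert; do
        # -checkend 0 exits non-zero if the certificate has already expired
        # (or if the file cannot be parsed as a certificate).
        if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
            status="valid"
        else
            status="expired-or-unreadable"
        fi
        end=$(openssl x509 -in "$cert" -noout -enddate 2>/dev/null)
        printf '%s\t%s\t%s\n' "$status" "${end#notAfter=}" "$cert"
    done
}

# Example (directory taken from earlier in this thread):
# list_cert_expiry /open-balena/config/certs
```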

Thanks again for all the help

Matteo