In the last couple of days, though, whenever I try to ssh into any of the devices I am running, I receive the following error:
$ ssh root@d6a20843efed99bbe56f813ac4b797e2.balena
Via xxx.xxx.xxx.xxx:3128 -> d6a20843efed99bbe56f813ac4b797e2.balena:22222
analyze_HTTP: readline failed: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
All of the devices show online status, and in the logs I can see them making requests to the API regularly.
What is interesting is that the certificate presented by the API endpoint was automatically renewed earlier this week, which makes me suspect that some sort of certificate error is occurring with the VPN.
This is also interesting because Let’s Encrypt recently added a new root CA. I’m wondering if perhaps these devices do not have this root CA installed in their base OS?
In short, has anyone else experienced this, or does anyone have advice on how I might debug VPN/SSH to identify the root cause?
Checking vpn.xxxx.com:3128 with openssl s_client (this is the VPN port, I believe? Should this have SSL?), the result is:
CONNECTED(00000005)
140263854088640:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:…/ssl/record/ssl3_record.c:332:
no peer certificate available
No client certificate CA names sent
SSL handshake has read 5 bytes and written 327 bytes
Verification: OK
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
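The `wrong version number` error above generally means the peer replied with something that is not a TLS record. That would be consistent with port 3128 being a plain HTTP CONNECT proxy rather than a TLS endpoint, and the `analyze_HTTP` message in the ssh output suggests the client itself is speaking HTTP to that port. A minimal sketch of a check under that assumption; `connect_request` is a hypothetical helper name and `nc` is assumed to be installed:

```shell
# Hypothetical helper: print a bare HTTP CONNECT request for host $1, port $2.
connect_request() {
  printf 'CONNECT %s:%s HTTP/1.1\r\nHost: %s:%s\r\n\r\n' "$1" "$2" "$1" "$2"
}

# Example (not run here): send the request in plaintext and show the first reply line.
#   connect_request d6a20843efed99bbe56f813ac4b797e2.balena 22222 \
#     | nc -w 5 vpn.xxxx.com 3128 | head -n 1
```

If the port is a plain HTTP proxy, the first line back should be an HTTP status line, even if it is an error status. An immediate close with no reply would match the `readline failed` message from ssh.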
I provisioned a new device from the same image that was used for the most recently provisioned devices (intel-nuc 2.50.1). This new device is not connecting to either the API or the VPN, and nothing specific to it is coming through in the logs.
My intention now is to try provisioning a new device with the 2.58.6 x86 Generic image, connect it to this openBalena server, and see if it will connect.
Any guidance on how to debug this would be much appreciated.
Sorry for the spam. Just trying to share all the information I have and steps I have tried.
So I have set up a new device with a dev balenaOS version. It isn’t connected to the openBalena server (its status is offline).
When I ssh into it locally and look at the supervisor logs, I see this:
root@00cc9cf:~# balena logs resin_supervisor --tail 10000 -f
[api] GET /v1/healthy 200 - 12.964 ms
[api] GET /v1/healthy 200 - 1.319 ms
[debug] Attempting container log timestamp flush…
[debug] Container log timestamp flush complete
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
Warning: Ignoring extra certs from /etc/ssl/certs/balenaRootCA.pem, load failed: error:02001002:system library:fopen:No such file or directory
[success] Device state apply success
This would suggest it is connecting to the API and updating its state correctly; however, its state remains offline when I query the device with the balena CLI.
Looking in config.json on the image that was used to provision this device, ‘balenaRootCA’ is clearly set.
Checking /mnt/boot/config.json on the device itself, balenaRootCA is clearly set there as well.
I can also tail the logs for openvpn on this device and can see this:
root@00cc9cf:/resin-boot# systemctl status openvpn.service
openvpn.service - OpenVPN
Loaded: loaded (/lib/systemd/system/openvpn.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2021-05-02 10:16:48 UTC; 36min ago
Main PID: 1150 (openvpn)
Tasks: 1 (limit: 2358)
Memory: 2.1M
CGroup: /system.slice/openvpn.service
└─1150 /usr/sbin/openvpn --writepid /run/openvpn/openvpn.pid --cd /etc/openvpn/ --config /etc/openvpn/openvpn.conf --connect-retry 5 120
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 TLS: Initial packet from [AF_INET]3.104.60.33:443, sid=9aa02bbf c23f58b3
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 VERIFY OK: depth=1, CN=vpn-ca.mydomain.com
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 VERIFY ERROR: depth=0, error=certificate has expired: CN=vpn.mydomain.com
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 OpenSSL: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 TLS_ERROR: BIO read tls_read_plaintext error
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 TLS Error: TLS object -> incoming plaintext read error
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 TLS Error: TLS handshake failed
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 Fatal TLS error (check_tls_errors_co), restarting
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 SIGUSR1[soft,tls-error] received, process restarting
May 02 10:52:05 00cc9cf openvpn[1150]: Sun May 2 10:52:05 2021 Restart pause, 120 second(s)
root@00cc9cf:/resin-boot#
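The verification failure in the output above is followed by several knock-on TLS errors, which makes it easy to miss. A minimal sketch of a filter that keeps only the certificate verification failures; `verify_errors` is a hypothetical helper name, and it assumes log lines in the same format as shown above:

```shell
# Hypothetical helper: read OpenVPN log text on stdin and print one summary
# line per certificate verification failure.
verify_errors() {
  grep 'VERIFY ERROR' |
    sed -E 's/.*VERIFY ERROR: depth=([0-9]+), error=([^:]+): (.*)$/depth=\1 error="\2" subject=\3/'
}

# Example on the device (assumes the unit is named openvpn, as in the status output):
#   journalctl -u openvpn --no-pager | verify_errors
```

In OpenVPN's output depth=0 is the server's own certificate and depth=1 is the CA that signed it, so a line with depth=0 points at the server certificate while the CA still verifies.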
This seems to be the smoking gun: the VPN server certificate (CN=vpn.mydomain.com) has expired, while the VPN CA certificate still verifies. I will see if I can work out how to renew it safely at the server end.
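Before renewing anything, it may be worth confirming the expiry date on the server side. A minimal sketch using `openssl x509`; `cert_status` is a hypothetical helper name, and the location of the VPN server certificate on the openBalena host is an assumption that has to be filled in for the actual installation:

```shell
# Hypothetical helper: print subject, expiry date and validity of a PEM certificate file.
cert_status() {
  cert="$1"
  # Show the subject and notAfter date so they can be compared with the
  # CN reported in the VERIFY ERROR line on the device.
  openssl x509 -in "$cert" -noout -subject -enddate || return 2
  # -checkend 0 exits 0 if the certificate is still valid now, non-zero if it has expired.
  if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
    echo "status: valid"
  else
    echo "status: expired"
    return 1
  fi
}

# Example (the path is an assumption, not taken from the openBalena layout):
#   cert_status /path/to/vpn-server-certificate.crt
```

Running the same check against the VPN CA certificate would show how long the CA has left, which matters because devices verify the server certificate against that CA.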