Devices offline & VPN Error: Cannot load DH parameters from dh.pem

Hey,
Thanks for this amazing open source project.

I installed OpenBalena on a DigitalOcean droplet on Ubuntu 19.04, following the Getting Started guide.

Before that, I used to have it installed on a Virtual Machine running in my local network and everything was fine.

I am having an issue that looks similar to this one:

Since installing on the droplet, my devices have always shown as “offline”.
However, I was initially still able to push updates to them from the droplet, but not any more.

In case it matters, I have a Raspberry Pi Zero W connected via Wi-Fi, running balenaOS 2.32.0+rev1.
Also, I cannot SSH into them any more, so I can’t take any logs from that side.

Following the mentioned thread, I pulled the logs from the VPN, using ./scripts/compose exec vpn journalctl -fn100, and they revealed a few errors:

vpn-logs.log (11.9 KB)

I’ll quote one of them here to make this thread easier to find, but please take a look at the full log above.

WARNING: POTENTIALLY DANGEROUS OPTION --verify-client-cert none|optional (or --client-cert-not-required) may accept clients which do not present a certificate
WARNING: file 'server.key' is group or others accessible
OpenVPN 2.4.0 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Oct 14 2018
library versions: OpenSSL 1.0.2r  26 Feb 2019, LZO 2.08
NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
OpenSSL: error:0906D06C:PEM routines:PEM_read_bio:no start line
Cannot load DH parameters from dh.pem
Exiting due to fatal error

I’m an IT guy, but I don’t know much about Docker, and I certainly have no idea how to go about “fixing” this issue.
Thanks in advance for your help. I can provide any other required logs or info, just ask.

Tim

How did you move the RPi to the DigitalOcean openBalena instance? Did you re-provision it? The logs from the VPN service look scary and something seems wrong with the certificates. Can you try reprovisioning the instance from scratch? I wonder if something went silently wrong during setup.
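For context on that fatal error: “no start line” means OpenSSL found no PEM header in dh.pem at all, so the file is most likely empty or corrupt rather than merely wrong. A quick local sketch of what a valid file looks like (512 bits only so it generates quickly; real deployments use 2048+):

```shell
# A valid dh.pem must begin with the "-----BEGIN DH PARAMETERS-----" header.
# The "no start line" error means that header is missing entirely.
# Generate a throwaway file to see the expected shape:
openssl dhparam -out /tmp/dh-example.pem 512 2>/dev/null
head -n1 /tmp/dh-example.pem
# Verify the parameters actually parse and check out:
openssl dhparam -in /tmp/dh-example.pem -check -noout 2>/dev/null && echo "valid PEM"
```

Running the same `-check` against the dh.pem inside the vpn container (its exact path may differ; that part is an assumption) would confirm whether the file itself is the problem.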

I’m not sure what you mean by “move the RPi”.

If you mean moved it from my VM instance to the DigitalOcean instance, then I re-flashed the SD card with a new OS configured with the DigitalOcean instance. All good there.

If it has something to do with certificates, I would like to take the opportunity to set up a Let’s Encrypt cert. I saw that it was possible:

However, I do not know how it works exactly and I don’t think it is mentioned in the docs or the getting started.

So I’m going to re-install the project from scratch on my server, but first, can you explain how to enable the Let’s Encrypt cert?

Thank you very much.

EDIT: I’ll try re-installing using the “-c” option

[Note: I reinstalled once]
Just adding “-c” when running the quickstart script also generates a self-signed cert. No difference.
I must be missing something.

[Note: I reinstalled again]
Things keep getting weirder.
I took down the containers and deleted all balena files, then re-installed and re-configured again.
I decided to use a different email as the login this time.

I could not log in to the instance using the CLI; I would get BalenaRequestError: Request error: Unauthorized.

However, using the credentials of the old instance, I am able to log in to the new one.
So that means I am not deleting everything correctly, or that the quickstart script doesn’t work properly.

How do you go about doing a clean install?

Thanks in advance

Hey.

I would back up and remove your ./config directory, as this holds the values from the quickstart script output. The -c flag is the one you want, and you’re correct that it will still create a self-signed cert, BUT at runtime the cert is swapped out, since the ACME protocol has to run at runtime to generate the cert. You should see the cert-provider container doing this if you check the logs :+1:
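On the old credentials surviving a re-install: my guess (an assumption on my part, not verified) is that the database lives in named Docker volumes, which a plain `down` does not delete — removing the repo directory alone would not touch them. A fully clean wipe would then look something like this (untested sketch; the quickstart flags shown are placeholders for your own values):

```shell
# Untested sketch; assumes openBalena's state (users, apps, devices)
# lives in named Docker volumes managed by the compose stack.
./scripts/compose down --volumes   # stop the stack AND drop its named volumes
rm -rf ./config                    # remove the generated configuration
./scripts/quickstart -U <email> -P <password> -d <domain> -c   # fresh setup, ACME enabled
```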

Note: Make sure that your DNS is set up correctly before trying this, as ACME will fail to provide a cert if your DNS is not pointing to the server.

Hey,

my DNS is ok, so I will try again and look at the logs.

In the help, it talks about a “production mode”. Is there a production flag somewhere in this?
-c enable the ACME certificate service in staging or production mode.

What config folder are you talking about? I deleted everything in the root directory cloned from git and cloned anew.

The cert-provider container will try to get you a staging cert first, and if this is successful it will then acquire the production one. Let’s Encrypt rate-limits acquisition of production/trusted certs, so we run a check against their staging server first to make sure a cert can be acquired.
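As an aside, a quick way to tell which cert you actually ended up with is to compare the certificate’s issuer and subject: for a self-signed cert they are identical, while a Let’s Encrypt cert shows their CA as issuer. A minimal local sketch (api.example.com is a placeholder):

```shell
# Generate a throwaway self-signed cert just to demonstrate the check:
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=api.example.com" \
  -keyout /tmp/key.pem -out /tmp/cert.pem 2>/dev/null
issuer=$(openssl x509 -in /tmp/cert.pem -noout -issuer)
subject=$(openssl x509 -in /tmp/cert.pem -noout -subject)
# Self-signed means issuer == subject:
[ "${issuer#issuer=}" = "${subject#subject=}" ] && echo "self-signed"
# Against the live API you would inspect the served cert instead:
#   openssl s_client -connect api.<domain>:443 -servername api.<domain> \
#     </dev/null 2>/dev/null | openssl x509 -noout -issuer
```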

The ./config folder is created in the root directory of the repo after running ./quickstart - so if you have a fresh directory then you should be fine.

I only deleted the config folder this time, and used the “-c” option.

Running ./scripts/compose exec cert-provider journalctl -fn100 returns:

OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"journalctl\": executable file not found in $PATH": unknown

Running ./scripts/compose exec vpn journalctl -fn100 returns:
new-vpn-logs.log (13.2 KB)

I can still see a few warnings and errors in the VPN.

Trying to connect with the CLI still gives me SELF_SIGNED_CERT_IN_CHAIN: request to https://api.{{mydomain}}/login failed, reason: self signed certificate in certificate chain, after waiting for a while.

Using the self-signed cert to connect reveals that I cannot connect with the newly configured email address and password, only the old one.
That’s not normal, but right now it’s not the main issue I suppose.

At least that means my Apps and devices still exist, even though I clearly deleted everything multiple times… Very strange.
Should I have restarted the server before installing again?

Anyways. The device is still shown offline, hasn’t received any logs recently, and creating a new release doesn’t push it to the device.

Nothing has changed with the reinstall, except the VPN error isn’t as scary as before.

@Winkelmann

The logs for the cert-provider container are just docker logs, not journalctl - so ./scripts/compose logs cert-provider should show you what’s happening there.

The devices will show as offline if they cannot bring up the VPN, so you could see what’s going on in the device supervisor by running journalctl -u resin-supervisor -fn100 in the device’s terminal (SSH on port 22222, if you have used a development balenaOS image).

Okay, for the cert ./scripts/compose logs cert-provider returns

Attaching to openbalena_cert-provider_1
cert-provider_1  | [Info] VALIDATION not set. Using default: http-01
cert-provider_1  | [Info] Waiting for api.{{mydomain}} to be available via HTTP...
cert-provider_1  | (1/6) Retrying in 5 seconds...
cert-provider_1  | (2/6) Retrying in 5 seconds...
cert-provider_1  | (3/6) Retrying in 5 seconds...
cert-provider_1  | (4/6) Retrying in 5 seconds...
cert-provider_1  | (5/6) Retrying in 5 seconds...
cert-provider_1  | (6/6) Retrying in 5 seconds...
cert-provider_1  | [Error] Unable to access api.{{mydomain}} on port 80. This is needed for certificate validation. [Stopping]

Should I be the one doing something to make api.{{mydomain}} accessible, or is it automatically done by OpenBalena?

As for the device logs: sadly, since everything was working fine, I switched to a production image a while ago.
Is it possible to flash my SD card with the dev image while retaining the data partitions?

Thanks for your help.

No, unfortunately you can re-flash an SD card with a dev image and retain the data partitions.

Not sure what you meant. “No, unfortunately you cannot re-flash”? Typo?

And what about the certificate? Is there something I’m missing, or should OpenBalena be able to do it on its own, and it’s a bug?

So the message [Error] Unable to access api.openbalena.richbayliss.dev on port 80. This is needed for certificate validation. [Stopping] happens when the API container is not reachable when the cert-provider container is started. I found this to be the case when the images have to be pulled, so I had to issue a ./scripts/compose restart cert-provider to make it try the cert acquisition again.
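For what it’s worth, that availability check boils down to a plain HTTP request on port 80, so you can reproduce it yourself with curl. A sketch against a throwaway local server (against the droplet you would probe http://api.&lt;domain&gt;/ instead — the port and URL below are stand-ins):

```shell
# Spin up a throwaway HTTP server as a stand-in for the API on port 80:
python3 -m http.server 8080 --bind 127.0.0.1 >/dev/null 2>&1 &
server_pid=$!
sleep 1
# The probe itself: fetch only the status code, discard the body.
code=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8080/)
kill "$server_pid"
echo "HTTP $code"   # a connection-refused here is what triggers the 6 retries
```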

You need to make sure api.{openbalena domain} is pingable and resolving to the machine’s IP address, but otherwise the application stack will route the requests as required :+1:
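A quick way to sanity-check the resolution part from any machine (localhost is used below only so the example runs anywhere; the openBalena subdomain list is my assumption from the getting started guide):

```shell
# Tiny resolver check using the system resolver via getent (Linux):
check_resolves() {
  if getent hosts "$1" >/dev/null; then
    echo "$1 resolves"
  else
    echo "$1 MISSING"
  fi
}
# Demonstrated on localhost; for openBalena you would check e.g.
# api.<domain>, registry.<domain>, vpn.<domain> and s3.<domain>.
check_resolves localhost
```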

I have also confirmed that the stack works on Digital Ocean with Ubuntu 19.04 and the latest Docker-CE – so your setup should be fine.

Sorry about the typo. As you guessed, you cannot re-flash an SD card and preserve the data partitions.

Hey,

Thank you very much @richbayliss . Restarting the cert-provider fixed the certificate.
I now have a beautiful Let’s Encrypt certificate.

Obviously, I would recommend mentioning the “-c” option of the quickstart script, and the need to restart the container, in the Getting Started page for OpenBalena :slight_smile:

As for the other (main, I guess) issue, I suppose I will reflash the RPi with a dev image and see from there.
It might be a problem with the device itself, or my software on it that made it crash… Don’t know…
Thanks @thgreasi for the answer.

I’ll keep you updated here.

@Winkelmann please report back here with your experience :+1: Glad it’s working, and I will be putting your feedback into our development loop :tada:

The new device works perfectly. Shows Online, I can SSH into it and it received the latest version of my software.

Only one minor issue left: the old device has stayed in the list as a ghost. I thought it would get replaced, since the new device has the same MAC address, same IP address, and now the same name.

There is a “leave” command in the CLI, but that works by IP.
If you know a way to remove a device from the list, I’m up for it.

Thanks again

If you know a way to remove a device from the list, I’m up for it.

To confirm, do you mean the list of devices that belong to an application in its web dashboard? If so, indeed I think devices are identified purely by UUID (not MAC address, IP and the like), and re-flashing generates a new UUID, so the old UUID will stay in the list until the device is deleted from the balena API’s database.

The balena leave command (CLI) indeed will not delete the device from the list, but I think the balena device rm <uuid> command will do the trick:

$ balena device rm 28a734d6cdcebe0b1051ab476f503df0
? Are you sure you want to delete the device? Yes
balena device rm <uuid>

… was the command I was looking for, thanks :slight_smile:

I tried balena help device, but that didn’t show me the existence of the rm command; that’s why I asked.

With that, I’m out of problems (for now :stuck_out_tongue: )