I have 4 devices working fine over an open WiFi connection. When I try to move them to a hardwired network connection (that’s behind our business firewall), the application runs fine but the VPN is not available. I have configured custom NTP servers in config.json because I know that port is blocked. I would assume DNS is being configured via DHCP, but maybe not? I have not tried configuring DNS manually yet. What’s my next step? I have lost remote access and I’m not at the devices’ location, so I cannot get to them.
Pi4
Host OS version [balenaOS 2.54.2+rev1]
production
Supervisor version 11.12.4
Current release e837802
Hi there, do you happen to have any VPN/online devices in the same L2 network segment as the inaccessible device(s)? We may be able to reach those through the working devices. If so, please grant support access to the devices/apps and let us know the IDs.
In any case, most VPN connectivity issues we see are due to ports/domains/IPs being blocked. We document our network requirements here.
It is also possible for the VPN to stop functioning if the machine clock is too far in the past or future, since the connection will then fail the certificate validation step.
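As a rough illustration of the clock check, the sketch below compares a local timestamp with the Date header of any HTTP response (for example, one copied from a response seen on the same network). The function name and the way the header is obtained are assumptions made for this example and are not part of balena's tooling.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(http_date: str, local_now: datetime) -> float:
    """Return local time minus server time, in seconds.

    http_date is the value of an HTTP "Date" header, for example
    "Wed, 21 Oct 2020 07:28:00 GMT". local_now must be timezone-aware.
    """
    server_time = parsedate_to_datetime(http_date)
    return (local_now - server_time).total_seconds()


# Example with fixed values: the local clock is two minutes ahead.
example_now = datetime(2020, 10, 21, 7, 30, 0, tzinfo=timezone.utc)
example_skew = clock_skew_seconds("Wed, 21 Oct 2020 07:28:00 GMT", example_now)
```

A skew of a few seconds is harmless. A skew large enough to put the clock outside a certificate's validity period would explain a failed validation.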
Unfortunately I cannot grant access to any devices on this network due to security concerns. I already reviewed the network requirements and I do believe the NTP, DNS and ports are blocked. Port 433 is open. That’s why I configured my own NTP servers in the config.json (though in hindsight I used DNS names, not IP addresses, so if DNS is not working they won’t be resolved). I also assumed the DNS servers would get picked up from the DHCP server, so my question is: is that NOT the case? If not, that would explain everything, I think.
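The hostname-versus-IP concern can be checked before a device is deployed. The sketch below is a minimal example, assuming the NTP servers are written as one space-separated string (which is how I understand the ntpServers setting to be formatted; please verify against the documentation for your OS version). The function name and the sample addresses are hypothetical.

```python
import ipaddress


def split_ntp_servers(servers: str):
    """Split a space-separated server list into (ip_literals, hostnames).

    Hostnames depend on working DNS; IP literals do not.
    """
    ips, names = [], []
    for entry in servers.split():
        try:
            ipaddress.ip_address(entry)  # raises ValueError for non-IP text
            ips.append(entry)
        except ValueError:
            names.append(entry)
    return ips, names


# Hypothetical values for illustration only.
example_ips, example_names = split_ntp_servers("10.0.0.5 ntp1.example.internal")
```

Any entry that lands in the hostname list will only be usable once DNS resolution works on the device.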
I can go back to the device and adjust the config.json with the following and try again. I did not configure sshKeys in the config.json ahead of time, so I don’t think I can get SSH access to the devices, unless there is another way you can advise me on to access the device from the network to make these changes.
Please check you have port 443/tcp (not 433) open. We run OpenVPN over the HTTPS port for wider firewall compatibility. The protocol doesn’t look like HTTPS on the wire, though, so if a packet inspection firewall is looking at the traffic, it could be dropping it.
In any case, without access to the device/network, we can only speculate what is going on with the VPN, but if the device is sending heartbeats to the API and you can see logs in the dashboard, that means the networking stack is up, DNS is working and HTTPS traffic (to the API) is getting through.
If the devices don’t have an SSH keys section in the config, the SSH daemon will only be started with the keys accessible via the proxy/VPN flow, and if the VPN connection can’t be established for the reasons above, then this is obviously not an option.
Your best course of action is to get hold of the device (media if SD card), mount it and update config.json in the boot partition with your SSH key(s); then ssh into the device and do some troubleshooting from the Host OS, like doing DNS lookups, check date/time/NTP sync, etc.
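The config.json edit can be scripted once the boot partition is mounted. The sketch below is a minimal example, assuming the keys live in a list at os.sshKeys (based on my understanding of the balenaOS documentation; please confirm for your OS version). The function name and the key string are placeholders.

```python
import json


def add_ssh_key(config: dict, public_key: str) -> dict:
    """Return a copy of config with public_key present in os.sshKeys.

    The os.sshKeys layout is an assumption; the input dict is not modified.
    """
    updated = json.loads(json.dumps(config))  # simple deep copy for JSON data
    keys = updated.setdefault("os", {}).setdefault("sshKeys", [])
    if public_key not in keys:  # avoid duplicates on repeated runs
        keys.append(public_key)
    return updated


# Placeholder key for illustration; use the contents of your own .pub file.
example_config = {"ntpServers": "ntp1.example.internal"}
example_updated = add_ssh_key(example_config, "ssh-ed25519 EXAMPLEKEY user@example")
```

In practice you would load the file with json.load, pass the result through the function, and write it back with json.dump, keeping a backup copy of the original config.json first.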
I started again from scratch. Modified my config.json with my NTP servers and SSH Keys. Brought up a new hardwired device and SSH’ed into it. This time the app comes up as before and works as it should, but the device shows “Inactive” and “connecting…” and still no VPN.
I verified it’s getting internal DNS servers from DHCP.
I see the NTP servers that I configured, but it also shows 16 other NTP servers that it of course cannot reach. Curious, why did it add all these other servers when I specifically specified my own?
I can ping all the domains in the network requirements (balena-cloud.com, docker.com, docker.io)
I can ping and reach public DNS servers, but if I put them in my config.json DNS settings they don’t seem to be configured; only the one served by DHCP is.
So you know, I have other Windows and Linux systems that are completely stand-alone (meaning not associated with the network’s domain controller) and that have no problem reaching https servers on the Internet via port 443. So I can confirm port 443 is pretty unrestricted by the firewall.
I am wondering if the issue is possibly that your firewall is doing some DPI or SSL interference. We have seen cases of firewalls which will see traffic on port 443 and try and MITM the traffic to inject their own TLS certificate. Since our VPN runs on 443 and isn’t TLS based it could be that the firewall is blocking/breaking it in some way.
Without shell access to the device, it’s hard to make a definitive diagnosis, but out of interest, what do you see if you go to https://vpn.balena-cloud.com from a machine on the same network?
This is what I get from a machine on the same network, the same for a computer on my home network:
This site can’t be reached
vpn.balena-cloud.com unexpectedly closed the connection.
Try:
Checking the connection
Checking the proxy and the firewall
Running Windows Network Diagnostics
ERR_CONNECTION_CLOSED
Also I did a quick test with port 443. I created a port forward on my home router from incoming 443 traffic to 3389 (rdp) internally. Then from the fire-walled network I was able to RDP to my home computer through port 443. So I’m now sure the specific traffic on 443 is not monitored by the firewall.
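A raw TCP probe from a machine on the same network can complement a test like this one. The sketch below only classifies what happens to the TCP connection attempt. It does not speak OpenVPN, so a "connected" result shows that the port is reachable, not that VPN traffic will pass. The function name and the result labels are made up for this example.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and return a short label for the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout"  # packets silently dropped, or host unreachable
    except ConnectionRefusedError:
        return "refused"  # nothing listening, or an active reject
    except ConnectionResetError:
        return "reset"  # connection torn down during setup
    except socket.gaierror:
        return "dns-failure"  # the hostname could not be resolved
    except OSError as exc:
        return f"error: {exc}"
```

For example, probe_tcp("vpn.balena-cloud.com", 443) could be run from both the firewalled network and a home network to compare the results.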
Hi George, there are no tools we are aware of that can properly verify OpenVPN connectivity without actually using an OpenVPN client to establish a connection to a known working OpenVPN server on the public Internet. We currently have tens of thousands of devices connected to our production VPN server, so we are confident it is working correctly.
Having said that, what is the output of journalctl -u openvpn from this device after you restart the service using systemctl restart openvpn?
Thanks for the troubleshooting info. Unfortunately it will have to wait until I get out to the site again, because I put them all on WiFi to get them up and running again. Are there any other things I can check in addition to this when I get to it?
You could tail the full system logs with journalctl -a -f and see if anything interesting pops up in there. Aside from that, please double check the date and time on the device and make sure it’s correct.
From the logs it appears the device is unable to connect on port 443 to one of our VPN endpoints (connection immediately reset).
For further troubleshooting, it may be best to download an unmanaged development variant from balenaos.io, deploy it on the network, SSH into it and run network tests. This should help to pinpoint the issue.
You could also use an unmanaged production variant, though you would need to inject your sshKeys before flashing in that case, otherwise you won’t be able to SSH in.