Heartbeat Only "The terminal is unavailable because your device is not connected to the VPN"

I have 4 devices working fine on an open WiFi connection. When I move them to a hardwired network connection (behind our business firewall), the application runs fine but the VPN is not available. I have configured custom NTP servers in config.json, since I know the NTP port is blocked, but I assumed DNS would be configured via DHCP. Maybe not? I have not tried configuring DNS manually yet. What’s my next step? I have lost remote access and I’m not at the devices’ location, so I cannot get to them.

Pi4
Host OS version [balenaOS 2.54.2+rev1]
production
Supervisor version 11.12.4
Current release e837802

Hi there, do you happen to have any VPN/online devices in the same L2 network segment as the inaccessible device(s)? We may be able to reach those through the working devices. If so, please grant support access to the devices/apps and let us know the IDs.

In any case, most VPN connectivity issues we see are due to ports/domains/IPs being blocked. We document our network requirements here.

It is also possible for the VPN to stop functioning if the device’s clock is too far in the past or future, since certificate validation will fail in that case.
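To illustrate why the clock matters: a TLS certificate is only accepted when the local time falls inside its notBefore/notAfter window, and OpenVPN uses TLS for its control channel. The minimal Python sketch below shows only that comparison, using a made-up validity window. Real validation is done by the TLS library, not by code like this.

```python
from datetime import datetime, timezone


def cert_valid_at(now: datetime, not_before: datetime, not_after: datetime) -> bool:
    """Return True if `now` lies inside the certificate validity window (inclusive)."""
    return not_before <= now <= not_after


# Hypothetical validity window, for illustration only.
NOT_BEFORE = datetime(2020, 1, 1, tzinfo=timezone.utc)
NOT_AFTER = datetime(2022, 1, 1, tzinfo=timezone.utc)

correct_clock = datetime(2020, 10, 22, 18, 50, tzinfo=timezone.utc)
unsynced_clock = datetime(1970, 1, 1, tzinfo=timezone.utc)  # e.g. a clock that was never set

print(cert_valid_at(correct_clock, NOT_BEFORE, NOT_AFTER))   # True
print(cert_valid_at(unsynced_clock, NOT_BEFORE, NOT_AFTER))  # False
```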

Unfortunately I cannot grant access to any devices on this network due to security concerns. I have already reviewed the network requirements, and I believe the NTP and DNS ports are blocked. Port 433 is open. That’s why I configured my own NTP servers in config.json (although, in hindsight, I used DNS names rather than IP addresses, so if DNS is not working they won’t resolve). I also assumed the DNS servers would be picked up from the DHCP server, so my question is: is that NOT the case? If not, I think that would explain everything.

I can go back to the device and adjust the config.json with the following and try again. I did not configure sshKeys in the config.json ahead of time, so I don’t think I can get SSH access to the devices, unless there is another way you can advise me on to access the device from the network to make these changes.

This is what I will do in config.json:

"ntpServers": "xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx",
"dnsServers": "xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx",
"os": {
    "sshKeys": [
        "ssh-rsa xxxxxx xxxx@xxxxxxxx"
    ]
}
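As noted above, hostnames in ntpServers only help once DNS is working. A small Python sketch that flags entries in the space-separated ntpServers/dnsServers strings that are not IP literals, so they can be caught before flashing. The values are placeholders from the documentation address ranges, not real servers, and the function name is hypothetical.

```python
import ipaddress


def non_ip_entries(server_list: str) -> list[str]:
    """Return entries of a space-separated server list that are not IP literals.

    Such entries are hostnames, which can only be used once DNS resolution works.
    """
    names = []
    for entry in server_list.split():
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            names.append(entry)
    return names


# Placeholder settings mirroring the config.json fragment above.
config = {
    "ntpServers": "192.0.2.10 time.example.com",
    "dnsServers": "192.0.2.53 198.51.100.53",
}

for key in ("ntpServers", "dnsServers"):
    print(key, non_ip_entries(config[key]))
# ntpServers ['time.example.com']
# dnsServers []
```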

Thanks for the help! BTW, I just checked with the network guys and they say DNS ports are not blocked, so now I’m really confused.

Please check that you have port 443/tcp (not 433) open. We run OpenVPN over the HTTPS port for wider firewall compatibility. However, the OpenVPN protocol doesn’t look like HTTPS on the wire, so if a packet-inspecting firewall is examining the traffic, it could be dropping it.

In any case, without access to the device/network, we can only speculate what is going on with the VPN, but if the device is sending heartbeats to the API and you can see logs in the dashboard, that means the networking stack is up, DNS is working and HTTPS traffic (to the API) is getting through.

If the devices don’t have an SSH keys section in config.json, the SSH daemon only accepts keys supplied via the proxy/VPN flow, and if the VPN connection can’t be established for the reasons above, that is obviously not an option.

Your best course of action is to get hold of the device (or its SD card, if it boots from one), mount it and update config.json in the boot partition with your SSH key(s). Then SSH into the device and troubleshoot from the host OS: do DNS lookups, check the date/time and NTP sync, etc.
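Editing config.json by hand is easy to get wrong, since a single missing comma makes the file invalid JSON. Below is a minimal Python sketch that adds a key under os.sshKeys, the same structure as the fragment shown earlier, and leaves every other setting untouched. The function name is hypothetical, and the mount path of the boot partition depends on the machine you mount it on.

```python
import json
from pathlib import Path


def add_ssh_key(config_path: Path, public_key: str) -> None:
    """Add `public_key` to os.sshKeys in config.json, keeping all other settings."""
    config = json.loads(config_path.read_text())
    # Create the "os" object and the "sshKeys" list only if they are missing.
    keys = config.setdefault("os", {}).setdefault("sshKeys", [])
    if public_key not in keys:  # avoid duplicate entries on repeated runs
        keys.append(public_key)
    config_path.write_text(json.dumps(config, indent=2))
```

With the boot partition mounted, you would call it as `add_ssh_key(Path("/path/to/boot/config.json"), "ssh-rsa AAAA... user@host")`, substituting your real mount point and public key.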

Sorry I meant port 443. Thanks for the help, I will visit the device and figure it out!

I started again from scratch. Modified my config.json with my NTP servers and SSH Keys. Brought up a new hardwired device and SSH’ed into it. This time the app comes up as before and works as it should, but the device shows “Inactive” and “connecting…” and still no VPN.

I verified it’s getting internal DNS servers from DHCP.
I see the NTP servers I configured, but it also shows 16 other NTP servers that it of course cannot reach. Curious: why does it add all these other servers when I specified my own?
I can ping all the domains in the network requirements (balena-cloud.com, docker.com, docker.io).
I can ping and reach public DNS servers, but if I put them in the dnsServers setting in config.json they don’t seem to be applied; only the DHCP-provided server is used.

I have attached a lengthy document with the results of some queries I ran over SSH, based on the tips in this article: https://www.balena.io/blog/top-6-tips-for-troubleshooting-your-host-os-with-resin-io/. I have hidden my domain name in the document for privacy.

What can I try next? Thanks!

test_results.pdf (124.1 KB)

So you know, I have other Windows and Linux systems that are completely stand-alone (meaning not associated with the network’s domain controller) and that have no problem reaching https servers on the Internet via port 443. So I can confirm port 443 is pretty unrestricted by the firewall.

Regarding the network being behind your business firewall: I wonder whether the firewall is doing deep packet inspection (DPI) or SSL interception. We have seen firewalls that detect traffic on port 443 and try to MITM it by injecting their own TLS certificate. Since our VPN runs on 443 but isn’t TLS-based, the firewall could be blocking or breaking it in some way.

Without shell access to the device it’s hard to make a definitive diagnosis, but out of interest, what do you see if you go to https://vpn.balena-cloud.com from a machine on the same network?
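To repeat a similar check outside a browser, and see which certificate issuer a machine on the firewalled network is actually presented with, here is a sketch using Python’s standard ssl module. Point it at an ordinary HTTPS endpoint. As noted above, the VPN endpoint is not TLS-based, so a TLS handshake against it would fail regardless. Either of the following would suggest TLS interception rather than the public certificate:

- an issuer belonging to the firewall vendor or an in-house CA
- an ssl.SSLCertVerificationError on a machine that does not trust that CA

The helper names are hypothetical.

```python
import socket
import ssl


def flatten_name(name) -> dict:
    """Turn the nested RDN tuples returned by SSLSocket.getpeercert() into a flat dict."""
    return {key: value for rdn in name for key, value in rdn}


def peer_issuer(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Complete a verified TLS handshake and return the server certificate's issuer."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return flatten_name(cert["issuer"])


# Example (requires network access); compare the result on and off the firewalled network:
# print(peer_issuer("api.balena-cloud.com"))
```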

This is what I get from a machine on the same network; I get the same result from a computer on my home network:

This site can’t be reached
vpn.balena-cloud.com unexpectedly closed the connection.
Try:

Checking the connection
Checking the proxy and the firewall
Running Windows Network Diagnostics

ERR_CONNECTION_CLOSED

Also I did a quick test with port 443. I created a port forward on my home router from incoming 443 traffic to 3389 (rdp) internally. Then from the fire-walled network I was able to RDP to my home computer through port 443. So I’m now sure the specific traffic on 443 is not monitored by the firewall.

Hi George, there are no tools we are aware of that can properly verify OpenVPN connectivity without actually establishing a connection to a known working OpenVPN server on a public Internet with an OpenVPN client. We currently have tens of thousands of devices connected to our production VPN server, so we are confident it is working correctly.

Having said that, what is the output of journalctl -u openvpn from this device after you restart the service using systemctl restart openvpn?

Thanks for the troubleshooting info. Unfortunately it will have to wait until I get out to the site again, because I put all the devices back on WiFi to get them up and running. Are there any other things I can check while I’m there?

Thanks - George

You could tail the full system logs with journalctl -a -f and see if anything interesting pops up in there. Aside from that, please double check the date and time on the device and make sure it’s correct.
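Because outbound NTP may be blocked on this network while HTTPS gets through, one rough way to check a clock is to compare it with the Date header of any HTTPS response. The sketch below covers just the comparison, with a made-up header value. It has one-second resolution and ignores network latency. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def clock_skew(date_header: str, local_now: datetime) -> timedelta:
    """Local clock minus server clock, based on an HTTP Date header."""
    server_now = parsedate_to_datetime(date_header)  # timezone-aware for "GMT" headers
    return local_now - server_now


header = "Thu, 22 Oct 2020 18:50:02 GMT"  # sample value, not a real response
local = datetime(2020, 10, 22, 18, 52, 2, tzinfo=timezone.utc)
print(clock_skew(header, local).total_seconds())  # 120.0
```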

Thanks. Yes, I forgot to mention in my other troubleshooting message, the time is correct.

I finally got back, here are the logs:

root@62dd0c0:~# journalctl -a -f
-- Logs begin at Thu 2020-10-22 18:23:29 UTC. --
Oct 22 18:50:02 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:02 2020 Attempting to establish TCP connection with [AF_INET]35.169.89.252:443 [nonblock]
Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 TCP connection established with [AF_INET]35.169.89.252:443
Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 TCP_CLIENT link local: (not bound)
Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 TCP_CLIENT link remote: [AF_INET]35.169.89.252:443
Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 Connection reset, restarting [-1]
Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 SIGUSR1[soft,connection-reset] received, process restarting
Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 Restart pause, 40 second(s)
Oct 22 18:50:19 62dd0c0 balenad[1287]: time="2020-10-22T18:50:19.506498880Z" level=info msg="shim balena-engine-containerd-shim started" address=/containerd-shim/b248d7eb85789e593cf43511b40f5730aa19417197e17961c05595a10b138621.sock debug=false pid=6545
Oct 22 18:50:19 62dd0c0 balenad[1287]: time="2020-10-22T18:50:19.773107714Z" level=info msg="shim reaped" id=0dee77a9c9fcf1aece8fffed889f608524699c07be76bcea3b81345631c98246
Oct 22 18:50:19 62dd0c0 balenad[1287]: time="2020-10-22T18:50:19.782761503Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

Oct 22 18:50:43 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:43 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Oct 22 18:50:44 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:44 2020 TCP/UDP: Preserving recently used remote address: [AF_INET]3.227.28.93:443
Oct 22 18:50:44 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:44 2020 Socket Buffers: R=[131072->131072] S=[16384->16384]
Oct 22 18:50:44 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:44 2020 Attempting to establish TCP connection with [AF_INET]3.227.28.93:443 [nonblock]
Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 TCP connection established with [AF_INET]3.227.28.93:443
Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 TCP_CLIENT link local: (not bound)
Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 TCP_CLIENT link remote: [AF_INET]3.227.28.93:443
Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 Connection reset, restarting [-1]
Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 SIGUSR1[soft,connection-reset] received, process restarting
Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 Restart pause, 80 second(s)

openvpn_journal.pdf (165.9 KB)

Hi George, what public IP(s) are these devices connecting to us from?

Try 208.79.244.0/22

We’ve had no contact with 208.79.244.0 - 208.79.247.255 today.
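For reference, the /22 block quoted above covers 1,024 addresses, 208.79.244.0 through 208.79.247.255, which is the range checked here. The sketch below uses Python’s ipaddress module to expand the block and test whether a given public IP falls inside it. The sample IPs are arbitrary.

```python
import ipaddress

block = ipaddress.ip_network("208.79.244.0/22")
print(block.network_address, block.broadcast_address, block.num_addresses)
# 208.79.244.0 208.79.247.255 1024


def in_block(ip: str, network: ipaddress.IPv4Network = block) -> bool:
    """True if `ip` lies inside `network`."""
    return ipaddress.ip_address(ip) in network


print(in_block("208.79.246.17"))  # True
print(in_block("208.79.248.1"))   # False
```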

Okay, thanks. When I can get hold of our network guru, I’ll try to trace it from this end. Do the logs tell you anything at all?

From the logs, it appears the device can open a TCP connection to our VPN endpoints on port 443, but the connection is reset immediately afterwards, before any VPN session is set up.
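The pattern in the journal excerpt above can be summarised mechanically, which helps when comparing long journals from several devices. Each attempt shows the TCP connection established, then an immediate "Connection reset", then a growing restart pause. The sketch below assumes the exact OpenVPN message wording shown in that excerpt; other OpenVPN versions may word these lines differently. The function name is hypothetical.

```python
import re

ESTABLISHED = re.compile(r"TCP connection established with \[AF_INET\]([\d.]+):(\d+)")
RESET = re.compile(r"Connection reset, restarting")
PAUSE = re.compile(r"Restart pause, (\d+) second")


def summarize(lines):
    """One dict per attempt: remote endpoint, whether it was reset, and the pause that followed."""
    attempts = []
    for line in lines:
        if m := ESTABLISHED.search(line):
            attempts.append({"remote": f"{m.group(1)}:{m.group(2)}", "reset": False, "pause": None})
        elif attempts and RESET.search(line):
            attempts[-1]["reset"] = True
        elif attempts and (m := PAUSE.search(line)):
            attempts[-1]["pause"] = int(m.group(1))
    return attempts


# Lines copied from the journal excerpt above.
journal = [
    "Oct 22 18:50:02 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:02 2020 Attempting to establish TCP connection with [AF_INET]35.169.89.252:443 [nonblock]",
    "Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 TCP connection established with [AF_INET]35.169.89.252:443",
    "Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 Connection reset, restarting [-1]",
    "Oct 22 18:50:03 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:03 2020 Restart pause, 40 second(s)",
    "Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 TCP connection established with [AF_INET]3.227.28.93:443",
    "Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 Connection reset, restarting [-1]",
    "Oct 22 18:50:45 62dd0c0 openvpn[6186]: Thu Oct 22 18:50:45 2020 Restart pause, 80 second(s)",
]

for attempt in summarize(journal):
    print(attempt)
# {'remote': '35.169.89.252:443', 'reset': True, 'pause': 40}
# {'remote': '3.227.28.93:443', 'reset': True, 'pause': 80}
```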

For further troubleshooting, it may be best to download an unmanaged development variant from balenaos.io, deploy it on the network, SSH into it and run network tests. This should help pinpoint the issue.
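Once you have a shell on that network, a quick probe can separate several outcomes that look the same from the dashboard:

- cannot connect
- connected, then reset
- connected, then closed by the peer
- connected, no reply

You can run it wherever Python is available, for example in a container or on a laptop on the same segment. The sketch below uses only the standard library, and the function name is hypothetical. Sending no payload may not trigger a firewall that only reacts to the first data packet, so treat the result as a rough signal.

```python
import socket


def probe(host: str, port: int, payload: bytes = b"", timeout: float = 5.0) -> str:
    """Connect, optionally send `payload`, wait for a reply and describe what happened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"connect failed: {type(exc).__name__}"
    with sock:
        try:
            if payload:
                sock.sendall(payload)
            data = sock.recv(1024)  # uses the same timeout as the connect
        except ConnectionResetError:
            return "connected, then reset by peer"
        except socket.timeout:
            return "connected, no reply before timeout"
        except OSError as exc:
            return f"connected, then error: {type(exc).__name__}"
    return "connected, closed by peer" if not data else f"connected, received {len(data)} bytes"
```

For example, run `probe("vpn.balena-cloud.com", 443)` on the firewalled network and again on an open network. Only the difference between the two results is meaningful.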

You could also use an unmanaged production variant, though you would need to inject your sshKeys before flashing in that case, otherwise you won’t be able to SSH in.