I have a similar issue. The is_online flag is not updating. New devices can join the fleet but the state is always offline (or online if the device was online prior to the upgrade mentioned below).
The device logs feed is working fine, and I can open tunnels, but can’t communicate with the devices.
Backstory:
I upgraded from 4.1.251 to 4.1.349 and created a new SSL certificate using certbot. When I run make up, docker ps often shows the vpn service as unhealthy.
The only time I can get the vpn service to become healthy is when the api fails to run migration 0074 and I run the following:
And that’s when I encounter the issue of device state not being updated, e.g. is_online never updates. When I upgraded to 4.1.251, I only had to remove the 0074 migration file and restart the services to get up and running. I wonder if this is even related to my issue?
Maybe I need to dig into what’s happening in the vpn container?
@benjboy did you manage to figure out what caused the issue?
This scenario, where devices appear offline but logs are current and accessible, coupled with VPN service errors, strongly indicates a networking or VPN-specific connectivity issue, rather than a complete device failure. Here’s a breakdown of potential causes and troubleshooting steps:
Understanding the Symptoms:
Devices “Offline”:
This suggests a lack of real-time communication. Devices might not be responding to ping requests or other network probes.
This could mean a break in the live network connection.
Current, Accessible Logs:
This indicates that devices are still able to record and transmit data, even if live communication is impaired.
This suggests the underlying systems are functioning, but a specific communication pathway is blocked.
VPN Service Errors:
This is a crucial clue. VPNs are complex, and errors can disrupt network traffic in various ways.
Possible Causes:
VPN Tunnel Issues:
Tunnel Failure: The VPN tunnel itself might be unstable or failing to establish correctly. This can be due to configuration errors, network congestion, or firewall interference.
MTU Issues: Maximum Transmission Unit (MTU) mismatches can cause packet fragmentation, leading to connectivity problems. VPN encryption adds to packet size, exacerbating this issue.
Port Blocking: Firewalls or ISPs might be blocking the ports required for the VPN to function. Which ports matter depends on the VPN software in use (e.g., UDP 500 and 4500 for IPsec/IKE).
Certificate Problems: Expired or invalid VPN certificates can prevent successful authentication.
Routing Problems:
Incorrect routing tables can prevent devices from reaching each other, even if they are technically online.
Double NAT (Network Address Translation) can also cause routing conflicts.
Firewall Interference:
Firewall rules, either on the devices themselves or on network firewalls, might be blocking VPN traffic.
DNS Issues:
Although logs are still working, DNS problems can prevent devices from resolving hostnames, making them appear offline.
Network Congestion:
Heavy network traffic can overwhelm the VPN or network infrastructure, leading to dropped connections.
Troubleshooting Steps:
VPN Log Analysis:
Carefully examine the VPN error logs. Look for specific error codes or messages that indicate the nature of the problem.
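As a rough starting point for that log review, here is a minimal Python sketch that pulls out lines that look like errors. The keyword list and the sample text are my own assumptions for illustration, not real output from any particular VPN, so adjust the pattern to whatever your service actually logs.

```python
import re

# Hypothetical keyword list; tune it to the messages your VPN service emits.
ERROR_PATTERN = re.compile(r"error|fail|denied|timeout|expired", re.IGNORECASE)


def find_suspicious_lines(log_text):
    """Return (line_number, line) pairs for lines matching ERROR_PATTERN."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Invented sample text, only to show the shape of the result.
sample = "connection accepted\nERROR: handshake timeout\nheartbeat ok\nauth failed for client\n"
for number, line in find_suspicious_lines(sample):
    print(number, line)
```

You could feed it the saved output of the vpn container's logs instead of the sample string.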
Network Connectivity Tests:
Test connectivity from different points in the network, including within and outside the VPN.
Use tools like ping, traceroute, and telnet to identify where the connection is breaking down.
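If telnet is not installed, the same kind of TCP reachability check can be sketched in a few lines of Python. The function name is my own, and the host and port you pass in are whatever your setup actually uses.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

For example, tcp_port_open("vpn.example.com", 443) would tell you whether that (hypothetical) endpoint accepts TCP connections from where you run it. It says nothing about whether the VPN handshake itself succeeds.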
VPN Configuration Review:
Verify the VPN configuration settings, including tunnel type, encryption protocols, and authentication methods.
Ensure that all devices are using the same configuration.
Firewall Checks:
Review firewall rules on all devices and network firewalls to ensure that VPN traffic is allowed.
Temporarily disable firewalls to isolate potential conflicts.
MTU Adjustment:
If MTU issues are suspected, try reducing the MTU size on the VPN interface.
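To pick a value to try, the arithmetic can be sketched as below. The 20-byte IPv4 header (without options) and 8-byte ICMP header are standard; the tunnel overhead figure is purely an illustrative assumption, since the real number depends on your VPN protocol and cipher.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tunnel_mtu(link_mtu, overhead):
    """MTU left for traffic inside the tunnel after encapsulation overhead."""
    return link_mtu - overhead


print(ping_payload_for_mtu(1500))  # 1472
print(tunnel_mtu(1500, 80))        # 1420, with an assumed 80-byte overhead
```

On Linux you can then probe the path with something like ping -M do -s 1472 against your server: if that fails but smaller sizes work, the path MTU is below 1500.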
Port Verification:
Ensure that the necessary VPN ports are open on all firewalls and routers.
Certificate Validation:
Verify that all VPN certificates are valid and up to date.
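One way to check expiry from a script is sketched below using Python's standard ssl module. It assumes the endpoint speaks plain TLS on the port you give it, and the function names are mine. Note that with the default context an already expired or untrusted certificate makes the handshake raise an error, which is itself a useful signal.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the certificate's notAfter string as reported by the server."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Raises ssl.SSLCertVerificationError if the certificate is invalid.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days left before a notAfter timestamp such as 'Jan  5 09:34:43 2018 GMT'."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400
```

Calling days_until_expiry(fetch_not_after("api.example.com")) against your own hostname (the one here is a placeholder) gives the remaining lifetime in days.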
Routing Table Inspection:
Inspect routing tables on all devices and routers to identify any routing errors.
DNS Troubleshooting:
Test DNS resolution from the devices.
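If the device has Python available, a quick resolution check might look like this sketch. The helper name is my own, and it only uses the system resolver, so it reflects whatever DNS configuration that machine has.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated addresses for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

An empty list for your API or VPN hostname points at DNS, while a list of addresses means name resolution is working and the problem is elsewhere.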
ISP Involvement:
If problems persist, contact your ISP to rule out network issues on their end.
Key Considerations:
Security: Be cautious when disabling firewalls or making changes to VPN configurations.
Documentation: Keep detailed records of all troubleshooting steps and findings.
By systematically working through these steps, you should be able to identify and resolve the connectivity issues.