Device is stuck in VPN only… I suspect it's possibly firewall rules or something like that? How do I troubleshoot that?
Hi, please make sure the network where the device is located complies with:
If everything looks fine, the easiest way to debug would be to ssh into the device and check the journalctl log for the supervisor service.
Let us know if that works
Note that balenaOS uses port 443 both for the VPN and to connect to API endpoints using TLS. Some firewalls dislike traffic encapsulated in this way, so that is something to check too.
Seems like it was the proxy server. Ended up putting the machine on another subnet where the proxy was not required. Not sure how it connected to the VPN, but journalctl did mention that it was unable to send a heartbeat: connection timeout.
Thank you!
Good to know it works now, please let us know if you need further support.
Another device doing this… the customer opened the firewall, however it still shows VPN Only status:
See the following in journalctl:
Oct 03 00:11:13 470020c os-config[611]: Awaiting service configuration…
Oct 03 00:11:17 470020c ddfc1ca1cd2f[620]: [event] Event: Device state report failure {"error":{"message":""}}
Oct 03 00:11:17 470020c resin-supervisor[1149]: [event] Event: Device state report failure {"error":{"message":""}}
Oct 03 00:12:30 470020c ddfc1ca1cd2f[620]: (node:1) UnhandledPromiseRejectionWarning: Error: Unhealthy
Oct 03 00:12:30 470020c ddfc1ca1cd2f[620]: at e. (/usr/src/app/dist/app.js:594:33274)
Oct 03 00:12:30 470020c ddfc1ca1cd2f[620]: at /usr/src/app/dist/app.js:594:32371
Oct 03 00:12:30 470020c ddfc1ca1cd2f[620]: at Object.next (/usr/src/app/dist/app.js:594:32476)
Oct 03 00:12:30 470020c ddfc1ca1cd2f[620]: at o (/usr/src/app/dist/app.js:594:31193)
Oct 03 00:12:30 470020c ddfc1ca1cd2f[620]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or b>
Oct 03 00:12:30 470020c resin-supervisor[1149]: (node:1) UnhandledPromiseRejectionWarning: Error: Unhealthy
Oct 03 00:12:30 470020c resin-supervisor[1149]: at e. (/usr/src/app/dist/app.js:594:33274)
Oct 03 00:12:30 470020c resin-supervisor[1149]: at /usr/src/app/dist/app.js:594:32371
Oct 03 00:12:30 470020c resin-supervisor[1149]: at Object.next (/usr/src/app/dist/app.js:594:32476)
Oct 03 00:12:30 470020c resin-supervisor[1149]: at o (/usr/src/app/dist/app.js:594:31193)
Oct 03 00:12:30 470020c resin-supervisor[1149]: (node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block,>
Oct 03 00:13:00 470020c balenad[620]: time="2020-10-03T00:13:00.566955207Z" level=warning msg="Health check for container ddfc1ca1cd2f5e8bb7c9bb58b1f4629dd712b7c37ff27af37608bef4c06fa86a error: context deadline >
Oct 03 00:13:18 470020c ddfc1ca1cd2f[620]: [event] Event: Device state report failure {"error":{"message":""}}
Oct 03 00:13:18 470020c resin-supervisor[1149]: [event] Event: Device state report failure {"error":{"message":""}}
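When the journal is long, it can help to count how often each failure shows up rather than reading it line by line. The following is only a rough sketch: the message fragments are copied from the excerpt above, the category names and the function count_failures are made up for illustration, and the list of patterns is not exhaustive.

```python
import re
from collections import Counter

# Message fragments taken from the journal excerpt in this thread.
# The category names on the left are hypothetical labels, not balena terms.
PATTERNS = {
    "state_report_failure": re.compile(r"Device state report failure"),
    "unhealthy": re.compile(r"Error: Unhealthy"),
    "health_check_error": re.compile(r"Health check for container \w+ error"),
}


def count_failures(journal_text):
    """Count how many journal lines match each known failure pattern."""
    counts = Counter()
    for line in journal_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

One way to use it would be to save the supervisor journal to a file (for example with journalctl -u resin-supervisor --no-pager, assuming the unit is named as the log prefix suggests) and pass the file contents to count_failures. Python may not be available on the host OS itself, so this is more likely to be run on a workstation against the saved output.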
Was asked for a list of domains to whitelist and gave them the domains from here: https://www.balena.io/docs/reference/OS/network/2.x/#network-requirements
Are there any other domains that should be whitelisted?
These are the only domains we use, however they may resolve to multiple different IPs, which do change from time to time. So in-line proxies and firewalls need to respect DNS TTLs on all of the records and update their configurations accordingly.
Supposedly they opened all those domains. Is there a test I could do… for example against the heartbeat URL or something like that, since I am able to VPN into the machine?
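To see which addresses a domain currently resolves to from a given network, something like the sketch below could be run a few times and compared against what the firewall has whitelisted. It is only an illustration: resolve and unique_ips are hypothetical helper names, and the standard resolver call used here does not expose the DNS TTL, so it only shows the addresses at that moment.

```python
import socket


def unique_ips(addrinfo):
    """Collect the distinct IP addresses from a getaddrinfo() result, sorted for easy comparison."""
    return sorted({sockaddr[0] for (_family, _type, _proto, _canon, sockaddr) in addrinfo})


def resolve(host, port=443):
    """Resolve host with the system resolver and return its current IPs (TTL is not available here)."""
    return unique_ips(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP))


# Example (needs network access):
# print(resolve("api.balena-cloud.com"))
```

If the list changes between runs while the firewall rules stay pinned to older addresses, that would be consistent with the TTL issue described above.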
The best, but by no means exhaustive, test is openssl s_client -connect api.balena-cloud.com:443. If you get the proper SSL certificate after running this, you can be assured the traffic is flowing and there aren't any DPI/MITM firewalls messing with the SSL traffic. You can do the same for a lot of the other endpoints, like registry2.balena-cloud.com and registry-data.balena-cloud.com. These are the main ones. VPN (vpn.balena-cloud.com) is harder to test, since it's OpenVPN on port 443. You could try to see if the connection is established, but aside from that, it's basically down to looking at the status of the device on the dashboard:
$ telnet vpn.balena-cloud.com 443
Trying 35.169.76.143...
Connected to vpn.balena-cloud.com.
Escape character is '^]'.
<ENTER_KEY>
Connection closed by foreign host.
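For checking several endpoints in one go, the openssl test above can be approximated with a short script. This is a sketch under a few assumptions: Python 3 is available where you run it (for example in a container or on another machine on the same network), the function names are invented for illustration, and the host list is just the endpoints named in this thread. The VPN endpoint is left out because it speaks OpenVPN rather than plain TLS.

```python
import socket
import ssl

# Endpoints mentioned in this thread; vpn.balena-cloud.com is omitted on purpose.
ENDPOINTS = [
    "api.balena-cloud.com",
    "registry2.balena-cloud.com",
    "registry-data.balena-cloud.com",
]


def flatten_name(rdns):
    """Flatten the nested tuples used by getpeercert() for subject/issuer into a plain dict."""
    return {key: value for rdn in rdns for (key, value) in rdn}


def summarize_cert(cert):
    """Pick out the fields most useful for spotting an intercepting proxy."""
    subject = flatten_name(cert.get("subject", ()))
    issuer = flatten_name(cert.get("issuer", ()))
    return {
        "subject_cn": subject.get("commonName"),
        "issuer_org": issuer.get("organizationName"),
        "not_after": cert.get("notAfter"),
    }


def check_endpoint(host, port=443, timeout=10.0):
    """Attempt a verified TLS handshake and return a (status, detail) pair."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "ok", summarize_cert(tls.getpeercert())
    except ssl.SSLCertVerificationError as exc:
        # The handshake completed far enough to present a certificate we do not trust.
        return "untrusted certificate", str(exc)
    except OSError as exc:
        # Covers timeouts, refused or reset connections, and other TLS failures.
        return "connection failed", str(exc)


# Example (needs network access):
# for host in ENDPOINTS:
#     print(host, check_endpoint(host))
```

An "ok" result with an issuer you do not recognise, or an "untrusted certificate" result, would both hint at something in the path rewriting the TLS traffic, while "connection failed" points more towards the port or domain being blocked. Like the openssl check, this is not exhaustive.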
Thank you Anton… this was a really good tip. I did not get certificate info back, so something still is not quite right from the firewall perspective. Tried from another device and saw certificate info!