I was poking around and noticed that the TTL on api.balena-cloud.com. is quite low for something that is requested so frequently (by the supervisor). DNS request traffic could be significantly lowered if the TTL were several times higher. I know that operationally it’s easy to work and plan around low TTLs; however, for low-traffic fleets (such as those on 3G/4G) this could be an easy win, for as much as a 10% improvement.
Hi, thanks for the feedback. We set the TTL to a low value in case we need to fail over to a different address. You have raised a valid point and I shall raise it to the team for discussion. We’ll try to find a good balance for it.
Hi, the supervisor polling interval has a minimum value of 10 minutes. The hostname isn’t resolved until a request is made, so unless we set the TTL to maybe at least an hour, DNS will still be queried on every supervisor poll. Setting the supervisor poll interval to 1 hour, if you don’t expect frequent updates, might be more helpful for such devices. You may refer to the docs on how to further reduce bandwidth consumption for low-traffic devices. The supervisor can also be restarted to poll for updates immediately in case an urgent change is needed.
The web dashboard also uses the same API, so if we increased the TTL and then did a failover, web users might not get the new address as quickly. In this case, keeping the API address TTL at a very low value makes more sense.
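To make the polling-interval point above concrete, here is a minimal Python sketch that counts upstream DNS lookups per day for a client polling on a fixed interval behind a simple TTL cache. The function name and the 60-second TTL used in the examples are assumptions for illustration only (the actual TTL is not stated in this thread), and the model ignores other lookups on the device.

```python
def dns_lookups_per_day(poll_interval_s: int, ttl_s: int) -> int:
    """Count upstream DNS lookups in 24 hours for one polling client.

    Simplified model: the client resolves the hostname only when it polls,
    and a cached answer is reused until its TTL has elapsed.
    """
    day_s = 24 * 60 * 60
    lookups = 0
    cache_expires_at = 0  # the cache starts out empty
    for now in range(0, day_s, poll_interval_s):
        if now >= cache_expires_at:
            # Cached answer has expired (or never existed): query upstream.
            lookups += 1
            cache_expires_at = now + ttl_s
    return lookups


# 10-minute poll, assumed 60 s TTL: every poll triggers a lookup.
print(dns_lookups_per_day(600, 60))
# 10-minute poll, 1-hour TTL: roughly one lookup per hour.
print(dns_lookups_per_day(600, 3600))
# 1-hour poll, assumed 60 s TTL: same lookup count, without changing the TTL.
print(dns_lookups_per_day(3600, 60))
```

In this model, raising the poll interval to one hour gives the same reduction in lookups as raising the TTL to one hour, which is why the poll interval is suggested as the first knob to turn.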
I understand where you are coming from with the supervisor polling interval. While I understand your reasoning for lower TTLs (failover), consistent IPs maintained through ELB and similar services are quite reliable.
Another option might be to use keepalive and keep a single connection open.
Anyway, this was just my opinion given what I saw on a device under investigation.
Perhaps that should be adjusted if it’s what’s holding you back. An alternative subdomain wouldn’t be a bad idea.
FYI, there is also the connectivity health check. It’s minor, but it performs hourly lookups.
Have you considered setting up a local resolver cache (we use dnsmasq) to override upstream TTLs instead? You could patch the host OS to add the relevant configuration properties to an /etc/dnsmasq.d/my-ttl.conf override, which will persist between device reboots. You’ll need to run mount -o remount,rw / first.
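As a rough sketch of what such an override could contain: dnsmasq has a min-cache-ttl option that extends short TTLs when caching answers. The 600-second value below is only an illustrative assumption, and whether the dnsmasq build on your device reads files from /etc/dnsmasq.d/ is something to verify on the device itself.

```
# /etc/dnsmasq.d/my-ttl.conf (file name taken from the suggestion above)
# Cache answers for at least 600 seconds, even if the upstream TTL is lower.
# 600 is an illustrative value, not a recommendation; the dnsmasq man page
# notes that this option is limited to one hour unless dnsmasq is recompiled.
min-cache-ttl=600
```

Keep in mind that extending TTLs locally also delays how quickly the device notices a failover to a new address, which is the trade-off discussed earlier in the thread.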