Thanks for your response. Here’s an update on what we’ve learnt.
After connecting to the host OS over SSH, we found that the chronyd service (used for NTP synchronisation) was not running. We saw this with systemctl status chronyd, and chronyc sources reported 503. We believed this may have been an issue with our configuration, or something going wrong in the boot cycle, so we first checked whether the chrony service was even trying to use the custom NTP servers we had set.
We checked the chrony configuration file (/etc/chrony.conf) but this was a red herring. In a separate Balena thread, an engineer explained that the change to chrony configuration happens at runtime, and is not applied to the chrony configuration file itself. This change is applied in the balena-ntp-config service that is run to apply the new custom NTP sources. Evidence in the journalctl logs showed that the script was run on boot.
When it runs, after some initial checks, it calls chronyc to update the NTP sources, but does so via a different script, chrony-helper. That script failed because the chronyd service was not running, which is the same symptom we saw when we connected after boot to investigate.
We found (through tangential research) that it’s commonly known that the chronyd service will stop automatically (or perhaps not start at all) if another time service is already running. The recommendation is to use ps aux to see whether any time-related services are running.
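To make that check repeatable, here is a minimal Python sketch that filters ps aux output for time-related processes. The keyword list and function names are our own illustrative assumptions, and the host OS may not ship Python at all, in which case the filter can be run elsewhere on captured output.

```python
import subprocess

# Illustrative keywords only (an assumption); extend for your own image.
TIME_KEYWORDS = ("chronyd", "ntpd", "timesyncd", "timesync")


def find_time_processes(ps_output: str) -> list[str]:
    """Return the `ps aux` lines that mention a time-related keyword."""
    lines = ps_output.splitlines()
    # Skip the header row and keep only the matching process lines.
    return [
        line
        for line in lines[1:]
        if any(keyword in line.lower() for keyword in TIME_KEYWORDS)
    ]


def list_time_processes() -> list[str]:
    """Run `ps aux` locally and filter its output."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return find_time_processes(result.stdout)
```

Calling print("\n".join(list_time_processes())) on the device, or passing saved ps aux output to find_time_processes, lists the candidates to look at.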
When investigating running processes, we found /usr/bin/timesync-https.sh running. Judging by the source code of this Balena script, along with commentary on the Balena forums, timesync-https is expected to make a one-off time synchronisation over HTTPS (so that certificates can be used), after which chronyd is launched.
At this point we knew that the timesync-https service tries to (indefinitely) contact the os.network.connectivity.uri (as specified in the balena config.json) to retrieve a current timestamp, before it hands over to chronyd to keep time synchronised over NTP.
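For anyone wanting to reproduce the idea outside the device, this is a rough Python sketch of reading a timestamp from a connectivity URI. It is not Balena's script: we are assuming the timestamp is taken from the standard HTTP Date response header, and the function names are our own.

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_date_header(value: str) -> datetime:
    """Parse an HTTP Date header such as 'Tue, 02 Jan 2024 03:04:05 GMT'."""
    return parsedate_to_datetime(value)


def fetch_server_time(uri: str, timeout: float = 10.0) -> datetime:
    """Request the URI and return the time given in its Date header."""
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        date_header = response.headers["Date"]
    if date_header is None:
        raise ValueError(f"no Date header returned by {uri}")
    return parse_date_header(date_header)
```

Passing the value of os.network.connectivity.uri to fetch_server_time should either return a timestamp or raise the underlying TLS error, which makes it a quick way to test a candidate endpoint from the same network.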
The default URI is https://api.balena-cloud.com/connectivity-check, and when we manually ran a curl request against it, we received the same SSL error we had seen on other Balena endpoints:
curl: (35) error:0A000152:SSL routines::unsafe legacy renegotiation disabled
Our understanding is that the enterprise firewall we’re behind does not support RFC 5746 secure renegotiation. Further reading indicates this could be a well-known problem on enterprise networks. So either the enterprise needs to fix it, or a compromise on SSL strictness has to be made.
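One way to test that theory from a machine on the same network is to attempt the TLS handshake twice, once with OpenSSL's legacy server connect option enabled. The sketch below is diagnostic only and the helper names are ours. It assumes the option bit is 0x4 (OpenSSL's SSL_OP_LEGACY_SERVER_CONNECT) on Python versions that do not expose ssl.OP_LEGACY_SERVER_CONNECT.

```python
import socket
import ssl

# Recent Python versions expose this constant; otherwise fall back to the
# raw OpenSSL bit (assumed here to be 0x4, SSL_OP_LEGACY_SERVER_CONNECT).
OP_LEGACY_SERVER_CONNECT = getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)


def make_context(allow_legacy: bool) -> ssl.SSLContext:
    """Build a default client context, optionally allowing legacy peers."""
    ctx = ssl.create_default_context()
    if allow_legacy:
        ctx.options |= OP_LEGACY_SERVER_CONNECT
    return ctx


def handshake_ok(host: str, port: int = 443, allow_legacy: bool = False,
                 timeout: float = 10.0) -> bool:
    """Return True if a TLS handshake with host:port completes."""
    ctx = make_context(allow_legacy)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLError:
        # Covers renegotiation refusals and certificate failures alike.
        return False
```

If handshake_ok("api.balena-cloud.com") is False but handshake_ok("api.balena-cloud.com", allow_legacy=True) is True, that would point at the intermediary rather than the endpoint. A certificate failure also returns False here, so an intercepting proxy with its own CA could muddy the result.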
The options ahead of us were:
- Upgrade the intermediary to support secure SSL renegotiation. Since this is on a large enterprise network, and we’re a small-fish vendor, convincing them to get that work done is not a suitable course of action.
- Use an alternative connectivity URI that provides the same result as Balena’s own endpoint. Namely this is a 204 status code, an up-to-date timestamp in the response headers (and for extra measure, a header “X-NetworkManager-Status” with a value of “online”). This custom connectivity endpoint would then be set in the config.json file.
- Adjust the OpenSSL config of the OS to allow legacy renegotiation. At present we are unsure how to do this, as trying to modify /etc/ssl/openssl.cnf to add Options = UnsafeLegacyServerConnect fails because the file system is read-only.
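To illustrate the second option, here is a minimal Python sketch of an endpoint that answers with the response described above: a 204 status, a Date header, and X-NetworkManager-Status set to online. The class and function names are hypothetical, and it serves plain HTTP only. A real deployment would need TLS in front of it, on a path that does not hit the same renegotiation problem.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ConnectivityHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with 204 and the connectivity headers."""

    def _respond(self):
        # send_response() also emits the Server and Date headers.
        self.send_response(204)
        self.send_header("X-NetworkManager-Status", "online")
        self.end_headers()

    def do_GET(self):
        self._respond()

    def do_HEAD(self):
        self._respond()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove to restore request logging


def make_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Create the server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer((host, port), ConnectivityHandler)


def probe(port: int, host: str = "127.0.0.1"):
    """Fetch / once and return (status, headers) for a quick self-check."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()
```

Running make_server(port=8080).serve_forever() starts it locally, and probe(8080) from another shell shows the status and headers a client would see.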