NTP (chrony) not starting on newly provisioned device

Greetings,

I have encountered an issue with NTP time synchronization while provisioning a new device. I have been able to replicate the issue on three separate Raspberry Pi devices. I have read other posts on this forum, yet none seemed to address my issue.

The setup:

  • When provisioning the device, I set up the config.json file as the tutorial suggests, with the "ntpServers":"<my ntp server IP>" value pointing at my local NTP server.
  • Upon connecting and booting the device, the device doesn’t get provisioned, with logs stating “Error: certificate is not yet valid”.
  • I am able to ssh onto the device.
  • timedatectl shows that no NTP service is available:
root@fd779a4:~# timedatectl
               Local time: Sun 2020-09-20 14:06:44 UTC
           Universal time: Sun 2020-09-20 14:06:44 UTC
                 RTC time: n/a
                Time zone: n/a (UTC, +0000)
System clock synchronized: no
              NTP service: n/a
          RTC in local TZ: no
  • The system clock is set to the creation time of the image.
  • chronyc tracking shows 506 cannot talk to daemon.
  • When starting that newly provisioned device and connecting to it, the systemctl status chronyd command states that it is “inactive (dead)”.
● chronyd.service - NTP client/server
     Loaded: loaded (/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/chronyd.service.d
             └─chronyd.conf
     Active: inactive (dead)
       Docs: man:chronyd(8)
             man:chrony.conf(5)
  • The device then remains unsynchronized forever, not fetching the configuration due to certificates being invalid.
  • This issue persists even when removing the “ntpServers” value from config.json file, leaving the ntp sources to default while provisioning.

The steps I’ve taken as workarounds:

  1. Check chronyd logs with journalctl:
root@fd779a4:~# journalctl -u chronyd             
-- Logs begin at Sun 2020-09-20 10:51:53 UTC, end at Sun 2020-09-20 14:12:16 UTC. --
-- No entries --
  2. Attempt to start the chronyd service:
    The systemctl start chronyd command hangs indefinitely and the chronyd service never starts. No logs are output.

  3. Manually force the time update with my server

I have manually forced the system time update with chronyd -q

root@eb468d1:~# chronyd -q 'server <my time-server IP> iburst'                                
2022-04-06T07:11:52Z chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC -PRIVDROP -SCFILTER -SIGND +ASYNCDNS -NTS -SECHASH +IPV6 -DEBUG)
2022-04-06T07:11:56Z System clock wrong by 87905.389257 seconds (step)
2022-04-07T07:37:02Z chronyd exiting

This command works and the time is stepped correctly (I would assume this also verifies that my NTP server is able to provide time synchronization). Afterwards, the device fetches the correct configuration and starts services as it should. However, after losing power and reconnecting, the system clock is behind by the amount of time the device was powered off.

It is also quite tedious for provisioning, having to ssh onto each device separately and force time stepping in order to get the correct initial configuration.

The balenaOS version I’m using is balenaOS 2.88.5+rev1

Any insight would be greatly appreciated. Thank you in advance!


Hi, thanks for your message.

Is your device connected to the network from boot, or does it take some time to connect (for example, a cellular connection that is slow to set up, or a captive portal)?

What I would expect to see is that the timesync-https service makes a one-off time synchronization to make sure certificates can be used, and after that happens chronyd is launched.

Could you please post the output of journalctl --no-pager -u timesync-https.service?

Also, could you please make sure that you can access the balenaCloud API with:

curl https://api.balena-cloud.com/ping

Hi, just following up on this. Is this still a problem? If so, could you please answer the questions above?

@alexgg I am seeing pretty much the exact same symptoms as the OP.

In my situation, I’m trying to get the device functioning within a strict enterprise network. They’ve asked that we use custom NTP servers. I have been able to synchronise the clock manually (to their NTP servers) however it doesn’t persist past a power cycle.

Additionally, we get this issue trying to send a request to the api subdomain

root@dc4c524:~# curl https://api.balena-cloud.com/ping
curl: (35) error:0A000152:SSL routines::unsafe legacy renegotiation disabled
Verbose logs:
root@dc4c524:~# curl https://api.balena-cloud.com/ping --verbose
* STATE: INIT => CONNECT handle 0x557eb1d1b0; line 1834 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => RESOLVING handle 0x557eb1d1b0; line 1880 (connection #0)
* family0 == v4, family1 == v6
*   Trying 104.18.12.102:443...
* STATE: RESOLVING => CONNECTING handle 0x557eb1d1b0; line 1964 (connection #0)
* Connected to api.balena-cloud.com (104.18.12.102) port 443 (#0)
* STATE: CONNECTING => PROTOCONNECT handle 0x557eb1d1b0; line 2027 (connection #0)
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* Didn't find Session ID in cache for host HTTPS://api.balena-cloud.com:443
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* STATE: PROTOCONNECT => PROTOCONNECTING handle 0x557eb1d1b0; line 2047 (connection #0)
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (OUT), TLS header, Unknown (21):
* TLSv1.2 (OUT), TLS alert, handshake failure (552):
* error:0A000152:SSL routines::unsafe legacy renegotiation disabled
* multi_done: status: 35 prem: 1 done: 0
* The cache now contains 0 members
* Closing connection 0
* Expire cleared (transfer 0x557eb1d1b0)
curl: (35) error:0A000152:SSL routines::unsafe legacy renegotiation disabled
Verbose output, with the CA cert specified manually:
root@dc4c524:~# curl https://api.balena-cloud.com/ping --verbose --cacert /etc/ssl/certs/balenaRootCA.pem
* STATE: INIT => CONNECT handle 0x55af9831b0; line 1834 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => RESOLVING handle 0x55af9831b0; line 1880 (connection #0)
* family0 == v4, family1 == v6
*   Trying 104.18.12.102:443...
* STATE: RESOLVING => CONNECTING handle 0x55af9831b0; line 1964 (connection #0)
* Connected to api.balena-cloud.com (104.18.12.102) port 443 (#0)
* STATE: CONNECTING => PROTOCONNECT handle 0x55af9831b0; line 2027 (connection #0)
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/balenaRootCA.pem
*  CApath: none
* Didn't find Session ID in cache for host HTTPS://api.balena-cloud.com:443
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* STATE: PROTOCONNECT => PROTOCONNECTING handle 0x55af9831b0; line 2047 (connection #0)
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (OUT), TLS header, Unknown (21):
* TLSv1.2 (OUT), TLS alert, handshake failure (552):
* error:0A000152:SSL routines::unsafe legacy renegotiation disabled
* multi_done: status: 35 prem: 1 done: 0
* The cache now contains 0 members
* Closing connection 0
* Expire cleared (transfer 0x55af9831b0)
curl: (35) error:0A000152:SSL routines::unsafe legacy renegotiation disabled
This only occurs when specifying the https protocol. The following works, but I think that is only because curl defaults to plain http here and the server answers with a redirect to https:
root@dc4c524:~# curl api.balena-cloud.com/ping
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
</body>
</html>

(also, based on your request to the OP…)

root@dc4c524:~# journalctl --no-pager -u timesync-https.service
-- No entries --

Please could you elaborate on why NTP synchronisation may be related to the api subdomain? Thanks!

Note that we did need to install a custom root CA cert on the device. Doing so got most of the remaining services working, except for the requests to the api subdomain and the NTP sync.

If NTP (chrony) is not starting on a newly provisioned device, there are a few troubleshooting steps you can try:

  1. Check the configuration file: Verify that the configuration file for chrony is properly set up. The configuration file is typically located at /etc/chrony.conf. Ensure that the configuration options are correctly specified, including the server addresses and any necessary authentication settings.
  2. Check for errors in the configuration: Run a syntax check on the chrony configuration file to ensure there are no syntax errors or typos. With chrony 4.0 or later, chronyd can print the configuration it parsed and exit, which can be used to verify the syntax:
chronyd -p -f /etc/chrony.conf

If there are any errors, correct them and try starting chrony again.

  3. Check system logs: Check the system logs for any error messages related to chrony. On most Linux distributions, you can check the logs in the /var/log/ directory, such as /var/log/syslog or /var/log/messages. Look for any error messages or indications of why chrony failed to start.
  4. Verify network connectivity: Ensure that the device has network connectivity and can reach the NTP servers specified in the chrony configuration. Test the network connection using tools like ping or traceroute to confirm if there are any network issues preventing chrony from starting.
  5. Check for conflicting NTP services: Make sure that there are no other NTP services running on the device that could conflict with chrony. Use the following command to check for running NTP services:
ps aux | grep ntp

If another NTP service is running, consider stopping or disabling it and then try starting chrony again.

  6. Restart the chrony service: Attempt to restart the chrony service using the appropriate command for your distribution. For example:
sudo service chrony restart

or

sudo systemctl restart chrony

If none of these steps resolve the issue, it may be helpful to provide more specific details about the error messages or any relevant logs for further troubleshooting. Additionally, consult the documentation or support resources specific to your operating system and version for more guidance on resolving NTP startup issues.

Thanks for your response. Here to provide an update on what we’ve learnt.

When SSH connecting into the hostOS, we found that the chronyd service (used for NTP synchronisation) was not running (found with systemctl status chronyd and spotted with chronyc sources reporting 503). We believed this may have been an issue with our configuration, or something going wrong in the boot cycle, so we first investigated to see if the chrony service was even trying to use the custom NTP servers set.

We checked the chrony configuration file (/etc/chrony.conf) but this was a red herring. In a separate Balena thread, an engineer explained that the change to chrony configuration happens at runtime, and is not applied to the chrony configuration file itself. This change is applied in the balena-ntp-config service that is run to apply the new custom NTP sources. Evidence in the journalctl logs showed that the script was run on boot.

When it runs, after some initial checks, it calls on chronyc to update the NTP sources to use, but does so via a different script, chrony-helper. The problem this script hit was that the chronyd service was not running, which is the same symptom we noticed after boot when we connected to investigate.

We found (through tangential research) that it’s commonly known that the chronyd service will stop automatically (or maybe not start) if there is already some sort of time service running. The recommendation provided is to use ps aux to see if there are any time-related services running.

When investigating running services, we found a service called /usr/bin/timesync-https.sh running. Investigating the source code of this Balena script, along with commentary from the Balena forums, the expectation of timesync-https is to make a one-off time synchronisation over HTTPS (to make sure certificates can be used) and after that happens chronyd is launched.
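For anyone repeating this check, here is a rough sketch of how the process listing could be filtered. The function name and the list of daemon names are my own assumptions (chronyd, ntpd, systemd-timesyncd and the timesync-https script mentioned above); adjust the pattern for your image.

```shell
#!/bin/sh
# Hypothetical helper: filter a process listing read on stdin for
# common time-sync daemons. The name list is an assumption, not exhaustive.
time_sync_processes() {
    # The second grep drops the grep process itself, which shows up
    # in the listing when this is fed from `ps aux`.
    grep -E 'chronyd|ntpd|systemd-timesyncd|timesync-https' | grep -v grep
}
```

On the device this would be used as `ps aux | time_sync_processes`. It prints nothing (and returns non-zero) when no matching process is found.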

At this point we knew that the timesync-https service tries to (indefinitely) contact the os.network.connectivity.uri (as specified in the balena config.json) to retrieve a current timestamp, before it hands over to chronyd to keep time synchronised over NTP.
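To illustrate the idea of reading a timestamp from an HTTPS response, here is a minimal sketch. It is not the actual timesync-https.sh code; the function names are hypothetical, and the conversion step assumes GNU date (BusyBox date may need a different invocation).

```shell
#!/bin/sh
# Hypothetical helper: print the value of the Date header from a raw
# HTTP header dump read on stdin.
http_date_from_headers() {
    # HTTP lines end in CRLF, so strip the CR first. Header names are
    # case-insensitive, so compare in lower case.
    tr -d '\r' | awk 'tolower($1) == "date:" { $1 = ""; sub(/^ /, ""); print; exit }'
}

# Hypothetical helper: convert an HTTP date string to epoch seconds.
# Assumes GNU date, which accepts this format via -d.
http_date_to_epoch() {
    date -u -d "$1" +%s
}
```

Usage would look something like `curl -s -o /dev/null -D - "$uri" | http_date_from_headers`, where `$uri` is the connectivity URI. Comparing the resulting epoch value with `date -u +%s` on the device gives a rough idea of how far off the clock is.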

The default URI is https://api.balena-cloud.com/connectivity-check and when manually trying to run a curl request on this, we receive the same SSL error we saw on other Balena endpoints.

curl: (35) error:0A000152:SSL routines::unsafe legacy renegotiation disabled

Our understanding is that the enterprise firewall we’re behind does not support RFC 5746 secure renegotiation. Further reading indicates this could be a well-known problem on enterprise networks. So either the enterprise needs to fix it, or a compromise on SSL strictness has to be made.

The options ahead of us were:

  1. Upgrade the intermediary to support secure SSL renegotiation. Since this is on a large enterprise network, and we’re a small-fish vendor, convincing them to get that work done is not a suitable course of action.
  2. Use an alternative connectivity URI that provides the same result as Balena’s own endpoint. Namely this is a 204 status code, an up-to-date timestamp in the response headers (and for extra measure, a header “X-NetworkManager-Status” with a value of “online”). This custom connectivity endpoint would then be set in the config.json file.
  3. Adjust the OpenSSL config of the OS to allow for legacy renegotiation. At present we are unsure how to do this, as trying to modify /etc/ssl/openssl.cnf to add Options = UnsafeLegacyServerConnect fails because the file system is read-only.
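On option 3, one possible way around the read-only file system is to leave /etc/ssl/openssl.cnf alone and point OpenSSL at a separate config file through the OPENSSL_CONF environment variable. The sketch below is untested on balenaOS and makes several assumptions: that the temp directory is writable, that curl on the device is linked against OpenSSL 3 and loads the default config, and that a manual test is all that is needed. The function names are hypothetical. It would not by itself change what the timesync-https service sees, since that service runs with its own environment.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal OpenSSL config that enables
# UnsafeLegacyServerConnect to the path given as $1.
write_legacy_openssl_conf() {
    cat > "$1" <<'EOF'
openssl_conf = openssl_init

[openssl_init]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
Options = UnsafeLegacyServerConnect
EOF
}

# Hypothetical helper: run one command with OPENSSL_CONF pointing at a
# temporary override file, then clean the file up again.
with_legacy_openssl() {
    conf=$(mktemp) || return 1
    write_legacy_openssl_conf "$conf"
    OPENSSL_CONF="$conf" "$@"
    status=$?
    rm -f "$conf"
    return "$status"
}
```

A manual check would then be `with_legacy_openssl curl https://api.balena-cloud.com/ping`. Scoping the override to a single command keeps the relaxed TLS behaviour away from everything else on the device, which seems preferable given that this option weakens the handshake checks.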