NTP Service for openBalena Customers

Recently our RPi4 devices running balenaOS 2.105.21 stopped reporting the correct day/time. The online docs suggest that balena hosts the NTP service, but I wonder whether this applies only to balenaCloud customers. Can someone confirm where openBalena devices get the correct time, and why devices may be failing to get it?

Hey @brownster!

I think NTP is managed in the same way for both openBalena and balenaCloud: devices point to our own NTP servers (docs here). You might want to check that UDP port 123 is open so the devices can reach them. You can also configure your own NTP servers in config.json if you prefer (docs here).
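One way to check the UDP 123 path from a particular customer network is to send a single SNTP request and see whether a reply comes back. Below is a minimal Python sketch, assuming you run it from a machine or container on the same network as a device (balenaOS hostOS itself may not ship Python). `build_sntp_request`, `parse_transmit_time` and `query_ntp` are hypothetical helper names, and `pool.ntp.org` in the usage line is only a placeholder for whichever servers your devices actually use.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    """Build a 48-byte SNTP client request: LI=0, version 4, mode 3 (client)."""
    first_byte = (0 << 6) | (4 << 3) | 3
    return bytes([first_byte]) + bytes(47)


def parse_transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp (bytes 40..47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX_OFFSET + fraction / 2**32


def query_ntp(server: str, timeout: float = 3.0) -> float:
    """Send one SNTP request to UDP port 123 and return the server's time.

    Raises OSError (including socket.timeout) if no reply arrives.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply)
```

For example, `query_ntp("pool.ntp.org") - time.time()` gives a rough clock offset in seconds. An `OSError` such as a timeout suggests the request or the reply on UDP 123 is being dropped somewhere on that network.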

Let me know if that’s helpful; if not, tell me more about the situation (are you seeing errors? is the time just off? what do you see coming from chronyd?) and I’ll reach out to our OS engineers to find out a bit more.
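If you can run `chronyc tracking` from a hostOS terminal, its "Key : value" output can be turned into something easier to compare across devices. A minimal Python sketch, assuming you have captured that text somewhere Python is available; the function names are hypothetical, and we assume chrony's usual `Leap status` values ("Normal", "Insert second", "Delete second", "Not synchronised").

```python
def parse_chronyc_tracking(output: str) -> dict[str, str]:
    """Turn `chronyc tracking` output ("Key : value" lines) into a dict.

    Splits on the first colon only, so values such as
    "Thu Jan 01 00:00:00 2024" keep their own colons.
    """
    fields: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_synchronised(fields: dict[str, str]) -> bool:
    """True if chrony reports a synchronised leap status."""
    return fields.get("Leap status") in {"Normal", "Insert second", "Delete second"}
```

Comparing `Leap status`, `Reference ID` and `Ref time (UTC)` on a healthy device and on one whose clock has jumped back should show whether chronyd ever reached a server.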

Hello @the-real-kenna,

We would like to take you up on your offer to check with the devs.

We have confirmed that the hostOS on our openBalena devices depends on some NTP service that lives within our hosted openBalena services. No service is named in a way that makes it clear where that would live (see pic below). We think it might be openbalena-haproxy, since we've seen this service hang and then cause problems.

The problems are significant and would be considered an outage. If the openBalena backend is struggling, the local time snaps back to an old date and time that is way off, which kills HTTPS connections. We had assumed the backend was needed only for deployments, not for normal operation. Unfortunately, that is not true.

Questions:

  1. Is there a way to change the NTP servers on our current fleet?
  2. What other services do the devices depend on to run properly? As we end-of-life this device, we were hoping to bring the balena backend down and let customers keep running the last state, at their own risk, for as long as they want.

Reminder: we standardized on balenaOS 2.105.21.

Hey @brownster,

The team shared some details with me, so I’ll avoid trying to rephrase and just give it to you verbatim. :slight_smile:

Question 1: Is there a way to change the NTP servers on our current fleet?

“openBalena” is not an NTP time source, nor a client. We can’t comment on whatever (if anything) is providing NTP to the network where it is running. Devices (clients of openBalena) running balenaOS will get their NTP configuration as per the docs we sent (unless they’ve changed that via custom config.json params).

Also, there isn't a way to change the NTP servers remotely for the whole fleet; it's the same as balenaCloud. You'll need to change config.json on each device via hostOS terminal access and then, I believe, reboot the device.
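The per-device change above amounts to setting `ntpServers` in config.json. A minimal Python sketch of that edit, assuming `ntpServers` is a space-separated string of server names (our reading of the balena config.json docs; please verify against them) and that you can run Python against the file, e.g. on the boot partition of a mounted SD card, since hostOS may not ship Python. `set_ntp_servers` and `update_config_file` are hypothetical helpers.

```python
import json
from pathlib import Path


def set_ntp_servers(config: dict, servers: list[str]) -> dict:
    """Return a copy of config with ntpServers set as a space-separated string.

    Assumption: balenaOS reads ntpServers in this format.
    """
    updated = dict(config)
    updated["ntpServers"] = " ".join(servers)
    return updated


def update_config_file(path: Path, servers: list[str]) -> None:
    """Rewrite config.json at `path` with the given NTP servers."""
    config = json.loads(path.read_text())
    # Write to a temporary file first, then rename it over the original,
    # to reduce the chance of leaving a truncated config.json behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(set_ntp_servers(config, servers), indent=2))
    tmp.replace(path)
```

Usage might look like `update_config_file(Path("/mnt/boot/config.json"), ["time1.example.com", "time2.example.com"])`, followed by a reboot. The path is where we believe hostOS exposes config.json, and the server names are placeholders.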

Question 2: What other services are the devices dependent on to run properly? As we end-of-life this device we were hoping to bring the Balena backend down and allow customers to continue to run the last state at their own risk as long as they want.

I assume they are trying to understand whether there is anything an operational device needs from the backend to stay operational? The answer is no: they should be able to take the backend down and the devices will just continue to run as usual. However, they won't have remote access, etc., so they might want to put SSH keys on the devices so their users can SSH in locally.
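For the local SSH access suggested above, our understanding of the config.json docs is that balenaOS reads public keys from an `os.sshKeys` array. A minimal Python sketch that merges keys into that array without duplicating any; `add_ssh_keys` is a hypothetical helper and the key strings are placeholders. Writing the result back and rebooting would follow the same per-device process as the NTP change.

```python
import json


def add_ssh_keys(config: dict, public_keys: list[str]) -> dict:
    """Return a copy of config with public keys merged into os.sshKeys.

    Assumption: balenaOS reads authorized keys from config["os"]["sshKeys"].
    """
    # Round-trip through JSON to deep-copy the (JSON-compatible) config.
    updated = json.loads(json.dumps(config))
    os_section = updated.setdefault("os", {})
    existing = os_section.setdefault("sshKeys", [])
    for key in public_keys:
        if key not in existing:
            existing.append(key)
    return updated
```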

Hopefully that’s the insight you needed; let us know.

@the-real-kenna - This is not what we are seeing. We can reproduce this issue consistently. Keep in mind we are not talking about devices behind one firewall or on one problematic network. Our devices are deployed in hundreds of very different customer environments, yet all behave the same way when the openBalena backend goes down: they lose their local time, causing a cascade of other issues.

It isn't easy to demo, but we can demonstrate it for you.

@brownster,

Can you share a bit more about what happens when you bring the backend down? What other services are running that the devices might depend on?

The screenshot you shared of your Kubernetes dashboard has some services that aren’t built by us, so I’m wondering if those are contributing to the issue.

@brownster, one other piece of information that would be helpful (in addition to the OS version you already provided) is the version of each component in the openBalena stack you are running (e.g. open-balena-api, open-balena-vpn, etc.). We could then compare their release timing against your host OS version.

In our experience, the openBalena stack is compatible with balenaOS versions released at or before the time the stack itself was released (because balenaCloud needs to support legacy devices and cannot force host OS upgrades), but not the other way around: newer balenaOS versions can (and regularly do) introduce features or services that require newer versions of openBalena. This isn't a problem for balenaCloud, because balena regularly updates the stack it runs there, but openBalena users need to be aware of this dynamic.

We have had to update our openBalena stack many times for this reason, and we lock the host OS versions our devices run to known-compatible versions, so that we can coordinate host OS updates with openBalena updates.
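The version lock described above can be enforced with a simple gate before any host OS update: allow the target only if it is not newer than the newest balenaOS version you have validated against your current openBalena stack. A minimal Python sketch, assuming plain dotted versions optionally prefixed with "v" or followed by a "+..." suffix; the function names are hypothetical and the validated maximum is whatever your own testing establishes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse "2.105.21", "v2.105.21" or "2.105.21+rev1" into (2, 105, 21)."""
    core = version.strip().lstrip("v").split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_upgrade_allowed(target: str, max_validated: str) -> bool:
    """Allow a host OS target only if it is not newer than the validated max."""
    return parse_version(target) <= parse_version(max_validated)
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.99.27" would sort after "2.105.21", while numerically it is older.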

Let me provide a little more info about what we seem to be seeing. It appears that something happening while our openBalena server is down prevents the devices from properly syncing with their NTP servers. We are not suggesting that the openBalena server itself provides NTP.

Is there documentation somewhere about when/how the balena devices sync with ntp that might provide a bit more info about what could be happening?


Hi @jwdev,

Thanks for the additional insight.

The NTP settings are configured on the devices themselves, as part of config.json, so the time server is not an actual balena service. Something seems to be breaking communication between the devices and their time server (a balena service could still be the cause, but what is failing appears to be that link).

I’m not sure if you’ve found this document on our site, it’s a bit buried, but it might help you get to the root of things: Time management - Balena Documentation.

Are you able to share more about how your configuration works? I.e., what sits between these devices and their time server, and which of the services you are stopping could break communication between the two?

If it were me, I would probably do a network trace as well. See what the trace looks like when time sync is working properly, and what fails or stops when you bring down the openBalena stack.

If you're able to answer @drcnyc's question about which version of each component you are running, that would be helpful too. Something about certain versions might give us a clue.

Looking forward to finding the answer to this one… it’s mysterious and I want to know, lol.