Balena data usage higher than expected

Hi,

We have customers that are concerned about the high data usage of our system and after taking a closer look we found there is a discrepancy between the actual data usage and what is expected according to this doc: Reduce bandwidth usage - Balena Documentation.

In our system's normal state with the default Balena settings, our data usage is about 4468 MB/Month

This data usage goes down to about 3811 MB/Month with Balena logging disabled

When stopping our service entirely and keeping Balena logs disabled the data usage is still 2142 MB/Month

Following the information here: Reduce bandwidth usage - Balena Documentation and setting these values:
BALENA_SUPERVISOR_CONNECTIVITY_CHECK = 0
BALENA_SUPERVISOR_LOG_CONTROL = 0
BALENA_SUPERVISOR_VPN_CONTROL = 0
BALENA_SUPERVISOR_POLL_INTERVAL = 86400000

The data usage gets down to ~1499 MB/Month

Now if we were to set BALENA_SUPERVISOR_HARDWARE_METRICS = 0 (we are running v12.7.0, so we didn’t test this value), according to the doc this should save us at most 168 MB/Month, which would bring our data usage down to 1,331 MB/Month

According to the doc setting these supervisor values should bring the data usage down to 1.3 MB/Month.

What else could be causing the system to be using ~1499 MB/Month?
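
To make the gap concrete, here is a rough sketch of the arithmetic behind the figures above. It only uses the numbers quoted in this post; the variable and function names are made up for illustration, and the 168 MB/Month saving is the doc's upper bound, not something we measured.

```python
# Monthly figures (MB/month) as reported in this post, in the order tested.
measurements = [
    ("default settings", 4468),
    ("balena logging disabled", 3811),
    ("our service stopped as well", 2142),
    ("supervisor variables set", 1499),
]


def step_savings(steps):
    """Return (label, MB saved relative to the previous step) pairs."""
    return [
        (label, prev - cur)
        for (_, prev), (label, cur) in zip(steps, steps[1:])
    ]


savings = step_savings(measurements)
for label, saved in savings:
    print(f"{label}: saves {saved} MB/month")

# Upper bound taken from the doc for hardware metrics; not measured here.
HARDWARE_METRICS_MAX_SAVING = 168
# Figure the doc says these settings should reach.
DOC_EXPECTED = 1.3

projected = measurements[-1][1] - HARDWARE_METRICS_MAX_SAVING
print(f"projected: {projected} MB/month")
print(f"unexplained vs. doc: {projected - DOC_EXPECTED:.1f} MB/month")
```

Even with the best-case hardware metrics saving applied, almost all of the remaining usage is unaccounted for.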

Thanks,
Sophia

Thanks @sophiahaoui for reaching out. To understand your query better, can I ask you a couple of clarifying questions:

  • Are the four config variables (BALENA_SUPERVISOR_CONNECTIVITY_CHECK, BALENA_SUPERVISOR_LOG_CONTROL, BALENA_SUPERVISOR_VPN_CONTROL, BALENA_SUPERVISOR_POLL_INTERVAL) the ones you have set, and are you experiencing ~1.499MB/month usage or ~1499MB/month?
  • What is the OS version you have running along with v12.7 of supervisor? Also, any reason for not upgrading them to the latest version?
  • Can you check on the services running on the device other than balena ones, maybe some metrics still being batched out which are not related to balena?

I am also checking with our team to see if we have any other metrics/logging capture enabled other than ones highlighted with balena-supervisor.

Regards,
N

Hi,

Thanks for your response, here are answers:

  1. With these settings of the supervisor variables:
    BALENA_SUPERVISOR_CONNECTIVITY_CHECK = 0
    BALENA_SUPERVISOR_LOG_CONTROL = 0
    BALENA_SUPERVISOR_VPN_CONTROL = 0
    BALENA_SUPERVISOR_POLL_INTERVAL = 86400000
    We are experiencing 1499 MB/month, not 1.499

  2. The Host OS version is balenaOS 2.80.3+rev1

  3. We made sure to stop the services running on the device, so only balena is running.

Thanks,
Sophia

Hi,

I’m just checking in to see if there are any updates on this issue.

Thanks,
-Sophia

Hey Sophia,
That (1499MB) is an unusually high number. Here are a few questions that may help us narrow down. You may not have answers to all of them right away, but please help with whatever information you have (or can confidently guess).

  1. Do you see a similar amount of usage on multiple/all of your balena devices (assuming you have more than one!)?
  2. Can you share how you measured the traffic and where on the network was it done?
  3. Can you share what fraction of the total usage is upload (from the device) and what fraction is download (to the device)?
  4. Do you have any data as to what endpoint(s) all that traffic was being sent to?
  5. Is the traffic sent in short large bursts or trickles continuously all day?

Thanks and regards,
Pranav

Hi Pranav,

Here are some answers:

  1. Yes we are seeing a similar amount of usage on all of our devices.
  2. To measure usage with the SUPERVISOR config variables disabled, we let the device run without any connection to it, and then after a few days we enabled the VPN and checked the eth0 data through the Host OS.
  3. On average 2/3s of the data was RX bytes and the rest was TX bytes.
  4. No, unfortunately we could not see where the data was coming from or going to.
  5. From some tests we ran while the VPN was enabled it appeared to be more trickling continuously all day, but once the VPN was disabled we could not check the data regularly.
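
In case it helps to compare measurements, here is a minimal sketch of how the RX/TX split can be read from the interface counters. It assumes the Host OS exposes the standard Linux /proc/net/dev format; the sample text and its byte counts are made up for illustration and are not from our devices, and the function names are hypothetical.

```python
# Illustrative /proc/net/dev-style text; the counters are invented.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1200      10    0    0    0     0          0         0     1200      10    0    0    0     0       0          0
  eth0:    2000      20    0    0    0     0          0         0     1000      15    0    0    0     0       0          0
"""


def parse_net_dev(text, iface="eth0"):
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev-style text."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes; field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface!r} not found")


def rx_fraction(rx, tx):
    """Share of total traffic that was received."""
    return rx / (rx + tx)


rx, tx = parse_net_dev(SAMPLE)
print(f"RX {rx} bytes, TX {tx} bytes, RX share {rx_fraction(rx, tx):.0%}")
```

On a device, the same function could be fed the contents of /proc/net/dev instead of the sample string. Note the counters reset on reboot, so readings need to be taken within a single uptime.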

Thanks for getting back to me and let me know if you have any other questions or specific test you’d like us to run from our end.

Sophia

Hi Sophia,

We recently did some data usage tests on balenaOS 2.80.3+rev1 and also found much higher bandwidth usage than expected, with approximately 2/3s RX bytes.

We upgraded the supervisor from 12.7.0 to 12.10.1 (keeping the balenaOS version the same) so that we could disable the metrics reporting (I think v12.8.x onwards supports this) and found our usage dropped significantly. I think we saved about 32MB per day…(!)

Regards,
Sam

Hi Sam,

Good to know, I’ll give that a try now and see.

Thank you,
Sophia

Hi,

So quick update, after testing with supervisor version 12.8.0 and setting:
BALENA_SUPERVISOR_CONNECTIVITY_CHECK = 0
BALENA_SUPERVISOR_LOG_CONTROL = 0
BALENA_SUPERVISOR_POLL_INTERVAL = 86400000
BALENA_SUPERVISOR_VPN_CONTROL = 0
BALENA_SUPERVISOR_HARDWARE_METRICS = 0
As well as disabling our device’s services so it is only Balena running at the moment.

We were able to get our data usage down to about 1000MB per month. So that did take out a good amount of data usage; however, that’s still not the 1.3MB that is expected once all the metrics are disabled.

Currently I am testing on balenaOS 2.80.3+rev1 and Supervisor 12.8.0
Let me know if there are any other device configurations I should test out.

Thanks,
Sophia

Hi @sophiahaoui ,

We only tested with the v12.10.1 supervisor, not the v12.8.0, but I did spot in the changelog for Balena Supervisor that in v12.8.3 they made a fix to prevent a recursive loop when reporting current state (balena-supervisor/CHANGELOG.md at master · balena-os/balena-supervisor · GitHub) so it might be worth trying a newer supervisor…

We ended up consuming 4MB per day (~120MB per month) with the following settings (note the different poll interval):

BALENA_SUPERVISOR_CONNECTIVITY_CHECK = 0
BALENA_SUPERVISOR_LOG_CONTROL = 0
BALENA_SUPERVISOR_POLL_INTERVAL = 900000
BALENA_SUPERVISOR_VPN_CONTROL = 0
BALENA_SUPERVISOR_HARDWARE_METRICS = 0

Regards,
Sam

Hi Pranav,

I was wondering if there’s any update on this?
Also, is there a way to simply lower the metrics reporting rate without removing it entirely?

Thanks,
-Sophia

Hi @sophiahaoui,

You mentioned the need for a “low bandwidth” mode. This is something we are currently discussing internally, though it may be some time before this is released. We’ll keep you in the loop though!

Are you now testing with Supervisor v12.8.3+? As @st-mono mentions, there are some current state reporting improvements which may reduce your data usage. Thanks, let us know!

Regards,
Christina

To check, you mentioned 1.3MB, which I assume is derived from this link: Reduce bandwidth usage - Balena Documentation

Could you explain how you’re calculating 1.3MB from that docs page?

Hi Christina,

Okay great, I’d love to be kept in the loop for that.

I am testing with Supervisor 12.8.3 currently yes.

As for the expected 1.3MB per month, I got this value from that link at the very bottom: Reduce bandwidth usage - Balena Documentation

The following settings lead to data usage of approximately 1.3MB per month:

  • Disable BALENA_SUPERVISOR_VPN_CONTROL
  • Disable BALENA_SUPERVISOR_CONNECTIVITY_CHECK
  • Change BALENA_SUPERVISOR_POLL_INTERVAL to 24 hours (86400000 ms)
  • Disable BALENA_SUPERVISOR_LOG_CONTROL
  • Disable BALENA_SUPERVISOR_HARDWARE_METRICS

Thanks,
Sophia

Hi @sophiahaoui,

Ah, I see. With the newer Supervisors we haven’t been consistent in testing bandwidth consumption, unfortunately. From this thread, it’s looking more and more like this 1.3MB value is outdated. I created an issue for this here: Bandwidth reduction's example may report outdated metrics. · Issue #2126 · balena-io/docs · GitHub

Also, I believe we have a feature for bandwidth reporting on-device here if you’d like to follow this GitHub issue: Include IP table bandwidth usage in device metrics · Issue #1724 · balena-os/balena-supervisor · GitHub. However, this would require HARDWARE_METRICS to be enabled which may be undesirable for you.

We have a lot on our plate though – it may be useful to check the Supervisor repo’s meta manager to see what we’re prioritizing at the moment :slight_smile: (and feel free to ping us in GitHub issues too!)

Thanks,
Christina

@cywang117,
We ran into this back in August, and were told one of your action items was to introduce automated bandwidth testing. Has this not been implemented yet?

Hi,

I am one of balenaOS maintainers and want to shed some light on the OS bandwidth consumption.

Let’s start by saying that the numbers that appear in Reduce bandwidth usage - Balena Documentation are so outdated that it makes little sense to use them for anything other than setting an improvement objective. In hindsight, providing numbers for something that changes with every release was a mistake.

According to Measure bandwidth consumption on every release · Issue #1756 · balena-os/meta-balena · GitHub the last time we manually measured bandwidth consumption the results were:

BalenaFin @ 2.38.0+rev1
RX: 0.25 MB/h, 178.97 MB/month
TX: 0.08 MB/h, 57.62 MB/month

Those numbers are below what you are reporting, but also outdated.
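
As a rough sanity check of that comparison, the sketch below converts the monthly figures back to hourly rates and compares the total with the ~1499 MB/month reported earlier in this thread. The 30-day month is an assumption on my part, and the names are only illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month

# MB/month for BalenaFin @ 2.38.0+rev1, as quoted above from issue #1756.
rx_month, tx_month = 178.97, 57.62

# Figure reported earlier in this thread with four supervisor variables set.
REPORTED_MONTH = 1499


def hourly(mb_per_month, hours=HOURS_PER_MONTH):
    """Convert a monthly total to an average hourly rate."""
    return mb_per_month / hours


total_month = rx_month + tx_month
print(f"RX {hourly(rx_month):.2f} MB/h, TX {hourly(tx_month):.2f} MB/h")
print(f"total {total_month:.2f} MB/month")
print(f"reported usage is {REPORTED_MONTH / total_month:.1f}x this baseline")
```

The two setups are not directly comparable (different OS release, different settings), so the ratio is only an indication of scale.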

We discussed this internally and decided we needed to introduce bandwidth usage as an OS constraint, and fail validation if the consumption is above a given threshold.

So, automatic bandwidth testing per release is still the path forward, but we haven’t yet done so. It’s part of a testing improvement the OS team is working on, but that specific item has not yet been worked on. The best place to look for progress is still the github issue in Measure bandwidth consumption on every release · Issue #1756 · balena-os/meta-balena · GitHub.

Once that is in place, we will still need to look into reducing the actual bandwidth used to a minimum.

Hi,

I would like to know where this issue stands.
I don’t see any updates on Balena data usage higher than expected - #23 by cywang117.

We are bringing a lot of new units into our Balena fleets soon, but this unexplained high data usage is becoming more of a concern, and with usage this high Balena will not be a realistic solution for many of these sites.
Can we get some more insight into what is causing this, and into any ways we can lower the bandwidth?

I just tested overnight again with these config params
"RESIN_SUPERVISOR_LOG_CONTROL": "false",
"BALENA_SUPERVISOR_HARDWARE_METRICS": "false",
"RESIN_SUPERVISOR_CONNECTIVITY_CHECK": "false",
"RESIN_SUPERVISOR_POLL_INTERVAL": "18000000",
"RESIN_SUPERVISOR_VPN_CONTROL": "false"

I have put our container in IDLE so it is not running anything

Checked data usage this morning:
uptime: 15:15
eth0 data usage: RX bytes:17094120 (16.3 MiB) TX bytes:5748947 (5.4 MiB)
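
For reference, this is roughly how I extrapolate that reading to a monthly figure. It is a simple linear estimate that assumes the "15:15" uptime means 15 hours 15 minutes, that traffic is steady, and that a month is 30 days; the function name is just for illustration.

```python
def monthly_estimate(rx_bytes, tx_bytes, uptime_hours, days=30):
    """Linearly extrapolate total bytes seen so far to a month of `days` days."""
    per_hour = (rx_bytes + tx_bytes) / uptime_hours
    return per_hour * 24 * days


# eth0 counters from the reading above.
rx, tx = 17_094_120, 5_748_947
# Assumption: "15:15" is 15 hours 15 minutes of uptime.
uptime_hours = 15 + 15 / 60

est = monthly_estimate(rx, tx, uptime_hours)
print(f"{est / 1e6:.0f} MB/month ({est / 2**20:.0f} MiB/month)")
print(f"RX share: {rx / (rx + tx):.0%}")
```

That works out to a bit over 1 GB per month, with about three quarters of it RX, which is in line with the ~1000MB per month I reported earlier.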

I have granted this device support access: balena dashboard
The current config on the unit has the VPN enabled and RESIN_SUPERVISOR_POLL_INTERVAL back to default. Feel free to reset any of these params to test with the device.

Thank you,
Sophia

Hi again Sophia, the bandwidth constraint for the OS is still not being actively worked on. It’s a matter of juggling different priorities. I will ask our customer success team to contact you and study the business case to see where it stands with regard to other priorities.

Hi Sophia,

I believe that this conversation has shifted to private chat support, so I will close this ticket. But if there is still something that you would like to discuss here in the forums, please feel free to reply and the ticket will automatically re-open.