Extremely high network usage on startup

Hello,

We have been using balena for a while with our autonomous robot. Recently, we updated from ROS to ROS2. With it, we changed from one monolithic container to twenty smaller containers each containing one node.

Since this update, we sometimes see very large 4G network spikes on bootup. During a spike, usage holds steady at around 1600 MB/hr. Obviously this will incur extremely high costs if unresolved. So far, the only way to stop the spikes seems to be to power the device off and on again. This suggests to me some sort of failed data transfer loop.

This morning, one of our devices experienced such a spike during power-up. There was no update to be downloaded for the device, so it was a regular boot. Noticing the spike, I managed to log some packets with tcpdump. It seems that the vast majority of packets are going from our balena device to some private IP:

SOURCE         DESTINATION     PROTOCOL  LENGTH  INFO
192.168.1.XXX  10.244.XXX.XXX  IPv4      1516    Fragmented IP protocol (proto=UDP 17, off=2960, ID=0f99)
192.168.1.XXX  10.244.XXX.XXX  RTPS      484     INFO_TS, DATA, Unknown[80]
192.168.1.XXX  10.244.XXX.XXX  RTPS      168     INFO_DST, ACKNACK, Unknown[80]

These packets are sent at around 250 Hz, causing the huge spike in data. Is this IP related to a balena service? It looks like it at least goes outside of the internal network, otherwise it would not incur such high 4G network usage.
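As a rough sanity check on these figures, the sketch below converts a packet size and a packet rate into an hourly data volume. The function name `mb_per_hour` is made up for illustration, and it assumes every packet has the same size and that 1 MB = 10^6 bytes, neither of which the capture above guarantees.

```python
def mb_per_hour(packet_bytes: int, rate_hz: float) -> float:
    """Data volume in MB/hr for fixed-size packets sent at a fixed rate."""
    bytes_per_second = packet_bytes * rate_hz
    return bytes_per_second * 3600 / 1e6  # 3600 s per hour, 1e6 bytes per MB


# Illustrative case 1: every packet is a full 1516-byte frame at 250 Hz.
full_frame_volume = mb_per_hour(1516, 250)

# Illustrative case 2: every packet is a 168-byte ACKNACK at 250 Hz.
acknack_volume = mb_per_hour(168, 250)

print(f"full frames: {full_frame_volume:.1f} MB/hr")
print(f"ACKNACK only: {acknack_volume:.1f} MB/hr")
```

With full 1516-byte frames this works out to roughly 1364 MB/hr, which is the same order of magnitude as the 1600 MB/hr seen during a spike.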

We have already contacted our SIM provider – even though nothing changed compared to before the network spikes – and they said they couldn’t find any anomalies.

Going on a slight tangent, we also noticed high network usage when updating devices. Updating only one container, which based on image layer sizes should use around 30 MB, uses 600 MB of data. Is it possible the same issue is at play here?

Please give us some indication of how to resolve this issue, because it was not present when we were on a single-container deployment, and it seems to me that the (incomplete) data transfer is happening between our device and a balena server. We have also tried using the Private Chat Support button (since we pay for Fleets), but that just triggered an infinite loading icon.
Thanks in advance for your help.

All the best,
Peter

Hello @PeterG, thanks for your message. First of all, I’m going to double-check why your support button doesn’t work properly. Could you please confirm which browser you are using?

Regarding the network usage, I will move this to the balena networking engineers to see how we can help you more.

Hi @mpous, thanks for your quick reply. I’m using Chrome (v114) on an Ubuntu laptop/desktop.

Also, if it helps, I could send the PCAP file with analyzed network traffic over to you privately.

Hello @PeterG, this is really odd. We haven’t seen anything similar, and it could be really complex to analyze without context about your application.

Did you try to analyze the network traffic using Ethernet or WiFi instead of cellular connectivity?

On the other hand, would it be possible to test the ROS2 setup with fewer containers, to narrow down which element is generating the spikes?

Let us know how you would like to proceed here.

After doing a lot of tests, I think I’ve figured out the issue. I’ll do some additional testing after the weekend to confirm, but it seems to not be related to balena at all, rather an issue/oversight from our side with FastDDS.

I’ll circle back sometime next week with a definitive answer.

Thanks for the confirmation @PeterG, let us know if you get more insights about your issue!

It’s looking like the issue came from having our device discover ROS2 nodes on other devices (e.g. monitoring nodes from one of our laptops) when they were on the same local network. When disconnecting these devices from our robot’s network, it seems that the DDS implementation was still reaching out to these nodes, but not finding them. This seems to cause a lot of one-way traffic from the robot.
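One plausible reason such traffic ends up on the 4G link: a remembered peer address that is not on the robot's local subnet is sent to the default gateway instead of being delivered locally. The sketch below checks this with Python's standard `ipaddress` module. The concrete addresses and the /24 prefix are placeholders, since the capture masks the real octets, and `is_on_link` is a hypothetical helper name.

```python
import ipaddress


def is_on_link(peer: str, local_network: str) -> bool:
    """True if `peer` lies inside `local_network`, i.e. is reachable without a gateway."""
    return ipaddress.ip_address(peer) in ipaddress.ip_network(local_network)


# Placeholder values (assumptions): the real octets are masked in the capture.
LOCAL_NET = "192.168.1.0/24"
STALE_PEER = "10.244.0.10"

peer_is_private = ipaddress.ip_address(STALE_PEER).is_private
peer_is_local = is_on_link(STALE_PEER, LOCAL_NET)

print(f"private address: {peer_is_private}, on local subnet: {peer_is_local}")
```

An address can be private and still not be on the local subnet. In that case it follows the default route, which on a robot like this would presumably be the cellular modem.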

If we find out this is in fact not the case, and the issue may be balena-related, I will open another topic. For now, I guess we can close this.

Thanks for your help.

Thanks for getting back to let us know @PeterG, much appreciated! Please do let us know if this ends up being balena related in any way.