Is there a Maximum Number of Allowable Containers?

Hi Hugh,

Thanks for the response. While experimenting with enabling/disabling containers, we’ve found that we can run either the 2 high-CPU containers OR the 4 individual low-CPU containers, but not both. It seemed like the limit could be either CPU or the number of containers, but your answer is reassuring that it is more likely a CPU issue.

We’re running on the balenaFin with the CM3+L. On the CPU-intensive containers we’re using while true loops to monitor i2c traffic and update device state. The Python scripts themselves are not significantly demanding, but the loops seem to take up the available CPU capacity (at least on that core) with the i2c bus running at 400kHz.

I’m curious if you have any insight on how CPU load impacts Supervisor, Docker build and Live Push reliability… Should we be actively load balancing across more cores using cpuset etc. via docker-compose.yml? That’s something we’re thinking of trying next. I’m just not sure how to give balenaOS more room to breathe. Should we be trying to keep individual core utilization under some threshold like 50%? Is there a way to dedicate the Supervisor to its own core? Am I even correct in assuming we can utilize all 4 cores :smiley:

We’re working on a robotics application, so I anticipated diving into the deeper realms of balenaOS and the Fin, but I assumed Docker would be doing core load balancing in the background.

I agree with Hugh that there is no container limit as such, only what the hardware resources can support. If the device is running low on CPU, I imagine device API response times will suffer, possibly even causing timeouts, but more significantly, local mode builds will be slowed. Live Push is intelligent enough to only execute the commands needed to apply your newest change, and they run inside the container. This means no image is being built, so it shouldn’t need a lot of CPU, but if your change downloads and compiles a new dependency, then that could be affected.

It definitely sounds like you want to get the most out of the hardware you have, which is awesome. I think we should try experimenting with the various CFS scheduler options from https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler. We could run high-CPU containers on their own cores via cpuset-cpus and limit their usage via cpus. This could keep other processes from becoming blocked.
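Within a single container, a similar pinning effect can be sketched from Python with os.sched_setaffinity (Linux only). This is a minimal sketch, assuming the container’s cpuset already includes the chosen core, since a process can only narrow its affinity within what Docker allows; the pin_to_core helper and its "highest allowed core" default are illustrative, not part of any balena tooling.

```python
import os


def pin_to_core(core=None):
    """Restrict the calling process to a single CPU core (Linux only).

    A process can only narrow its affinity within the cores its container's
    cpuset already allows, so the choice is made from os.sched_getaffinity(0).
    """
    allowed = os.sched_getaffinity(0)  # pid 0 means "this process"
    if core is None:
        core = max(allowed)  # arbitrary illustrative pick: highest allowed core
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return core
```

Affinity only controls where the process may run. Docker’s cpuset-cpus applies the same kind of restriction to the whole container, and cpus additionally caps how much CPU time the container gets, which affinity alone does not do.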

However, I believe this only works if live push, local mode builds, the OS, and the Supervisor all stay off the core we designate as the high-load core. It would be worth checking where these services are running and whether we can achieve that core separation. I’ll look into this myself and see if this is a good idea / possible!

Thanks for the response. We’re definitely thinking along the same lines and exploring the CFS options to move containers around and redistribute the load. We’re running into the two issues you allude to: 1) Can we define which core balena services run on so they do not conflict with high-CPU containers? and 2) Is there a way to identify which cores containers are running on in general?
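A partial answer to question 2, from inside a container, is to ask the kernel directly. This sketch assumes Linux: os.sched_getaffinity gives the set of cores a process may use (which should reflect any cpuset-cpus restriction on the container), and field 39 ("processor") of /proc/&lt;pid&gt;/stat gives the core it last ran on. The helper names are hypothetical.

```python
import os


def allowed_cores(pid=0):
    """Cores the process may be scheduled on (pid 0 = this process)."""
    return sorted(os.sched_getaffinity(pid))


def last_core(pid="self"):
    """Core the process last ran on: field 39 ("processor") of /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces, so split after
    # the last ")"; the remaining fields start at field 3 ("state").
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[39 - 3])
```

Without pinning, the scheduler is free to migrate processes between allowed cores, so last_core is only a snapshot; sampling it repeatedly shows how much a process moves around.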

I reached out to the OS team for their input on this strategy and I’ll let you know what I find. Currently, I don’t know of a way to set Docker options for the Supervisor container, which would let us make sure it runs on a specific core and never gets blocked. The same goes for the OS. I set a reminder to get back to you within 24 hours :slight_smile:

Hey Alexander, I have confirmed there is no way to set CPU affinity for the OS, so we can’t make it run on a specific core. It seems the best approach is to manage how much CPU the containers use, since they are blocking other processing. Another user recently reported a very similar scenario, so you might be able to use some of the tricks they tried: Balena-engine crashing during livepush

Great information, thank you very much. Core affinity would be an interesting feature for high-CPU applications like ours. Ebradbury is on our team and was looking at this from another angle, and yes, it seems like we’ve resolved the issue by reducing CPU usage in our loops and reducing logging output.


Awesome. This is a really cool application you are working on, and I’ll keep thinking about, and advocating for, more tooling to help people manage high-workload containers. Thanks for sharing your experience, because I’m sure someone else will stumble on this.

Hi @alexanderkjones,

I was wondering what the realistic update rate of your sensors is. On Augie, my hexapod project, I get updates from the IMU at 20Hz, but I’ve had it as high as 100Hz without an issue, because even at 100Hz the loop spends most of its time sleeping. I have a few suggestions:

  1. Figure out what refresh frequency you actually need from your sensors and add corresponding sleeps to your loops. Even a teeny tiny sleep will give your machine much more breathing room.
  2. Double check the code that’s polling the sensors for you. It’s possible it’s retrieving more registers than it actually needs which will significantly increase the number of transactions over the i2c bus.
  3. You said you have two separate processes in tight loops polling these sensors? Only one process can access the bus at a time, so one process will always be blocked in IOWAIT while the other is using the bus, although this is likely on a per-transaction basis (see 2). I would suggest combining both into a single process so that you decide the order in which transactions happen on the i2c bus, rather than the OS.
  4. Logging is expensive. Kill it as much as possible.
  5. If all else fails, the Fin’s coprocessor, a Silicon Labs BGM111, is also connected to the i2c bus, so you could move your i2c code there and just stream the data in over the UART (I do this with Augie using a Teensy). You’ll get much better timing guarantees, and reading a stream from the UART is much less work for your Python code. See the balena-fin-coprocessor-firmata project for an example of how to build and deploy coprocessor firmware.
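Suggestions 1 and 3 can be sketched together as one process that reads every sensor in a fixed order and then sleeps until the next tick, instead of spinning in a while true loop. This is a sketch only: read_sensor_a and read_sensor_b are hypothetical placeholders for whatever i2c library calls you actually use, and the 20Hz default simply mirrors the rate mentioned in this thread.

```python
import time


def read_sensor_a():
    """Hypothetical placeholder for an i2c read of the first sensor."""
    return 0.0


def read_sensor_b():
    """Hypothetical placeholder for an i2c read of the second sensor."""
    return 0.0


def poll(handle, rate_hz=20.0, iterations=None,
         readers=(read_sensor_a, read_sensor_b)):
    """Read every sensor from one process at a fixed rate, sleeping between cycles.

    iterations=None loops forever; handle(sample) receives one tuple per cycle.
    """
    period = 1.0 / rate_hz
    next_tick = time.monotonic()
    count = 0
    while iterations is None or count < iterations:
        # A single process owns the bus, so the transaction order is decided here.
        handle(tuple(read() for read in readers))
        count += 1
        next_tick += period
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # give the CPU back instead of spinning
        else:
            next_tick = time.monotonic()  # overran the period: resync, don't burst
```

Scheduling against a monotonic "next tick" rather than sleeping a fixed amount keeps the rate steady even when a read takes a variable amount of time.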

Hope that helps!
James.

Oh, and one more thing. Provided all your devices support it, make sure you set the i2c bus speed to the maximum of 400kHz by setting BALENA_HOST_CONFIG_i2c_baudate=400000 in the application configuration variables. The default is 100kHz, so you’ll be able to access the devices much faster.
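To put a rough number on "much faster" (and on suggestion 2 about reading fewer registers), here is a back-of-the-envelope estimate of bus time per read. It assumes a common write-then-read register access (address, register byte, repeated start, address, then the data) at 9 clock cycles per byte, and ignores start/stop conditions, clock stretching and driver overhead; the 6-byte example is an arbitrary illustration, such as three 16-bit axis values.

```python
def i2c_read_seconds(n_bytes, clock_hz):
    """Approximate bus time for one register read of n_bytes.

    Assumes a combined write-then-read transaction: address+W, one register
    byte, repeated start, address+R, then n_bytes of data. Each byte takes
    9 clock cycles (8 data bits plus ACK/NACK). Start/stop conditions, clock
    stretching and driver/syscall overhead are ignored.
    """
    bytes_on_wire = 3 + n_bytes  # two address bytes + register byte + data
    return bytes_on_wire * 9 / clock_hz


for hz in (100_000, 400_000):
    print(f"{hz // 1000}kHz: {i2c_read_seconds(6, hz) * 1e6:.1f} us for a 6-byte read")
```

Under these assumptions a 6-byte read is 81 clock cycles, roughly 810 µs at 100kHz versus roughly 200 µs at 400kHz, so bus time scales directly with both clock speed and the number of bytes requested.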

@jimsynz thanks for the tips. For anyone following this thread in the future: we’re using 400kHz on the i2c bus, and it’s looking more and more like logging is the bottleneck when running more containers. We’re polling the i2c bus and attached sensors at 20Hz.

@jimsynz, are there any performance resources describing the effects of logging, or is your comment anecdotal :slight_smile:

Hi

  • Like jimsynz said, I think you should definitely look into using the coprocessor for doing the communication bit over i2c - that would free up some of your load on the compute module. Given that you are using the balenaFin, the onboard coprocessor is a very handy feature for situations like these.
  • We don’t really have a document for the impact of logging, but I think you could very well do a quick experiment on an otherwise empty device to get a quick idea about how much load it usually contributes. I believe what Jim was saying was more anecdotal :slight_smile:
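A quick experiment along those lines might look like the sketch below: run the same loop with and without logging and compare the CPU time used by the process. It is a rough probe, not a benchmark. time.process_time counts only this process’s CPU time, so any work other processes do to collect and forward the log lines is not included, and the logger name and four-field state dict are hypothetical stand-ins.

```python
import logging
import sys
import time


def cpu_cost_of_logging(n, log_every=1, stream=sys.stdout):
    """Process CPU seconds used by n loop iterations.

    log_every=None logs nothing (the baseline); log_every=k logs every k-th
    iteration. time.process_time() counts this process's CPU time only, so
    work done elsewhere to ship the logs is not included.
    """
    logger = logging.getLogger("logging-cost-probe")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    state = {"a": 0, "b": 0, "c": 0, "d": 0}  # stand-in for real device state
    start = time.process_time()
    for i in range(n):
        state["a"] = i  # stand-in for real work
        if log_every and i % log_every == 0:
            logger.info("state=%s", state)
    elapsed = time.process_time() - start

    logger.removeHandler(handler)
    return elapsed
```

On an otherwise idle device, comparing cpu_cost_of_logging(10000, log_every=None) with cpu_cost_of_logging(10000) gives a first-order estimate of the per-line cost.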

OK, we’ve found the issue, which raises more questions! It appears that HTTP requests between containers significantly increase CPU usage. We have one container running a loop requesting i2c data on a 400kHz bus, and that is working fine. It then sends an HTTP request to a second container (a simple Flask server) to update a state object with 4 variables.

We are seeing 30% CPU usage for 15 http requests per second between two containers. When we reduce the number of requests between containers to 1 per second we see a CPU drop to 3%.

This seems to be the cause of our system instabilities, which surprised us. Also, HTTP requests between containers take approximately 3ms. We had previously tested MQTT and found the same 3ms per transaction on the balenaFin.

Are these performance results normal? We did not expect messaging between containers to be so expensive.
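One way to see how much of that cost is HTTP itself is a small stdlib-only round-trip measurement. This is a sketch, not the Flask setup from this thread: it runs a tiny server on 127.0.0.1 in a background thread and reuses a single keep-alive connection, and the /state path and JSON payload are made up for illustration. Note that the handler silences per-request logging, which http.server otherwise writes to stderr for every request.

```python
import http.client
import statistics
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StateHandler(BaseHTTPRequestHandler):
    """Minimal stand-in for the Flask state server (hypothetical /state endpoint)."""

    protocol_version = "HTTP/1.1"  # lets the client reuse one TCP connection

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # a real server would update its state here
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # the default writes one line per request to stderr


def median_round_trip(n=200):
    """Median seconds per POST over one keep-alive connection to a local server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port)
    timings = []
    try:
        for i in range(n):
            start = time.perf_counter()
            conn.request("POST", "/state", body=f'{{"a": {i}}}'.encode(),
                         headers={"Content-Type": "application/json"})
            conn.getresponse().read()
            timings.append(time.perf_counter() - start)
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return statistics.median(timings)
```

Opening a new HTTPConnection for every request would add TCP connection setup to each round trip, so comparing both variants on the device helps separate connection overhead from request handling.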

Hi Alexander, I can’t say I have thoroughly tested inter-container communication performance, so I can’t offer much context, but if you are seeing roughly 3ms with HTTP as well as with MQTT, then that seems to be the bottleneck. Since it’s 3ms regardless of protocol, I’d suspect that’s the best the Raspberry Pi Compute Module’s processor can do.

It might also be worthwhile to test the same containers / code on another device type if you happen to have anything else? A Jetson Nano, Beaglebone, or other similar board might make for an interesting comparison.

@dtischler if I get the chance I will test on another device. I’m curious specifically about the CPU usage, though, i.e. that HTTP requests on the balenaFin would take so much CPU, given that multicontainer is a preferred modality and, I’m assuming, HTTP is the best way to talk between services.

I don’t know that we would say there is a “preferred” methodology for the balenaFin, as this is more of a software architecture question: your intended goals and outcomes drive the design of the software stack. That’s why I am curious about trying a different device for comparison. I suspect it’s not a balenaFin issue per se, but more a case of what to expect from a low-power IoT device based on Arm cores versus a desktop-class processor. Hope that makes sense!

@dtischler et al., thanks for the responses. After more testing I’ve found that the issue may be using the Alpine-based image balenalib/raspberrypi3-alpine-python instead of balenalib/fincm3-python. Performance, including live push, seems to improve when using the fincm3 image for containers, but I will continue to verify. I was able to run 10 Flask containers, each operating at 3 HTTP requests per second with the same client container, at 60% CPU usage, with logging, without issue, which was a big improvement.

Most importantly, live push was much more responsive, even when running all these requests and logging. This leads me to think our use of the Alpine distro may have been causing some of the live push build fragility. I tried to rebuild with Alpine and the build would not complete. I’m shooting in the dark here; these are really just the results of my heuristic tests.

Also, I believe we’re running into the limitations of HTTP requests. They are taking 30ms, not 3ms, which from my research is actually typical. That’s not the most performant option for inter-container communication in a robotic system, and we’re researching moving to Redis, which according to the official benchmarks should be closer to 3ms, leaving a lot more room in a 20Hz (50ms) cycle.

In general, the balenaFin and balenaOS are a pleasure to work with; we’re really just figuring out the best setup for this application. Any insight on the Alpine vs fincm3 topic would be helpful, and I will post final results when we find a stable setup for our control system.

Interesting. My best guess would be that Alpine uses musl libc, while the standard base images, which are based on Debian, use glibc. Although I find it strange that you would get a noticeable performance boost just from switching between the two… Are you installing any dependencies in the containers that could affect your app?
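To confirm which kind of base image a container is actually running, a quick stdlib-only check could help. This is a sketch under two assumptions: /etc/os-release (the standard distribution identification file, present on both Alpine and Debian) carries an ID= line, and platform.libc_ver() is only reliable at recognising glibc, so an empty result is a hint that the interpreter uses another C library such as musl rather than proof of it.

```python
import platform


def os_release_id(path="/etc/os-release"):
    """Return the ID= field of os-release (e.g. "alpine" or "debian"), or None."""
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return None
    for line in lines:
        if line.startswith("ID="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


def libc_description():
    """Describe the C library this Python build reports.

    platform.libc_ver() identifies glibc; for other C libraries (such as
    musl on Alpine) it may return an empty name, so treat this as a hint.
    """
    name, version = platform.libc_ver()
    return f"{name} {version}" if name else "not identified as glibc"
```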

OK, so we’ve confirmed the base image in our Dockerfiles was indeed the issue. Our livepush and general system instability was caused by using: FROM balenalib/raspberrypi3-alpine-python:3.8-latest

When we replaced the base image for all 6 of our containers with the one used in the balena multicontainer example, FROM balenalib/fincm3-python:3-stretch-run, all became well. Very snappy.

Curious about anyone’s thoughts. Not sure if we’re using Docker image tags incorrectly, which may have caused the issue? Regardless, it’s a breath of fresh air to have made progress on this.

Also! We can confirm <1ms response time for requests in our application using Redis, as opposed to >30ms using HTTP. We’ll be switching to Redis for sending data between containers from now on. This is a game changer for us and will make working within our 50ms hardware loop cycle much friendlier.
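For anyone landing here later, here is a minimal sketch of the Redis approach for a small state object like the four-variable one described earlier. It assumes the redis-py client and a Redis service reachable under the hostname redis on the default port 6379 (for example, a docker-compose service of that name); the key name and the JSON encoding of values are illustrative choices, not necessarily what this team used.

```python
import json

STATE_KEY = "robot:state"  # hypothetical key name


def publish_state(client, state):
    """Write every state field in a single HSET round trip (JSON-encoded values)."""
    # The mapping= argument requires redis-py 3.5 or newer.
    client.hset(STATE_KEY, mapping={k: json.dumps(v) for k, v in state.items()})


def read_state(client):
    """Read the whole state hash back, decoding keys and JSON values."""
    raw = client.hgetall(STATE_KEY)
    return {
        (k.decode() if isinstance(k, bytes) else k): json.loads(v)
        for k, v in raw.items()
    }


def make_client(host="redis", port=6379):
    """Connect with redis-py; "redis" assumes a compose service of that name."""
    import redis  # imported lazily so the helpers above work without it

    return redis.Redis(host=host, port=port)
```

Usage would be client = make_client() followed by publish_state(client, {...}) in the producer and read_state(client) in the consumer. Writing the whole object with one HSET keeps it to one round trip per update rather than one per field.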

The Raspberry Pi image you were using is based on Alpine and the Fin one on Debian. Those are two completely different distributions, and that is the most probable cause of the performance difference. Alpine is optimised for size, so I wouldn’t be surprised if it’s less snappy than Debian.

We initially chose Alpine based on this balena blog post. I’m wondering if this has more to do with livepush and local development with Alpine specifically, as opposed to the performance of Alpine itself. I’ve cross-referenced this on our related thread about Livepush performance: Balena-engine crashing during livepush

Thanks all for your help.