In the chart below you can see the CPU usage of my Raspberry Pi device running balenaOS.
It reports an almost constant “user CPU” percentage of 35%.
So why is the balenad process using so much CPU, and can this be fixed?
FYI, I think this issue started when I activated container monitoring in my telegraf container (see the [[inputs.docker]] section of my telegraf.conf).
I am kind of thinking out loud here… First it would be useful to confirm whether the issue is related to the telegraf container, as you’ve hinted. If you pause or stop the telegraf container and run the top command again on a terminal (I guess the per-process CPU usage you shared was produced by the top command), then does the CPU usage of balenad drop considerably?
I can imagine that if telegraf was asking balenad for a lot of system metrics quite often, then the CPU burden of gathering the data could fall heavily on balenad. If this was the case, then some tweaks in telegraf.conf might help. Just to see if it makes a difference, perhaps you could try changing interval = "10s" to interval = "60s" under the [agent] section of telegraf.conf. Maybe also try changing quiet = false to quiet = true, which could reduce expensive I/O.
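For reference, after those two tweaks the relevant fragment of your telegraf.conf would look roughly like this (a sketch of just the two keys discussed; the rest of your [agent] section and your [[inputs.docker]] block stay as they are):

```
[agent]
  ## Gather metrics less often (was "10s")
  interval = "60s"
  ## Log only error messages, which cuts down on log I/O (was false)
  quiet = true
```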
Let us know what you find, and we’ll go from there!
Thanks for the feedback.
I have stopped the telegraf container and this didn’t make a difference:
the top command still showed a CPU utilization of 30% for the balenad process (see screenshot below).
Note that I have also stopped all 8 containers and the balenad process is still reporting 30%.
Oh, all containers stopped and balenad still uses 30% CPU? This should definitely not happen. For reference, I’m running 3 containers from this multicontainer-getting-started app on a Raspberry Pi 3, and balenad uses 0% CPU even while the containers are running:
You’re also running on a Raspberry Pi 3, right? Something I noticed in the output of your top command is that the RAM is fully utilised, whereas my output above shows that my Pi is using only a third of its 1GB. Could your 8 containers be using too much memory? Just a thought.
I suggest that you try playing with some of the following commands on the Host OS terminal. Run top after each of them to check how they’ve affected CPU usage; there’s also a combined sketch right after the list.
balena ps - lists the running containers (including the balena supervisor).
balena stats - prints CPU, memory, network and disk usage for each container.
systemctl stop resin-supervisor - this command stops the balena supervisor, which runs in its own container. The supervisor is responsible for automatically starting and stopping your app containers as controlled through the web dashboard. The reason for stopping it manually is to prevent it from automatically restarting the app containers that the following commands will stop.
balena stop $(balena ps -aq) - stops all containers, after which balena ps should show an empty list. (Run systemctl stop resin-supervisor before running this.) Check what top says after running this command.
systemctl stop balena - this stops the balena engine itself, the daemon that executes the balena commands like balena ps. After running this command, even balena ps will fail to run. Run top after running this command. Surely the CPU usage will have dropped!
systemctl start balena - start the balena daemon again. Run top after running this command.
systemctl start resin-supervisor - this starts the balena supervisor again. After a few seconds, the supervisor will automatically start your app containers as controlled by the web dashboard. Run balena ps, balena stats and top after running this command.
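Putting those together, the whole sequence could look something like this on the Host OS terminal (a sketch only; the batch-mode top flags and the --no-stream flag are assumptions based on busybox top and the Docker-compatible engine CLI, otherwise just run plain top and Ctrl+C out of balena stats):

```
# A sketch of the full sequence; -b/-n for top and --no-stream for stats
# are assumptions, adjust to whatever your host OS version provides.
balena ps                          # what is running right now?
balena stats --no-stream           # one-off CPU/memory snapshot per container
systemctl stop resin-supervisor    # keep the supervisor from restarting containers
balena stop $(balena ps -aq)       # stop all app containers
top -b -n 1 | head -n 20           # is balenad still busy with nothing running?
systemctl stop balena              # stop the engine itself
top -b -n 1 | head -n 20           # CPU should now be close to idle
systemctl start balena             # bring the engine back
systemctl start resin-supervisor   # the supervisor restarts your app containers
```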
If your app can partially function with only a subset of those 8 containers, try starting only that minimal subset to see whether memory and CPU usage are reduced. The CPU usage may have nothing to do with memory usage, but the findings may help with the investigation.
I did some more deployments recently (I also increased the GPU memory and even added an additional container), and with one of those deployments the problem all of a sudden disappeared. Below you can see the top output from this morning.
The Grafana chart below shows the system metrics of my Raspberry Pi for the last week. It clearly shows that the CPU was high in the period from 11/24 until 11/29.
Looking at the chart, I think the problem got fixed when I increased RESIN_HOST_CONFIG_gpu_mem to 160 while I was experimenting with a media player container (Kodi).
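For anyone wanting to double-check the same thing, the memory split that actually took effect can be queried from the Host OS, assuming the Raspberry Pi firmware tool vcgencmd is available in your balenaOS version:

```
# Assumes vcgencmd is shipped with the host OS (Raspberry Pi firmware tools)
vcgencmd get_mem arm   # RAM left for the ARM side (Linux and your containers)
vcgencmd get_mem gpu   # RAM reserved for the GPU; should report 160M after the change above
```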
This question was interesting to me since, with a similar setup of only 3 containers, top reports a memory usage of almost 900MB on my RPi.
I have 3 containers:
Debian Stretch with a small Go application.
Node-RED in Alpine.
InfluxDB in Alpine.
However, your output showed 363k.
Were you, maybe, running in production mode?
Also, if I run balena stats, I see 4 containers that use no more than 220 MB in total. That would mean that 600k or so is being used by the entire balena solution.
Hey @mvargasevans, VSZ is the Virtual Memory Size.
It’s actually how much memory a process has available for its execution, including memory that is allocated but not used, and memory from shared libraries. Does that make sense?
For the RES memory info (how much physical memory a process is actually consuming) you can check the output of: cat /proc/PID/status
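For example, something along these lines on the Host OS would show both numbers for balenad (just a sketch, assuming pidof is available and that balenad is the process you’re interested in):

```
# First PID of the balena daemon (assumes pidof is available on the host OS)
PID=$(pidof balenad | awk '{print $1}')
# VmSize = virtual memory (what top reports as VSZ); VmRSS = resident (physical) memory
grep -E 'VmSize|VmRSS' /proc/$PID/status
```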
Hey, just wanted to add something on this: increasing the amount of gpu_mem available is generally going to improve memory usage, especially if the application is performing a graphically intensive task. Looking at the two screenshots you provided, the memory usage of the balena daemon and the supervisor looks comparable (as expected, since these shouldn’t benefit much from the gpu_mem). I suspect that the memory reduction you are reporting comes from something else.
I would suggest following the steps that pdcastro provided above to gather some information. It would be especially interesting to compare the memory usage after stopping all containers (balena stop $(balena ps -aq)) with the memory usage after also stopping the balena engine itself (systemctl stop balena).
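One way to make that comparison concrete is to snapshot the available memory at each stage, roughly like this (a sketch; remember to start the services again afterwards):

```
grep MemAvailable /proc/meminfo     # baseline, everything running
systemctl stop resin-supervisor
balena stop $(balena ps -aq)
grep MemAvailable /proc/meminfo     # app containers stopped
systemctl stop balena
grep MemAvailable /proc/meminfo     # engine stopped as well
systemctl start balena
systemctl start resin-supervisor
```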
I find the MemAvailable value in /proc/meminfo critical.
A deep dive on what this means here.
MemAvailable: An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, SReclaimable, the size of the file LRU lists, and the low watermarks in each zone. The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use. The impact of those factors will vary from system to system.
If I read all this correctly, I have tons of memory available and the Balena system is consuming around 200 MB.
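For anyone following along, the inputs mentioned in that description can be read straight from /proc/meminfo (values are in kB):

```
# The fields the MemAvailable estimate is built from
grep -E 'MemTotal|MemFree|MemAvailable|SReclaimable' /proc/meminfo
# Or just MemAvailable converted to MB for a quick sanity check
awk '/MemAvailable/ {printf "%.0f MB available\n", $2/1024}' /proc/meminfo
```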
Hey @mvargasevans, thanks for the detailed report. I don’t think we have a fixed reference number for how much memory should be used by balenaEngine when no containers are running, but I recall seeing similar results the last time we looked into this (hopefully I am not misremembering, as at the moment I couldn’t find the previous thread where we investigated this). I pinged the balenaEngine team to have a look at this conversation, as they might be able to share some more information on the matter.
Well, it looks like I was mistaken; the number should probably be lower than that. This is a screenshot I managed to find from a previous investigation on the matter. Keep in mind this was for an older version of the OS, and some things may have changed in the meantime. The first vertical line marks the point where the supervisor was stopped, while the second one marks where balena was stopped.