Wrong CPU stats on balena Cloud

RodrigoM · May 26, 2022, 6:52pm

Hi folks,

I am trying to figure out why Balena Cloud reports 95% CPU usage for our device when the device's own stats look so different (94% idle!):

% balena stats
CONTAINER ID   NAME                                                             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
2b76944ae2b7   swx-stream-db_4953668_2178716_25b43e27a6202dbb2340a2656274637f   2.05%     636.7MiB / 7.593GiB   8.19%     1.25MB / 1.18MB   117MB / 2.67MB    58
4f60029d4e41   swx-broker_4953666_2178716_25b43e27a6202dbb2340a2656274637f      3.84%     436.3MiB / 7.593GiB   5.61%     1.5MB / 1.57MB    24.3MB / 2.09MB   92
cf5ef51fd4c6   swx-db_4953665_2178716_25b43e27a6202dbb2340a2656274637f          0.04%     61.92MiB / 7.593GiB   0.80%     3.51kB / 0B       24.3MB / 205kB    10
aadbe9a42ee9   swx-schemes_4953669_2178716_25b43e27a6202dbb2340a2656274637f     0.94%     303.7MiB / 7.593GiB   3.91%     342kB / 327kB     115MB / 12.3kB    34
dfb230664d6b   balena_supervisor
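As a quick cross-check, the CPU % column can be summed. For the four application containers shown, 2.05 + 3.84 + 0.04 + 0.94 comes to about 6.9%, nowhere near 95%. The supervisor row is cut off in the paste, so it isn't counted. In Docker's stats output, 100% corresponds to one full CPU core, so on a multi-core host the host-wide share is even lower (assuming balena-engine keeps Docker's convention). The following is a minimal sketch that does the same sum on pasted output. The function name is hypothetical, and it assumes columns are separated by two or more spaces, as in the table above:

```python
import re


def total_cpu_percent(stats_output: str) -> float:
    """Sum the 'CPU %' column of pasted `balena stats` output (hypothetical helper)."""
    lines = [ln for ln in stats_output.strip().splitlines() if ln.strip()]
    # Columns are separated by runs of 2+ spaces; single spaces occur inside
    # headers ("CONTAINER ID") and values ("636.7MiB / 7.593GiB").
    header = re.split(r"\s{2,}", lines[0].strip())
    cpu_col = header.index("CPU %")
    total = 0.0
    for line in lines[1:]:
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) <= cpu_col:
            continue  # truncated row (like the supervisor line above): skip it
        total += float(cols[cpu_col].rstrip("%"))
    return total
```

Run on the table above, this returns roughly 6.87.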

top also shows this:

% top
Mem: 2409556K used, 5552444K free, 63468K shrd, 47756K buff, 657672K cached
CPU:   3% usr   1% sys   0% nic  94% idle   0% io   0% irq   0% sirq
Load average: 0.30 0.42 0.56 2/510 15548
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 2278  2254 root     S     305m   4%   1% node /usr/src/app/dist/app.js
 2789  2771 1000     S    7371m  94%   1% java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=/var/log/kafka/kafkaServer-gc.log:time,tags:filecount=10,files
 3180  2996 1000     S    6773m  86%   1% java -cp /usr/share/java/ksqldb-rest-app/*: -Xmx3g -server -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+ExplicitGCInvokesConcurrent -XX:NewRatio=1 -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.man
 1124     1 root     S    1889m  24%   0% /usr/bin/balenad --experimental --log-driver=journald --storage-driver=overlay2 -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock --dns=10.114.102.1 --bip=10.114.101.1/24 --fixed-cidr=10.114.101.0/25 --max-download-attempts=10 --ex
 1813  1763 1000     S    4085m  52%   0% java -Xmx512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jm
 1447  1124 root     S    1888m  24%   0% balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml --log-level info
 2973  1447 root     S    1238m  16%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/2b76944ae2b7204910245bd1dba4a0f53df8e2961ce4a065f9929f0dc822e6b5 -address /var/run/balena-engine/containerd/balena-engine-containerd.s
 2576  1447 root     S    1311m  17%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/cf5ef51fd4c642b74226734bacb44067087e7428c3c8bf44799befb25ceb3bf9 -address /var/run/balena-engine/containerd/balena-engine-containerd.s
 2771  1447 root     S    1239m  16%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/4f60029d4e41c514d56e5d8c487ffda20db48b177183c807cac4def1c1a0ee82 -address /var/run/balena-engine/containerd/balena-engine-containerd.s
  731     1 root     S    34748   0%   0% /lib/systemd/systemd-journald
15396  2459 root     R     4300   0%   0% top
   14     2 root     IW       0   0%   0% [rcu_preempt]
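For reference, top builds its CPU line from the cumulative counters in /proc/stat. It takes the difference between two readings of the aggregate "cpu" line and reports each field's share of the elapsed ticks. The sketch below computes a single busy percentage the same way, treating idle and iowait as not busy. The function names are hypothetical, and this is not necessarily how the balena Supervisor samples CPU usage:

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
# guest and guest_nice are already counted in user and nice, so they are dropped.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def busy_percent(before: str, after: str) -> float:
    """Share of elapsed ticks not spent idle or in iowait, between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: a[k] - b.get(k, 0) for k in a}
    total = sum(delta.values())
    if total <= 0:
        return 0.0  # no ticks elapsed between the samples
    not_busy = delta.get("idle", 0) + delta.get("iowait", 0)
    return 100.0 * (total - not_busy) / total


def sample_busy_percent(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat twice, interval_s apart (Linux only)."""
    with open(path) as f:
        first = f.readline()  # first line is the aggregate "cpu" line
    time.sleep(interval_s)
    with open(path) as f:
        second = f.readline()
    return busy_percent(first, second)
```

On the device, the value returned by sample_busy_percent() can be compared directly with the idle figure top reports.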

Any reason why the CPU reading is so wrong on balena?

This is an x86_64 device running supervisor 13.1.8.

Thanks

RonaldGuilhermePDS · May 26, 2022, 7:50pm

I noticed this too. I think the information on the dashboard is not real-time.

jakogut · May 26, 2022, 10:07pm

Thanks for your feedback. Indeed, the dashboard CPU usage represents the device's load average at the moment the stats were read, which is often inaccurate when the load changes quickly. Switching to a five-minute load average instead of an instantaneous reading might be more accurate and useful. I've relayed this feedback to the team.
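One common way to turn periodic samples into a "five-minute" figure is an exponentially weighted moving average. The Linux kernel uses this decay form for its 1-, 5- and 15-minute load averages, although the kernel averages the number of runnable tasks rather than a CPU percentage. The sketch below applies the same idea to CPU-usage percentages. The 5-second sampling interval, the function names, and the choice to smooth CPU% this way are illustrative assumptions, not how balenaCloud or the Supervisor computes its metric:

```python
import math


def ewma_decay(sample_interval_s: float, window_s: float) -> float:
    """Per-sample decay factor, e.g. exp(-5/60) for the kernel's 1-minute load average."""
    return math.exp(-sample_interval_s / window_s)


def ewma_smooth(samples: list[float], sample_interval_s: float = 5.0,
                window_s: float = 300.0) -> list[float]:
    """Exponentially weighted moving average of CPU-usage samples (percent)."""
    decay = ewma_decay(sample_interval_s, window_s)
    smoothed: list[float] = []
    avg = samples[0] if samples else 0.0  # seed with the first sample
    for s in samples:
        avg = avg * decay + s * (1.0 - decay)
        smoothed.append(avg)
    return smoothed
```

With a steady 5% load and a single 95% sample, the five-minute average only rises to about 6.5% at the spike. A one-off instantaneous reading taken at that moment would instead show 95%.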

kb2ma · November 22, 2022, 12:59pm

Hi, the issue of obsolete CPU usage values in the balenaCloud dashboard has been resolved in Supervisor version 14.4.1. See the link below for upgrade instructions. This version keeps the device and balenaCloud synchronized on all metrics data displayed in the dashboard. Metric values are generally refreshed every 5 minutes, unless a state change, such as restarting or updating services, happens sooner.
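The refresh rule described in that reply can be sketched as a simple throttle: report when 5 minutes have passed since the last report, or immediately on a state change. This is a hypothetical illustration of the rule as stated, not the Supervisor's actual code. The class and method names are invented, and it also shows why a dashboard value can be up to 5 minutes old:

```python
from dataclasses import dataclass
from typing import Optional

REPORT_INTERVAL_S = 5 * 60  # "generally 5 minutes", per the reply


@dataclass
class MetricsThrottle:
    """Hypothetical throttle: report every interval_s seconds, or sooner on a state change."""
    interval_s: float = REPORT_INTERVAL_S
    last_report_s: Optional[float] = None

    def should_report(self, now_s: float, state_changed: bool = False) -> bool:
        due = (
            self.last_report_s is None  # never reported yet
            or state_changed  # e.g. a service restart or update
            or now_s - self.last_report_s >= self.interval_s
        )
        if due:
            self.last_report_s = now_s
        return due
```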
–Ken

Related topics
Are CPU, Memory, Temperature, and Storage stats reliable in balena dashboard (Product support, November 18, 2020)
Balena Cloud API - Device - Missing fields (balenaHub, April 19, 2022)
CPU Usage Metric reports high (Product support, raspberrypi4, November 22, 2022)
Balena dashboard on RPi 3 b+ always shows CPU at 100% (Product support, raspberrypi3, November 22, 2022)
Hardware Metrics Reporting Issue (openBalena, October 25, 2022)