Wrong CPU stats on balenaCloud

Hi folks,

I am trying to work out why balenaCloud is reporting that our device is at 95% CPU usage when the stats on the device itself look so different (94% idle!):

% balena stats
CONTAINER ID   NAME                                                             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
2b76944ae2b7   swx-stream-db_4953668_2178716_25b43e27a6202dbb2340a2656274637f   2.05%     636.7MiB / 7.593GiB   8.19%     1.25MB / 1.18MB   117MB / 2.67MB    58
4f60029d4e41   swx-broker_4953666_2178716_25b43e27a6202dbb2340a2656274637f      3.84%     436.3MiB / 7.593GiB   5.61%     1.5MB / 1.57MB    24.3MB / 2.09MB   92
cf5ef51fd4c6   swx-db_4953665_2178716_25b43e27a6202dbb2340a2656274637f          0.04%     61.92MiB / 7.593GiB   0.80%     3.51kB / 0B       24.3MB / 205kB    10
aadbe9a42ee9   swx-schemes_4953669_2178716_25b43e27a6202dbb2340a2656274637f     0.94%     303.7MiB / 7.593GiB   3.91%     342kB / 327kB     115MB / 12.3kB    34
dfb230664d6b   balena_supervisor

top also shows this:

% top
Mem: 2409556K used, 5552444K free, 63468K shrd, 47756K buff, 657672K cached
CPU:   3% usr   1% sys   0% nic  94% idle   0% io   0% irq   0% sirq
Load average: 0.30 0.42 0.56 2/510 15548
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 2278  2254 root     S     305m   4%   1% node /usr/src/app/dist/app.js
 2789  2771 1000     S    7371m  94%   1% java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=/var/log/kafka/kafkaServer-gc.log:time,tags:filecount=10,files
 3180  2996 1000     S    6773m  86%   1% java -cp /usr/share/java/ksqldb-rest-app/*: -Xmx3g -server -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+ExplicitGCInvokesConcurrent -XX:NewRatio=1 -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.man
 1124     1 root     S    1889m  24%   0% /usr/bin/balenad --experimental --log-driver=journald --storage-driver=overlay2 -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock --dns=10.114.102.1 --bip=10.114.101.1/24 --fixed-cidr=10.114.101.0/25 --max-download-attempts=10 --ex
 1813  1763 1000     S    4085m  52%   0% java -Xmx512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jm
 1447  1124 root     S    1888m  24%   0% balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml --log-level info
 2973  1447 root     S    1238m  16%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/2b76944ae2b7204910245bd1dba4a0f53df8e2961ce4a065f9929f0dc822e6b5 -address /var/run/balena-engine/containerd/balena-engine-containerd.s
 2576  1447 root     S    1311m  17%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/cf5ef51fd4c642b74226734bacb44067087e7428c3c8bf44799befb25ceb3bf9 -address /var/run/balena-engine/containerd/balena-engine-containerd.s
 2771  1447 root     S    1239m  16%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/4f60029d4e41c514d56e5d8c487ffda20db48b177183c807cac4def1c1a0ee82 -address /var/run/balena-engine/containerd/balena-engine-containerd.s
  731     1 root     S    34748   0%   0% /lib/systemd/systemd-journald
15396  2459 root     R     4300   0%   0% top
   14     2 root     IW       0   0%   0% [rcu_preempt]

Any reason why the CPU reading is so wrong on balena?

This is an x86_64 device running Supervisor 13.1.8.

Thanks

I noticed this too; I think this information on the dashboard is not real-time.

Thanks for your feedback. Indeed, the dashboard CPU figure reflects the device's load average at the moment the stats were read, which can be misleading when the load changes quickly. Switching this to a five-minute load average instead of an instantaneous reading might be more accurate and useful; I’ve relayed this feedback to the team.
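
If anyone wants to see how the two kinds of reading can diverge on their own device, here is a minimal sketch (plain Python over the standard Linux /proc interfaces, not the Supervisor's actual code) that prints an instantaneous /proc/stat sample, the way top computes CPU%, alongside the 1-minute load average scaled by core count:

```python
#!/usr/bin/env python3
"""Compare an instantaneous /proc/stat CPU sample with a
load-average based figure, to illustrate why they can disagree."""
import os
import time

def cpu_percent_from_stat(interval=1.0):
    """Instantaneous CPU usage, as top computes it: the share of
    non-idle jiffies over a short sampling interval."""
    def snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait
        return idle, sum(fields)
    idle1, total1 = snapshot()
    time.sleep(interval)
    idle2, total2 = snapshot()
    return 100.0 * (1 - (idle2 - idle1) / (total2 - total1))

def cpu_percent_from_loadavg():
    """Load-average based figure: 1-minute load divided by core count.
    Load average counts runnable and uninterruptible tasks and lags
    behind sudden changes, so it can differ from the instant reading."""
    load1, _, _ = os.getloadavg()
    return 100.0 * load1 / os.cpu_count()

if __name__ == "__main__":
    print(f"instantaneous (/proc/stat): {cpu_percent_from_stat():.1f}%")
    print(f"1-min load avg / cores:     {cpu_percent_from_loadavg():.1f}%")
```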


Hi, the issue of stale CPU usage being displayed in the balenaCloud dashboard has been resolved with Supervisor version 14.4.1. See the link below for upgrade instructions. This version ensures that the device and balenaCloud stay in sync on all metrics shown in the dashboard. Metrics generally refresh every five minutes, unless a state change such as restarting or updating services happens sooner.
–Ken
