CPU Usage Metric reports high

I am currently in the testing stage of a multi-container app, and I’ve noticed that some of my devices report very high CPU usage (>90%), while other methods of measuring CPU usage return different results.

For example, I have a device reporting 89% CPU usage, but when I SSH into the host OS and run top, CPU usage is reported as 30%.

Any known reason why this would happen, and which measurement is to be trusted?

Hey, the balena supervisor uses the systeminformation package to detect CPU usage (see: systeminformation/cpu.js at master · sebhildebrandt/systeminformation · GitHub). Are you consistently seeing wrong numbers, or is this a one-off issue?

I’m seeing it fairly consistently across a fleet of ~20 devices.

Hey @jordanhardy1, do you see systeminformation returning the same values that are shown in the balena metrics, or different numbers? Just let us know, and feel free to share some numbers if needed.

I see the same thing as well. Below is a pic of a generic x86 64-bit device, but I also see it on an RPi 4. I only took screenshots of top, but how would I see the output of systeminformation?
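For anyone who wants to compare numbers by hand, here is a rough sketch of how a usage percentage can be derived from two snapshots of cumulative CPU times. It is not the supervisor's or systeminformation's actual code. The `CpuTimes` shape mirrors the `times` object that Node's `os.cpus()` returns per core, and the sample values are made up. The systeminformation package itself documents a `currentLoad()` call that you could run on the device instead.

```typescript
// Cumulative CPU times in milliseconds, shaped like the `times` object
// that Node's os.cpus() returns for each core.
interface CpuTimes {
  user: number;
  nice: number;
  sys: number;
  idle: number;
  irq: number;
}

// Sum of all modes: the total time accounted for in one snapshot.
function totalTime(t: CpuTimes): number {
  return t.user + t.nice + t.sys + t.idle + t.irq;
}

// Percentage of non-idle time between two snapshots (0 to 100).
function usagePercent(prev: CpuTimes, curr: CpuTimes): number {
  const totalDelta = totalTime(curr) - totalTime(prev);
  if (totalDelta <= 0) return 0; // no time elapsed, nothing to measure
  const idleDelta = curr.idle - prev.idle;
  return (100 * (totalDelta - idleDelta)) / totalDelta;
}

// Hypothetical sample values, not measurements from a real device.
const earlier: CpuTimes = { user: 1000, nice: 0, sys: 500, idle: 8500, irq: 0 };
const later: CpuTimes = { user: 1200, nice: 0, sys: 600, idle: 9200, irq: 0 };

console.log(usagePercent(earlier, later).toFixed(1)); // prints "30.0"
```

Note that the result depends on the interval between the two snapshots, so two tools sampling over different windows can legitimately disagree.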

I’ve observed the exact same behavior: CPU spikes in the balena dashboard (over 80%), while the machine reports less than 10% when I check in the terminal.

However, I’m very happy I found this thread, since I was looking for a way to get WiFi stats in Node, and the link to systeminformation was exactly what I was looking for!

Looks like that also means my recommendation for getting WiFi stats into the dashboard could use the systeminformation library as well: Recommendation: RSSI stats in dashboard

Just to confirm that I’ve seen this too recently. A manual inspection always finds that it’s not the case, and interestingly the reported value will stay at 88% CPU for hours with no change, even while the device is idle. A reboot is a temporary fix, until the device gets into the same state again.

Thanks all for reporting this issue. I have seen it recently as well, and posted this GitHub issue. In summary, the metrics on the dashboard were never meant to be real-time. CPU usage in particular also uses buckets of 20% to trigger reporting. We have also recently been optimizing metrics reporting to manage server load.
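To illustrate the bucketing idea, here is a minimal sketch, assuming that a reading is sent only when it falls into a different 20% bucket than the last value sent. The `BucketedReporter` class, the `floor(percent / 20)` bucket rule, and the sample readings are all hypothetical and are not taken from the supervisor's source.

```typescript
const BUCKET_SIZE = 20; // percent, per the post above

// Assumed bucket rule: 0-19 is bucket 0, 20-39 is bucket 1, and so on.
function bucketOf(percent: number): number {
  return Math.floor(percent / BUCKET_SIZE);
}

// Hypothetical reporter: remembers the last value it sent and only sends
// a new one when the reading lands in a different bucket.
class BucketedReporter {
  private lastReported: number | null = null;

  // Returns the value to send, or null when the reading is suppressed.
  offer(percent: number): number | null {
    if (
      this.lastReported !== null &&
      bucketOf(percent) === bucketOf(this.lastReported)
    ) {
      return null;
    }
    this.lastReported = percent;
    return percent;
  }
}

// Made-up readings, not data from a real device.
const reporter = new BucketedReporter();
const readings = [88, 95, 81, 30, 35];
const sent = readings.map((r) => reporter.offer(r));

console.log(sent); // [ 88, null, null, 30, null ]
```

Under these assumptions the dashboard would keep showing 88% while the real value moves anywhere between 80% and 99%. Bucketing on its own only accounts for drift within one bucket, so larger gaps would have to come from other factors, such as those described in the GitHub issue.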

The GitHub issue describes some ways to make the reporting more responsive. You can follow and comment on the issue as we work on a resolution.