Are CPU, memory, temperature, and storage stats reliable in the balena dashboard?

Hi,
I see a few stats on the balena dashboard, like the ones below.

I see CPU usage is 100%
But when I do “top” to see how much CPU the current processes are consuming,
it does not add up to even 2%.

Are these stats reliable? If yes, then what is adding up to 100% CPU?

Thanks

Hello, yes, they should be reliable. What device are you using? Also, are you running top from the hostOS or inside a container?

@srlowe My device type is beaglebone-black. The output of top, in both the container and the hostOS, does not add up to anywhere near 100%. Is there any other command I should use?

Hi, let me check with the device team to see whether there is a known issue with the way we collect CPU usage on that device.

Could you also let us know which balenaOS/supervisor versions you are running on that device please?

@srlowe
OS version balenaOS 2.53.9+rev2
Supervisor Version - 11.14.0

Hello, the only thing I’ve found is a bug in the supervisor which may, under some circumstances, report stale statistics to the dashboard. The fix for this has been made available in balenaOS v2.61.1. However, the latest version currently available for your device is balenaOS v2.58.3, so we will need to wait for the newer release to reach your device type before we can test it.

In the meantime, a couple of things to try:

  • could you monitor these stats and see if the CPU usage changes at all from 100% over a longer period of time?
  • could you try restarting the device and see if this changes the readout?

Thank you

@srlowe The CPU % stays at 100 even after restart.

Thanks for that info. By the way, this is the npm package that is used to read the CPU usage, in case you’d like to investigate that on your device : https://github.com/oscmejia/os-utils
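
If you'd rather cross-check the number without installing anything from npm, here is a minimal sketch of a two-sample measurement: read the aggregate `cpu` line of /proc/stat twice and compare idle ticks with total ticks. As far as I can tell os-utils works along similar lines, but treat that as an assumption. The sketch also assumes Python 3 is available in your container; the function names are made up for illustration, and counting iowait as idle is my choice here, not necessarily what os-utils or the supervisor does.

```python
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    # Assumption: time spent waiting on I/O is counted as idle.
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # guest and guest_nice are already included in user and nice,
    # so only the first eight fields are summed.
    total = sum(values[:8])
    return idle, total


def cpu_usage_percent(before, after):
    """Percentage of non-idle ticks between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def read_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    first = read_sample()
    time.sleep(1.0)
    second = read_sample()
    print(f"CPU usage over 1s: {cpu_usage_percent(first, second):.1f}%")
```

If this reports a few percent while the dashboard shows 100%, that would point at the reporting side rather than at real load on the device.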

@srlowe
For your information, a few of my devices have an OS version of balenaOS 2.58.3+rev2 and supervisor version 11.14.0.

These devices also show 100% CPU usage.
Please let me know if there is a fix for this in the higher OS/supervisor versions.

hey Abhishek

I believe you will need at least supervisor version 12.0.2 to have the fix mentioned above.

Are you able to try https://github.com/oscmejia/os-utils and see if it reports the correct CPU usage?

thanks

@rahul-thakoor @srlowe os-utils seems to show the correct stats.

Hey @agaurav, if that’s the case it sounds like the fix Rahul mentioned should work for you. Unfortunately, that version of the supervisor isn’t yet available in an OS release. However, if you send me your balenaCloud user name and the device UUID in a private message, I can manually upgrade that device’s supervisor for you so that we can confirm the fix.

@agaurav it’s updated now and looks a little more correct, but let us know as you test.

@shaunmulligan
Yup, as you said, this looks a little more correct. It sometimes still shoots up to 100 percent, but then comes back down to 30.
But I never see 100% usage when I run “top”, either inside the container or on the host OS.
I am not sure what process is consuming so much CPU.
Do you have any clue looking at the device?

Glad it looks more correct. If you want to monitor it more closely and figure out exactly what is going on, I would recommend setting up a full metrics solution, something like netdata, which would graph the usage over time and let you drill down into individual processes and what they are using. The metrics reported on our dashboard are higher level and will be difficult to use in that way, as you don’t get much granularity.
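
If you want a quick look before setting up netdata, a rough sketch like the one below can show which processes accumulated the most CPU time over a short window. It reads /proc/<pid>/stat twice and ranks processes by the increase in utime + stime. This assumes Python 3 on the device; the function names are illustrative only, processes that start and exit within the window are not counted, and inside a container it will generally only see that container's processes.

```python
import os
import time


def parse_pid_stat(text):
    """Return (command, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the first '(' and last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    command = text[open_paren + 1:close_paren]
    rest = text[close_paren + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return command, utime + stime


def snapshot(proc_root="/proc"):
    """Map each visible pid to (command, cpu_ticks)."""
    ticks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                ticks[int(entry)] = parse_pid_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
    return ticks


def busiest(before, after, limit=5):
    """Return up to `limit` (tick_delta, pid, command) tuples, largest first."""
    deltas = []
    for pid, (command, ticks) in after.items():
        if pid in before:
            deltas.append((ticks - before[pid][1], pid, command))
    deltas.sort(reverse=True)
    return deltas[:limit]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(5.0)
    for delta, pid, command in busiest(first, snapshot()):
        print(f"{delta:6d} ticks  pid {pid:6d}  {command}")
```

This is only a stopgap for spotting an obvious culprit; it does not give you the history or graphs that a proper metrics setup would.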

Thanks @shaunmulligan
I will look into it.

No problem, let me know how it goes.