Hi,
I wanted to check some files in one of the containers running on the device from a terminal session, but I cannot open a terminal either on the host device or in any of the containers. The containers run fine and their log messages are OK. The OS is Generic x86_64 (GPT), version 3.1.6+rev1, in production mode.
I tried stopping all containers, but it made no difference. Restarting the host gave the same result.
I could connect fine before.
Another device in the same fleet, with the OS in development mode, is fine.
There are no error messages such as a timeout; the circle icon just turns yellow, then red, and stays that way.
What can I do to get terminal sessions working again?
Hello,
Thanks for reaching out. There could be internal issues that result in such a state. Are the devices shown as online? On the right sidebar, head to the diagnostics and click the Run Diagnostics button to start the test run. Please report back with the errors if you see any failures in the report you get once the diagnostics finish.
I did run diagnostics, but the standard output is quite large and it is hard to judge what the result is. The standard error has some messages about missing files and missing log files, but it is hard to tell whether any of those are fatal.
Anyhow, after that I could open the terminal session in the host and in the container. So seems like the issue is resolved.
Before running Device diagnostics, I also ran the device health checks, and they all reported success.
Thank you for the quick support!
Apologies for the confusion, I was indeed pointing to the device health checks. It could be an issue with the device failing to connect to the balena VPN, which leads to a heartbeat-only status. Glad it all got resolved.
Have the issue again. Containers were stopped when I could open terminal sessions. Now they are running. Device health checks are all green, and the status of the device is online (shows online for 23 hours).
I am running one container with some load, and the CPU and temperature readings are somewhat inconsistent. It almost always shows CPU usage of about 1%, but the temperature can go up to 70 C. Currently it is 55 C. I think the CPU usage should be higher, maybe 10%. This is a device with 12 logical CPUs, but anyway.
Another one of the reasons I wanted to connect was to check if the CPU reading displayed by balena is correct.
Anyway, CPU usage should not be higher than, let's say, 50% at maximum, so I am not sure what could be causing this issue again.
That is very odd. I would recommend going through the journalctl logs and checking them for any errors. It must be a critical component crash-looping in your system for some reason, causing the terminal sessions to not maintain a connection. Isolating the root cause here is critical to debugging the issue, considering this is happening to only this device in the fleet. The status has been fine on our side: https://status.balena.io/
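If the journal is long, it can help to summarise it first. Below is a minimal sketch, assuming you have exported the journal as JSON lines (for example with `journalctl -b -o json --no-pager > journal.json`) and copied it somewhere Python is available. The function name `count_errors_by_unit` and the file name `journal.json` are made up for this illustration; only the `PRIORITY`, `_SYSTEMD_UNIT` and `SYSLOG_IDENTIFIER` journal fields are relied on.

```python
import json
from collections import Counter


def count_errors_by_unit(journal_lines, max_priority=3):
    """Count journal entries of severity 'err' (3) or worse, grouped by unit.

    journal_lines: iterable of strings, one JSON journal record per line,
    as produced by `journalctl -o json`.
    """
    counts = Counter()
    for line in journal_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        try:
            # Syslog priorities: 0 = emerg ... 3 = err ... 7 = debug
            priority = int(entry.get("PRIORITY", 6))
        except (TypeError, ValueError):
            continue
        if priority > max_priority:
            continue
        unit = entry.get("_SYSTEMD_UNIT") or entry.get("SYSLOG_IDENTIFIER") or "unknown"
        if not isinstance(unit, str):
            unit = str(unit)  # repeated fields are exported as arrays
        counts[unit] += 1
    return counts


# Hypothetical usage:
# with open("journal.json") as f:
#     for unit, n in count_errors_by_unit(f).most_common(10):
#         print(n, unit)
```

A unit that shows up with an unusually high count is a reasonable first candidate for the crash-looping component, though the count alone does not prove it.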
Here’s more information on device metrics to help you out: Device Metrics - Balena Documentation
Now it is OK again.
It seems that CPU usage is not displayed correctly: I haven’t seen it being any other value than 1%.
Meanwhile, top in the container shows different results:
Hello @Ravil, the dashboard metrics are not real-time and can take a few minutes to update. However, we do have some known issues with CPU reporting. You can track this GitHub issue here: balenaCloud CPU Usage metric not updating as expected · Issue #1907 · balena-os/balena-supervisor · GitHub
In addition, there is another issue in the NPM package that we use, systeminformation, regarding the difference between values from top and the dashboard: Data is not accurate · Issue #60 · sebhildebrandt/systeminformation · GitHub
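If you want a third number to compare against both top and the dashboard, here is a minimal sketch that computes overall CPU usage from two samples of `/proc/stat`. It assumes a Linux container with Python available; the function names are made up for this illustration, and this is not how the supervisor or systeminformation calculates the value.

```python
import time


def parse_cpu_line(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest and guest_nice are already counted in user and nice
            total = sum(values[:8])
            return idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_percent(before, after):
    """Average CPU usage across all logical CPUs between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


def sample_cpu_usage(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the usage in percent."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return cpu_usage_percent(before, after)


# Hypothetical usage inside the container:
# print(f"{sample_cpu_usage(5.0):.1f}%")
```

The result is an average over all logical CPUs. Some top implementations report per-process CPU relative to a single core, so on a 12-CPU device a process shown at 120% would correspond to roughly 10% of the whole machine.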
Let us know if you need more help!