We’ve been tracking down random, sporadic hangs on our NUC for quite a while.
There seems to be a memory-related issue. We’ve put tight memory limits on our service containers. Finally, we installed fluent-bit to help us get more information out of the system.
Now I am seeing in the fluent-bit logs that balena_healthcheck and balena_supervisor go through start-shutdown cycles continuously. Is this normal?
Hello Sandy, I see some unusual LogBackend errors. I’ve pinged the supervisor team internally to see how we can help you further.
In the meantime, is the device performing well for you now?
Hi there, I also took a look at your device and at the diagnostic logs that you shared. In neither today’s logs nor the diagnostics do I see any supervisor container restarts. If you were to take measurements again today, I don’t think we would see all of these start->shutdown cycles.
As to why they were happening when these fluent-bit logs were initially captured, I’m not sure; something must have been happening on the device at the time. I find the logs a little hard to parse. Could you tell us roughly how frequently these cycles occur, or how much time passes between them?
Hi Marc,
Thanks for checking into this. At the moment, the device is behaving well. My expectation is that I may see a system crash in a week or two.
I’m curious about the LogBackend errors. Can you tell me more about this?
Sandy