Hi @klutchell,
We are running:
- balenaOS 6.5.24+rev5
- supervisor 17.0.2
- Intel NUC
- 8GB RAM
and we are encountering similar issues. Here is a snapshot of our logs. The balena supervisor fails its healthcheck a few times with "memory usage above threshold", and is then restarted.
[balena_supe][INFO] 2025-05-19 22:54:37,742: [info] Reported current state to the cloud
[balena-supe][INFO] 2025-05-19 22:54:37,743: [info] Reported current state to the cloud
[balena_supe][INFO] 2025-05-19 22:55:42,978: [info] Healthcheck failure - memory usage above threshold after 16h 45m 28s
[balena_supe][ERROR] 2025-05-19 22:55:42,982: [error] Healthcheck failed
[balena-supe][INFO] 2025-05-19 22:55:42,982: [info] Healthcheck failure - memory usage above threshold after 16h 45m 28s
[balena-supe][ERROR] 2025-05-19 22:55:42,982: [error] Healthcheck failed
[balena-supe][INFO] 2025-05-19 22:55:42,982: [api] GET /v1/healthy 500 - 1.224 ms
[balena_supe][INFO] 2025-05-19 22:55:42,983: [api] GET /v1/healthy 500 - 1.224 ms
[healthdog][INFO] 2025-05-19 22:56:33,936: try: 1, refid: C17B26AC, correction: 0.000785253, skew: 0.333
[healthdog][INFO] 2025-05-19 22:58:33,947: try: 1, refid: C17B26AC, correction: 0.000783230, skew: 0.333
[balena_supe][INFO] 2025-05-19 22:59:38,002: [info] Reported current state to the cloud
[balena-supe][INFO] 2025-05-19 22:59:38,003: [info] Reported current state to the cloud
[fake-hwcloc][INFO] 2025-05-19 23:00:04,043: [fake-hwclock] Saving system time to /etc/fake-hwclock/fake-hwclock.data.
[fake-hwcloc][INFO] 2025-05-19 23:00:04,049: Saving system time to /etc/fake-hwclock/fake-hwclock.data.
[healthdog][INFO] 2025-05-19 23:00:33,965: try: 1, refid: C17B26AC, correction: 0.000781208, skew: 0.333
[balena_supe][INFO] 2025-05-19 23:00:43,124: [info] Healthcheck failure - memory usage above threshold after 16h 50m 28s
[balena-supe][INFO] 2025-05-19 23:00:43,128: [info] Healthcheck failure - memory usage above threshold after 16h 50m 28s
[balena-supe][ERROR] 2025-05-19 23:00:43,128: [error] Healthcheck failed
[balena-supe][INFO] 2025-05-19 23:00:43,128: [api] GET /v1/healthy 500 - 0.964 ms
[balena_supe][ERROR] 2025-05-19 23:00:43,129: [error] Healthcheck failed
[balena_supe][INFO] 2025-05-19 23:00:43,129: [api] GET /v1/healthy 500 - 0.964 ms
[healthdog][INFO] 2025-05-19 23:02:33,978: try: 1, refid: C17B26AC, correction: 0.000779185, skew: 0.333
[healthdog][INFO] 2025-05-19 23:04:33,992: try: 1, refid: C17B26AC, correction: 0.000777163, skew: 0.333
[balena_supe][INFO] 2025-05-19 23:04:48,324: [info] Reported current state to the cloud
[balena-supe][INFO] 2025-05-19 23:04:48,325: [info] Reported current state to the cloud
[balena_supe][INFO] 2025-05-19 23:05:43,256: [info] Healthcheck failure - memory usage above threshold after 16h 55m 28s
[balena-supe][INFO] 2025-05-19 23:05:43,261: [info] Healthcheck failure - memory usage above threshold after 16h 55m 28s
[balena-supe][ERROR] 2025-05-19 23:05:43,261: [error] Healthcheck failed
[balena-supe][INFO] 2025-05-19 23:05:43,261: [api] GET /v1/healthy 500 - 1.320 ms
[balena_supe][ERROR] 2025-05-19 23:05:43,261: [error] Healthcheck failed
[balena_supe][INFO] 2025-05-19 23:05:43,261: [api] GET /v1/healthy 500 - 1.320 ms
[balenad][INFO] 2025-05-19 23:05:43,266: time="2025-05-19T23:05:43.266361049Z" level=info msg="Unhealthy container 8ccecaa06b30a36b0e5bd2129984f5c9cf4891eeed63739b66dea71cc08cb2c9: restarting..."
[balena_supe][INFO] 2025-05-19 23:05:43,444: [info] Received SIGTERM. Exiting.
[balena-supe][INFO] 2025-05-19 23:05:43,444: [info] Received SIGTERM. Exiting.
What would you recommend?
Thanks!