System stability with 3+ services

Hello,

I’m running into issues with system stability when deploying a multicontainer setup in Balena. I’m wondering whether others have experienced something similar, and whether there’s a straightforward way to diagnose the system. “Stability” is quite ambiguous here, so below is a list of the issues I’m facing:

  • When building in local mode, the push command hangs at “Waiting for device state to settle”.
  • Random containers will stop and fail to restart, giving a 404 error in the journalctl logs.
  • Even when only a small change is made to a service’s Dockerfile (such as setting an ENV var on the last line), the entire image needs to be rebuilt. This is true both in local mode and in non-local mode.
  • After pushing to a fleet (device not in local mode), the device remains stuck on the current release and never updates to the target release.
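
To make the rebuild point concrete: as far as I understand plain Docker layer caching, a build reuses cached layers up to the first instruction that differs and rebuilds everything from that point on, so a change on the last line should only rebuild the last layer. Here is a minimal sketch of that expectation. It is a simplified model, not the balena builder itself: the function name and the example instructions are hypothetical, and it ignores details such as file checksums for COPY/ADD.

```python
def layers_to_rebuild(old_instructions, new_instructions):
    """Return the instructions a prefix-based layer cache would have to rebuild.

    Simplified model: layers are reused until the first instruction that
    differs from the previous build; that one and all later ones are rebuilt.
    """
    for index, instruction in enumerate(new_instructions):
        if index >= len(old_instructions) or old_instructions[index] != instruction:
            return new_instructions[index:]
    return []


# Hypothetical Dockerfile instructions, for illustration only.
previous_build = [
    "FROM debian:bookworm",
    "RUN apt-get update",
    "COPY . /app",
]
current_build = previous_build + ["ENV LOG_LEVEL=debug"]

# Under this model only the final ENV instruction needs rebuilding.
print(layers_to_rebuild(previous_build, current_build))
```

What I am seeing instead looks like every layer being rebuilt, as if the first instruction had changed.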

Please feel free to request any extra system info. I’m happy to provide, just not sure what could be useful.

The device this is running on is a Google Coral Dev Board (the 1 GB RAM model).

Hi,

I would first recommend looking at the debugging masterclass (Balena Device Debugging Masterclass - Balena Documentation). It covers many common issues and ways to debug the device, even when the issue is uncommon. The kernel logs section in particular comes to mind for the random errors you’re describing.
What sort of applications are you trying to run on your device? Do you have some Dockerfile examples we could use to try to replicate the issue?
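
If it helps while going through the kernel logs, one thing that may be worth ruling out on a 1 GB device is the kernel's out-of-memory killer stopping containers. That is only a guess at this point, not a diagnosis. Below is a minimal sketch that filters saved kernel log text (for example, the output of `dmesg` copied into a string or file) for the usual OOM messages. The function name and the sample log lines are made up for illustration.

```python
import re

# Phrases the Linux kernel typically logs when the OOM killer runs.
# "Kill" matches both the older "Kill process" and newer "Killed process" wording.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill", re.IGNORECASE)


def find_oom_lines(kernel_log):
    """Return the lines of kernel_log that look like OOM-killer activity."""
    return [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]


# Hypothetical sample log, for illustration only.
sample_log = """\
[  101.000000] eth0: link becomes ready
[  512.100000] python3 invoked oom-killer: gfp_mask=0x100cca, order=0
[  512.200000] Out of memory: Killed process 1234 (python3)
[  600.000000] usb 1-1: new high-speed USB device
"""

for line in find_oom_lines(sample_log):
    print(line)
```

If nothing matches in your real logs, memory pressure is less likely to be the cause and the other sections of the masterclass are the better place to continue.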