LogBackend: server responded with status code: 504 - Mystery Solved

For those running openbalena who have observed an error message that regularly appears in device supervisor logs:

[error]   LogBackend: server responded with status code: 504

The mystery has finally been solved - and I’d like to share the resolution with the community.

First, some background: device logs (including container logs and other supervisor events) are passed from the device’s balena-supervisor to open-balena-api via the endpoint /device/v2/:uuid/log-stream. Upon receiving a log event, balena-supervisor opens a connection to the log-stream endpoint, but rather than streaming log events directly to it, it aggregates them in a local buffer first. This buffer is flushed to the log-stream endpoint once it holds 50 log lines or after 60 seconds of inactivity, whichever comes first.

We recently added a loki server to our openbalena instance to aggregate and monitor server-side logs, which prompted us to look into why devices were routinely failing to post logs to the log-stream endpoint (which in turn posts those logs to loki). It turns out to be a timeout mismatch between balena-supervisor and haproxy-ingress: haproxy-ingress has a default connection timeout of 50 seconds, so unless your device generates at least 50 log lines within 50 seconds, the connection is timed out by the server before the buffer is flushed. The result is the LogBackend 504 message showing up on the device, and the log lines not being sent to the log-stream endpoint.
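To make the mismatch concrete, here is a rough Python sketch of the timing. It is a simplified model, not the supervisor's actual code: it assumes a steady log rate, treats the 60-second rule as a hard upper bound on how long the buffer waits, and the function names are made up for illustration.

```python
def first_flush_seconds(lines_per_second, flush_lines=50, flush_after_s=60.0):
    """Seconds until the buffer is first flushed, under the simplified model.

    The buffer flushes at 50 lines or after 60 seconds, whichever is earlier
    (figures taken from the post; the steady log rate is an assumption).
    """
    if lines_per_second <= 0:
        return flush_after_s
    return min(flush_lines / lines_per_second, flush_after_s)


def proxy_times_out(lines_per_second, proxy_timeout_s):
    """True if the proxy gives up before the first flush reaches it."""
    return first_flush_seconds(lines_per_second) > proxy_timeout_s


# A quiet device (one line every 2 seconds) against a 50 s proxy timeout:
print(proxy_times_out(0.5, 50))   # flush at 60 s, after the proxy has given up
# The same device against a 75 s proxy timeout:
print(proxy_times_out(0.5, 75))   # flush at 60 s, inside the timeout
# A chatty device (2 lines per second) fills the buffer in 25 s:
print(proxy_times_out(2, 50))
```

Under these assumptions, any proxy timeout above 60 seconds leaves room for the time-based flush, which is why a value such as 75 seconds is enough.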

To solve this, we had to raise the haproxy-ingress default timeouts. In our case the change was made through a helm script config; depending on your configuration it might be implemented differently, but the underlying settings should remain the same:

    config:
      timeout-server: 75s
      timeout-server-fin: 75s
      timeout-client: 75s
      timeout-client-fin: 75s

After deploying this change, the 504 errors should disappear, because balena-supervisor now flushes the log buffer after 60 seconds, before the server aborts the connection at 75 seconds, and the logs are captured by log-stream.

Hope this helps!

Hello and thank you for sharing!

Where do I find haproxy-ingress configuration files, with default timeouts?

Thanks!