LogBackend: server responded with status code: 504 - Mystery Solved

For those running openbalena who have observed an error message that regularly appears in device supervisor logs:

[error]   LogBackend: server responded with status code: 504

The mystery has finally been solved - and I’d like to share the resolution with the community.

First some background information: device logs (including container logs and other supervisor events) are passed from the device’s balena-supervisor to open-balena-api via the endpoint /device/v2/:uuid/log-stream. Upon receiving a log event, balena-supervisor opens a connection to the log-stream endpoint, but rather than streaming log events directly to it, aggregates them in a local buffer first. This buffer then flushes to the log-stream endpoint on the earlier of aggregating 50 log lines or 60 seconds of inactivity.

We recently added a loki server to our openbalena instance to aggregate and monitor server side logs, which prompted us to look into why devices were routinely failing to post logs to the log-stream endpoint (which in turn posts those logs to loki). It turns out to be a timeout mismatch between balena-supervisor and haproxy-ingress: haproxy-ingress has a default connection timeout of 50 seconds, so unless your device generates 50 or more log lines within 50 seconds, the server times out the connection before the buffer flushes. The result is the LogBackend 504 message on the device, and the log lines never reach the log-stream endpoint.
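To make the mismatch concrete, here is a minimal Python sketch of the timing described above. It is a simplified model, not supervisor code: it assumes a steady log rate and treats the 60 second limit as counted from when the connection opens. The function and constant names are made up for illustration.

```python
MAX_BUFFERED_LINES = 50    # flush threshold mentioned in the post
IDLE_FLUSH_SECONDS = 60.0  # flush interval mentioned in the post


def seconds_until_flush(lines_per_second):
    """Seconds until the buffer flushes, assuming a steady log rate."""
    if lines_per_second <= 0:
        # No lines arriving: only the 60 second limit triggers a flush.
        return IDLE_FLUSH_SECONDS
    time_to_fill = MAX_BUFFERED_LINES / lines_per_second
    return min(time_to_fill, IDLE_FLUSH_SECONDS)


def proxy_times_out(lines_per_second, proxy_timeout_seconds):
    """True when the proxy timeout expires before the buffer flushes."""
    return seconds_until_flush(lines_per_second) > proxy_timeout_seconds


# Compare a quiet device and a chatty one against both timeout values.
for timeout in (50.0, 75.0):
    for rate in (0.1, 2.0):
        print(timeout, rate, proxy_times_out(rate, timeout))
```

In this model any timeout above 60 seconds leaves room for the flush, which is why 75s is used in the config that follows.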

To solve this, we had to raise the haproxy-ingress default timeouts. In our case the changes were made via a helm script config; depending on your configuration they might be implemented differently, but the underlying settings should remain the same:

    config:
      timeout-server: 75s
      timeout-server-fin: 75s
      timeout-client: 75s
      timeout-client-fin: 75s

After deploying with this change, all of the 504 errors should disappear (because balena-supervisor flushes the log after 60 seconds, before the server aborts the connection), and logs are captured by log-stream.

Hope this helps!

Hello and thank you for sharing!

Where do I find haproxy-ingress configuration files, with default timeouts?

Thanks!

@theinspector that would depend on how you are running openbalena. If you are using docker-compose, I believe it’s located in src/haproxy/haproxy.cfg. If you are using k8s / helm scripts, the config snippet above is specified in values.yml in the following location:

haproxy:
  controller:
    config:
      timeout-server: 75s
      ...

@drcnyc thank you! I actually have a docker-compose version of OpenBalena, so I did a little bit of research and found something similar to what you mentioned in the openbalena_haproxy container.

Actually, I found the haproxy.cfg file in the /usr/local/etc/haproxy folder.

I edited the file, modifying

defaults
  timeout connect 5s
  timeout client 50s
  timeout server 50s

to:

defaults
  timeout connect 5s
  timeout client 75s
  timeout server 75s

…and restarting the openbalena_haproxy container.

Nothing changed! What am I missing?
Thank you in advance :smiling_face:

EDIT:
I also tried to edit the file you mentioned (I finally found it in the /open-balena folder, my bad) and restarted the whole stack through ./scripts/compose down && ./scripts/compose up -d.
Nothing changed anyway: balena-supervisor still gets the 504 error.

I’m less familiar with the docker-compose world, but when you mention you found it in /usr/local/etc/haproxy I’m assuming you are referring to that path being in the haproxy container? If so, your change may not have persisted when you restarted it. Regarding modifying the file in your open-balena folder, that was the file I was referring to above - but you may have to rebuild the container after modifying it for your change to take effect.

Hello @drcnyc and thank you again!
I guess the changes I made actually took effect, since before making any change I did the following:

  • Killed openbalena_haproxy container;
  • Ran docker system prune -a .

And, only after that:

I re-inspected the openbalena-haproxy container and found out that the haproxy.cfg file - located in
/usr/local/etc/haproxy - was correctly updated to:

defaults
  timeout connect 5s
  timeout client 75s
  timeout server 75s

So I guess changes actually took effect!

I guess I’m missing something… Thanks in advance for your kindness :smiling_face:

@theinspector sounds like you have the correct default config - although you are missing the server-fin and client-fin options, I don’t think those will matter here. We’ll need to dig a bit deeper to understand why it’s not working. Could you paste your entire haproxy.cfg file? It might be overridden somewhere else.
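For anyone who wants to check for per-section overrides without reading the whole file by eye, here is a minimal Python sketch. It assumes the usual haproxy.cfg layout (section keywords at the start of a line, directives underneath) and strips anything after a # as a comment, which is a simplification. The function name collect_timeouts and the sample text are illustrative only.

```python
SECTION_KEYWORDS = ("global", "defaults", "frontend", "backend", "listen")


def collect_timeouts(cfg_text):
    """Map each section header to the timeout directives set inside it."""
    timeouts = {}
    section = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        if not line:
            continue
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            # e.g. "defaults" or "frontend db"
            section = " ".join(words[:2])
            timeouts.setdefault(section, {})
        elif words[0] == "timeout" and len(words) >= 3 and section is not None:
            timeouts[section][words[1]] = words[2]
    return timeouts


# Small sample in the same shape as an openbalena haproxy.cfg.
sample = """defaults
  timeout connect 5s
  timeout client 75s
  timeout server 75s

frontend db
  mode tcp
  timeout client 1h
"""

for name, values in collect_timeouts(sample).items():
    print(name, values)
```

Any section that lists its own timeout client or timeout server overrides the value from defaults for that section only.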

@drcnyc sure, here you have it!

global
  tune.ssl.default-dh-param 1024

defaults
  timeout connect 5s
  timeout client 75s
  timeout server 75s

frontend http-in
  mode http
  option forwardfor
  bind *:80
  reqadd X-Forwarded-Proto:\ http

  acl is_cert_validation path -i -m beg "/.well-known/acme-challenge/"
  use_backend cert-provider if is_cert_validation

  acl host_api hdr_dom(host) -i "api.${HAPROXY_HOSTNAME}"
  use_backend backend_api if host_api

  acl host_registry hdr_dom(host) -i "registry.${HAPROXY_HOSTNAME}"
  use_backend backend_registry if host_registry

  acl host_vpn hdr_dom(host) -i "vpn.${HAPROXY_HOSTNAME}"
  use_backend backend_vpn if host_vpn

  acl host_s3 hdr_dom(host) -i "s3.${HAPROXY_HOSTNAME}"
  use_backend backend_s3 if host_s3

frontend ssl-in
  mode tcp
  bind *:443
  tcp-request inspect-delay 2s
  tcp-request content accept if { req.ssl_hello_type 1 }

  acl is_ssl req.ssl_ver 2:3.4

  acl host_tunnel req_ssl_sni -i "tunnel.${HAPROXY_HOSTNAME}"
  use_backend redirect-to-tunnel-in if host_tunnel

  use_backend redirect-to-https-in if is_ssl
  use_backend vpn-devices if !is_ssl

backend redirect-to-https-in
  mode tcp
  balance roundrobin
  server localhost 127.0.0.1:444 send-proxy-v2

backend redirect-to-tunnel-in
  mode tcp
  balance roundrobin
  server localhost 127.0.0.1:3129

frontend https-in
  mode http
  option forwardfor
  bind 127.0.0.1:444 ssl crt /etc/ssl/private/open-balena.pem accept-proxy
  reqadd X-Forwarded-Proto:\ https

  acl host_api hdr_dom(host) -i "api.${HAPROXY_HOSTNAME}"
  use_backend backend_api if host_api

  acl host_registry hdr_dom(host) -i "registry.${HAPROXY_HOSTNAME}"
  use_backend backend_registry if host_registry

  acl host_vpn hdr_dom(host) -i "vpn.${HAPROXY_HOSTNAME}"
  use_backend backend_vpn if host_vpn

  acl host_s3 hdr_dom(host) -i "s3.${HAPROXY_HOSTNAME}"
  use_backend backend_s3 if host_s3

backend backend_api
  mode http
  option forwardfor
  balance roundrobin
  server balena_api_1 api:80 check port 80

backend backend_registry
  mode http
  option forwardfor
  balance roundrobin
  server balena_registry_1 registry:80 check port 80

backend backend_vpn
  mode http
  option forwardfor
  balance roundrobin
  server balena_vpn_1 vpn:80 check port 80

backend backend_s3
  mode http
  option forwardfor
  balance roundrobin
  server balena_s3_1 s3:80 check port 80

backend cert-provider
  mode http
  option forwardfor
  balance roundrobin
  server balena_cert-provider_1 cert-provider:80 no-check

backend vpn-devices
  mode tcp
  server balena_vpn_1 vpn:443 send-proxy-v2 check-send-proxy port 443

frontend db
  mode tcp
  bind *:5432
  default_backend backend_db
  timeout client 1h

backend backend_db
  mode tcp
  server balena_db_1 db:5432 check port 5432

frontend redis
  mode tcp
  bind *:6379
  default_backend backend_redis
  timeout client 1h

backend backend_redis
  mode tcp
  server balena_redis_1 redis:6379 check port 6379

listen vpn-tunnel
  mode tcp
  bind *:3128
  server balena_vpn vpn:3128 check port 3128

listen vpn-tunnel-tls
  mode tcp
  bind *:3129 ssl crt /etc/ssl/private/open-balena.pem
  server balena_vpn vpn:3128 check port 3128

Let me know if you need anything else :smiling_face:

Any news? :slight_smile:

@theinspector

It appears you have to update the haproxy base image as well. (1.9 is pretty old anyway)
This however creates 2 new issues:

  1. reqadd has been deprecated so you have to replace it twice in the haproxy.cfg:
    frontend http-in
       ...
    #   reqadd X-Forwarded-Proto:\ http
       http-request add-header X-Forwarded-Proto http
    
    and
    frontend https-in
       ...
    #   reqadd X-Forwarded-Proto:\ https
       http-request add-header X-Forwarded-Proto https
    
  2. The new haproxy base image also no longer runs as root but as a user called haproxy. You will have to make some minor modifications to the haproxy Dockerfile to keep everything working:
    FROM haproxy:2.9.6-alpine
    
    VOLUME [ "/certs" ]
    
    # Switch back to root to install packages
    USER root
    
    RUN apk add --update inotify-tools
    
    # Make haproxy user owner of certificate directory (is root by default)
    RUN chown haproxy:haproxy /etc/ssl/private
    
    # Switch back to haproxy user
    USER haproxy 
    
    COPY haproxy.cfg /usr/local/etc/haproxy/haproxy.cfg
    COPY start-haproxy.sh /start-haproxy
    
    CMD /start-haproxy
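
If you would rather not edit the two reqadd lines by hand, the replacement from step 1 can be sketched in Python as below. This is only an illustration: it assumes the simple "reqadd Name:\ value" form used in the openbalena haproxy.cfg, with a single-word header value, and leaves every other line untouched. The function name migrate_reqadd is hypothetical.

```python
import re

# Matches lines such as "  reqadd X-Forwarded-Proto:\ http".
REQADD_LINE = re.compile(
    r"^([ \t]*)reqadd[ \t]+([^\s:]+):\\ (\S+)[ \t]*$",
    re.MULTILINE,
)


def migrate_reqadd(cfg_text):
    """Rewrite simple reqadd lines as http-request add-header lines."""
    return REQADD_LINE.sub(r"\1http-request add-header \2 \3", cfg_text)


old_cfg = (
    "frontend http-in\n"
    "  mode http\n"
    "  reqadd X-Forwarded-Proto:\\ http\n"
)

print(migrate_reqadd(old_cfg))
```

Review the result before rebuilding the container, since header values containing spaces are not handled by this sketch.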
    

Hi everyone,

@Maurits mentioned updating the HAProxy base image and making specific config adjustments (like replacing reqadd with http-request add-header and modifying the Dockerfile for non-root usage) to fix the [error] LogBackend: server responded with status code: 504.

Has anyone tried this solution yet? Just looking for a quick confirmation on whether it’s been tested and proven effective.

Thanks!
Matias

I’ve been running this in our dev environment for a week now and everything seems stable (tested API and VPN). The reason changing those timeouts worked for drcnyc is that he’s using a haproxy helm chart with a more recent haproxy version than the one openbalena is currently using.

Hello @Maurits ,
The HAProxy update worked flawlessly, but now I’m seeing an error in the balena_supervisor logs: [error] LogBackend: server responded with status code: 408.

Additionally, I have a question regarding the certificate renewal process. Initially, I ran the quickstart script without the -c parameter and with the OPENBALENA_ACME_CERT_ENABLED environment variable set to false.

Regards
Matias

@matialvarezs

I’m not seeing any errors in my balena_supervisor logs. Have you checked if the API is working correctly? Anything in the API logs? Is https://api.{yourdomain}/ping returning OK?

We use the cert-provider to handle the renewal of certificates so I don’t know if I’ll be much help.
Probably better to start a new topic for that unless it’s related to this.

@Maurits yes, https://api.{yourdomain}/ping returns OK.

What certificate provider do you use and how did you configure it?

I’m from Chile, I don’t speak English, but I see that you use openbalena, is it possible that we can talk directly? In a chat? To ask you those questions?

@matialvarezs

I’m talking about the cert-provider container included in the openbalena project. You can enable it by running the quickstart script with the -c flag.

Feel free to shoot any questions my way. Just send me a private message with your favorite chat platform and we’ll sort it out.