For those running openbalena who have observed an error message that regularly appears in device supervisor logs:
[error] LogBackend: server responded with status code: 504
The mystery has finally been solved - and I’d like to share the resolution with the community.
First, some background: device logs (including container logs and other supervisor events) are passed from the device’s balena-supervisor to open-balena-api via the /device/v2/:uuid/log-stream endpoint. When it receives a log event, balena-supervisor opens a connection to the log-stream endpoint, but rather than streaming log events to it directly, it first aggregates them in a local buffer. The buffer is flushed to the log-stream endpoint when it reaches 50 log lines or after 60 seconds of inactivity, whichever comes first.
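As a rough illustration of the buffering described above, here is a minimal Python model. It is not the supervisor’s code (balena-supervisor is written in TypeScript), and LogBuffer with its add/poll methods are hypothetical names. It assumes the 60-second timer starts when the first line enters an empty buffer. That is one reading of "60 seconds of inactivity", and it is the one consistent with the fix below, where any proxy timeout above 60 seconds is enough.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogBuffer:
    """Hypothetical model of the supervisor's log buffer (not its real code).

    Flushes when max_lines lines are buffered, or max_wait_s seconds after
    the first line entered the buffer, whichever comes first.
    """
    max_lines: int = 50         # line-count threshold from the post
    max_wait_s: float = 60.0    # time threshold from the post (assumed timer start)
    lines: List[str] = field(default_factory=list)
    first_buffered_at: Optional[float] = None

    def add(self, line: str, now: float) -> Optional[List[str]]:
        """Buffer a line; return the batch if the line threshold is reached."""
        if not self.lines:
            self.first_buffered_at = now  # timer starts with the first line
        self.lines.append(line)
        if len(self.lines) >= self.max_lines:
            return self._flush()
        return None

    def poll(self, now: float) -> Optional[List[str]]:
        """Return the batch if max_wait_s has passed since the first line."""
        if (self.lines and self.first_buffered_at is not None
                and now - self.first_buffered_at >= self.max_wait_s):
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        batch, self.lines = self.lines, []
        self.first_buffered_at = None
        return batch


if __name__ == "__main__":
    buf = LogBuffer()
    print(buf.add("container started", now=0.0))  # nothing flushed yet
    print(buf.poll(now=30.0))                      # under 60 s, still buffered
    print(buf.poll(now=60.0))                      # timer flush
```

Until one of the two thresholds is hit, nothing reaches the log-stream endpoint, which is what makes the proxy timeout matter.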
We recently added a Loki server to our openbalena instance to aggregate and monitor server-side logs, which prompted us to look into why devices were routinely failing to post logs to the log-stream endpoint (which in turn forwards those logs to Loki). It turns out the cause is a timeout mismatch between balena-supervisor and haproxy-ingress. haproxy-ingress has a default connection timeout of 50 seconds, so unless your device generates more than 50 log lines within 50 seconds, the server times out the connection before the buffer is flushed. The result is the LogBackend 504 message on the device, and the buffered log lines are never sent to the log-stream endpoint.
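To make the mismatch concrete, the sketch below works out when the first flush happens for a given set of log-line timestamps and compares it with the proxy timeout. It assumes, as described above, that the connection opens with the first log line and that nothing is sent on it until the buffer flushes; it also assumes the 60-second timer starts with the first buffered line. first_flush_time and gets_504 are hypothetical helper names.

```python
from typing import Sequence


def first_flush_time(timestamps: Sequence[float], max_lines: int = 50,
                     max_wait_s: float = 60.0) -> float:
    """Seconds from the first log line until the buffer first flushes.

    Flush on the max_lines-th line or max_wait_s after the first line,
    whichever comes first (assumed model, see the lead-in).
    """
    ts = sorted(timestamps)
    if not ts:
        raise ValueError("need at least one log line")
    start = ts[0]
    if len(ts) >= max_lines:
        # The buffer fills on the max_lines-th line unless the timer fired first.
        return min(ts[max_lines - 1] - start, max_wait_s)
    return max_wait_s


def gets_504(timestamps: Sequence[float], proxy_timeout_s: float) -> bool:
    """True if the proxy's idle timeout would fire before the first flush."""
    return first_flush_time(timestamps) > proxy_timeout_s
```

For a single log line the first flush happens 60 seconds in, so a 50-second proxy timeout fires first while a 75-second one does not. A burst of 50 lines in 10 seconds flushes early and never hits the timeout.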
To solve this, we changed the haproxy-ingress default timeouts. In our case the change was made through a Helm chart config, but depending on your configuration it may be applied differently (the underlying settings should remain the same):
After deploying this change, all of the 504 errors should disappear (because balena-supervisor flushes its buffer after 60 seconds, before the longer server timeout aborts the connection), and the logs are captured by the log-stream endpoint.
@theinspector that would depend on how you are running openbalena. If you are using docker-compose, I believe it’s located in src/haproxy/haproxy.cfg. If you are using k8s / helm scripts, the config snippet above is specified in values.yml in the following location:
@drcnyc thank you! I actually have a docker-compose version of OpenBalena, so I did a little bit of research and found something similar to what you mentioned in the openbalena_haproxy container.
Actually, I found the haproxy.cfg file in the /usr/local/etc/haproxy folder.
I edited the file, modifying
```
defaults
    timeout connect 5s
    timeout client 50s
    timeout server 50s
```
to:
```
defaults
    timeout connect 5s
    timeout client 75s
    timeout server 75s
```
…and then restarted the openbalena_haproxy container.
Nothing changed! What am I missing?
Thank you in advance
EDIT:
I also tried editing the file you mentioned (I finally found it in the /open-balena folder, my bad) and restarted the whole stack with ./scripts/compose down && ./scripts/compose up -d.
Still nothing changed: balena-supervisor still gets the 504 error.
I’m less familiar with the docker-compose world, but when you say you found it in /usr/local/etc/haproxy, I assume you mean that path inside the haproxy container. If so, your change may not have persisted when you restarted it. As for the file in your open-balena folder, that is the one I was referring to above, but you may have to rebuild the container after modifying it for the change to take effect.
Hello @drcnyc and thank you again!
I think the changes I made actually took effect, because I first did the following:
Killed the openbalena_haproxy container;
Ran docker system prune -a.
Only after that did I re-inspect the openbalena-haproxy container, and I found that the haproxy.cfg file (located in /usr/local/etc/haproxy) was correctly updated to:
```
defaults
    timeout connect 5s
    timeout client 75s
    timeout server 75s
```
So the changes did take effect!
I must be missing something else… Thanks in advance for your kindness.
@theinspector it sounds like you have the correct defaults config. You are missing the server-fin and client-fin options, but I don’t think those matter here. We’ll need to dig a bit deeper to understand why it’s not working. Could you paste your entire haproxy.cfg file? The timeouts might be overridden somewhere else.
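To check whether the timeouts are overridden somewhere else, a small script can list every timeout in haproxy.cfg together with the section it appears in. This is a sketch with hypothetical helper names (find_timeouts, too_short); it assumes a plain haproxy.cfg without include files or environment-variable substitution, and it flags client/server timeouts that do not exceed the supervisor’s 60-second flush interval described earlier.

```python
import re
from typing import List, Optional, Tuple

# HAProxy time values default to milliseconds when no unit suffix is given.
_UNITS_MS = {"us": 0.001, "ms": 1, "s": 1000, "m": 60_000,
             "h": 3_600_000, "d": 86_400_000}
_SECTION = re.compile(r"^(global|defaults|frontend|backend|listen)\b(.*)$")
_TIMEOUT = re.compile(r"^timeout\s+(\S+)\s+(\d+)(us|ms|s|m|h|d)?$")


def parse_time_ms(value: str, unit: Optional[str]) -> float:
    """Convert an HAProxy time value such as ('75', 's') to milliseconds."""
    return int(value) * _UNITS_MS[unit or "ms"]


def find_timeouts(cfg_text: str) -> List[Tuple[str, str, float]]:
    """List (section, timeout name, value in ms) for every timeout line."""
    found = []
    section = "(top level)"
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        sec = _SECTION.match(line)
        if sec:
            section = f"{sec.group(1)} {sec.group(2).strip()}".strip()
            continue
        t = _TIMEOUT.match(line)
        if t:
            found.append((section, t.group(1),
                          parse_time_ms(t.group(2), t.group(3))))
    return found


def too_short(cfg_text: str,
              flush_s: float = 60.0) -> List[Tuple[str, str, float]]:
    """client/server timeouts that do not exceed the flush interval."""
    return [(sec, name, ms) for sec, name, ms in find_timeouts(cfg_text)
            if name in ("client", "server") and ms <= flush_s * 1000]
```

For example, too_short(open("haproxy.cfg").read()) returns an empty list when every client/server timeout is above 60 seconds; anything it returns points at the section that still overrides the defaults.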
The new haproxy base image also no longer runs as root, but as a user called haproxy. You will have to make some minor modifications to the haproxy Dockerfile to keep everything working:
```dockerfile
FROM haproxy:2.9.6-alpine

VOLUME [ "/certs" ]

# Switch to root to install packages
USER root
RUN apk add --update inotify-tools

# Make the haproxy user the owner of the certificate directory (owned by root by default)
RUN chown haproxy:haproxy /etc/ssl/private

# Switch back to the haproxy user
USER haproxy

COPY haproxy.cfg /usr/local/etc/haproxy/haproxy.cfg
COPY start-haproxy.sh /start-haproxy

CMD /start-haproxy
```
@Maurits mentioned updating the HAProxy base image and making specific config adjustments (like replacing reqadd with http-request add-header and modifying the Dockerfile for non-root usage) to fix the [error] LogBackend: server responded with status code: 504.
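On the reqadd part: recent HAProxy versions, such as the 2.9 image above, no longer accept reqadd, and http-request add-header takes its place. If a haproxy.cfg has several of these lines, a small helper can rewrite the common "reqadd Name:\ value [if|unless cond]" form. convert_reqadd is a hypothetical name, and this is a sketch that returns None for anything it does not understand (including values containing '%', which starts a log-format variable in http-request add-header), so review its output before using it.

```python
import re
from typing import Optional

_REQADD = re.compile(r"^(\s*)reqadd\s+(.+?)\s*$")
# A trailing ACL condition, ignoring "\ " escaped spaces inside the header.
_COND = re.compile(r"(?<!\\)\s+((?:if|unless)\s.+)$")


def convert_reqadd(line: str) -> Optional[str]:
    """Rewrite a legacy 'reqadd Name:\\ value [cond]' line, or return None."""
    m = _REQADD.match(line)
    if not m:
        return None
    indent, rest = m.groups()
    cond = ""
    c = _COND.search(rest)
    if c:
        cond = " " + c.group(1)
        rest = rest[:c.start()]
    # reqadd takes the whole header line, with spaces escaped as "\ ".
    name, sep, value = rest.replace("\\ ", " ").partition(":")
    value = value.strip()
    if not sep or not name.strip() or not value or "%" in value or '"' in value:
        return None
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'  # quote values that contain spaces
    return f"{indent}http-request add-header {name.strip()} {value}{cond}"
```

For example, "reqadd X-Forwarded-Proto:\ https" becomes "http-request add-header X-Forwarded-Proto https", and a trailing "if { ssl_fc }" condition is carried over unchanged.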
Has anyone tried this solution yet? Just looking for a quick confirmation on whether it’s been tested and proven effective.
I’ve been running this in our dev environment for a week now and everything seems stable (tested the API and VPN). The reason changing those timeouts worked for drcnyc is that he’s using a haproxy Helm chart with a more recent haproxy version than the one openbalena currently uses.
Hello @Maurits,
The HAProxy update worked flawlessly, but now I’m seeing an error in the balena_supervisor logs: [error] LogBackend: server responded with status code: 408.
Additionally, I have a question regarding the certificate renewal process. Initially, I ran the quickstart script without the -c parameter and with the OPENBALENA_ACME_CERT_ENABLED environment variable set to false.
I’m not seeing any errors in my balena_supervisor logs. Have you checked if the API is working correctly? Anything in the API logs? Is https://api.{yourdomain}/ping returning OK?
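If it helps, the ping check suggested above can be scripted. This is a minimal sketch assuming the endpoint answers with a plain OK body, as the reply implies; api_ping_ok is a hypothetical helper name, and example.com stands in for your own domain. The opener parameter is only there so the check can be exercised without a network.

```python
import urllib.request
from typing import Callable


def api_ping_ok(domain: str,
                opener: Callable = urllib.request.urlopen,
                timeout_s: float = 10.0) -> bool:
    """Return True if https://api.<domain>/ping answers 200 with body 'OK'."""
    url = f"https://api.{domain}/ping"
    try:
        with opener(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200 and body == "OK"
    except OSError:  # DNS, TLS, HTTP error status and timeout failures land here
        return False
```

Usage: api_ping_ok("example.com"), with your own domain in place of the placeholder.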
We use the cert-provider to handle certificate renewal, so I don’t know if I’ll be much help.
It’s probably better to start a new topic for that, unless it’s related to this one.
What certificate provider do you use and how did you configure it?
I’m from Chile and I don’t speak English, but I see that you use openbalena. Would it be possible to talk directly, in a chat, so I can ask you those questions?