LogBackend: server responded with status code: 504 - Resolution?

We are running openBalena with balenaOS 2.88.4 and supervisor 12.11.0. I first noticed a problem when devices stopped reporting as online: the “IS ONLINE” column of balena-cli devices always shows “false” even though the devices are online. When I checked the supervisor container logs for clues, I found the LogBackend errors below. Other users seem to have the same problem, but I can’t find a resolution anywhere. Does anyone know what might be causing it?

[debug]   Spawning journald with: chroot  /mnt/root journalctl -a -S 2021-12-20 03:54:33 -o json CONTAINER_ID_FULL=3c573858628819a4ec0005716da9d08e6836936734235dcd5145220bdd5f44a7
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[api]     GET /v1/healthy 200 - 13.162 ms
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[api]     GET /v1/healthy 200 - 7.779 ms
[debug]   Attempting container log timestamp flush...
[debug]   Container log timestamp flush complete
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[api]     GET /v1/healthy 200 - 7.127 ms
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[debug]   Attempting container log timestamp flush...
[debug]   Container log timestamp flush complete
[api]     GET /v1/healthy 200 - 5.961 ms
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[error]   LogBackend: server responded with status code: 504
[info]    Healthcheck failure - At least ONE of the following conditions must be true:
[info]          - No applyInProgress      ? false
[info]          - fetchesInProgress       ? false
[info]          - cycleTimeWithinInterval ? false
[error]   Healthcheck failed

Hello,

It looks like the connection to the openBalena API is timing out while logging is being established.
For background, the error is raised in the supervisor code (linked on GitHub).

To clarify the logs you are seeing: the following entry is a successful health check against the local supervisor API, not a health check of the openBalena API.
[api] GET /v1/healthy 200 - 13.162 ms

To test the connection to the openBalena API, please run the following steps:

  • Check whether the openBalena API is reachable from the device:
curl -k $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/ping
  • Check whether the device is registered and can read its target state from the API:
curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device/v2/$(cat /mnt/boot/config.json | jq -r .uuid)/state

Best Regards
Harald

@fisehara see below for the results of each - appears that both commands work as intended. I’m still puzzled as to why the devices are not reporting as being online with balena-cli devices when they appear to be able to connect to the openbalena server. This is preventing us from being able to establish remote connections to the devices. I appreciate your help getting to the bottom of this.

curl -k $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/ping:

OK

curl -H "Authorization: Bearer $(cat /mnt/boot/config.json | jq -r .deviceApiKey)" $(cat /mnt/boot/config.json | jq -r .apiEndpoint)/device/v2/$(cat /mnt/boot/config.json | jq -r .uuid)/state

{
  "local": {
    "name": "dawn-cloud",
    "config": {
      "RESIN_HOST_FIREWALL_MODE": "",
      "RESIN_SUPERVISOR_POLL_INTERVAL": "600000"
    },
    "apps": {
      "1": {
        "releaseId": 116,
        "commit": "d9599998780886d6248fe0d925ad8b8c",
        "name": "<REDACTED>",
        "services": {
          "1": {
            "restart": "always",
            "privileged": true,
            "devices": [
              "/dev:/dev"
            ],
            "imageId": 624,
            "serviceName": "<REDACTED>",
            "image": "<REDACTED>",
            "running": true,
            "environment": {
              "DBUS_SYSTEM_BUS_ADDRESS": "unix:path=/host/run/dbus/system_bus_socket"
            },
            "labels": {}
          },
          <OTHER APPS REDACTED>
      }
    }
  },
  "dependent": {
    "apps": {},
    "devices": {}
  }
}

It looks like I’ve figured out why my devices are all showing up as offline. This commit to balena-os added a reliance on the Date header from the connectivity-check endpoint, which I’m guessing is present in the response from balenaCloud. In openBalena, however, the Date header is removed from the response. This means that anyone running balenaOS v2.88 or newer with openBalena will lose all VPN connectivity to their devices, because the openvpn service relies on the new timesync-https service, and the latter ends up in an infinite loop waiting for a Date header that will never come. I fixed the issue simply by removing the line:

res.removeHeader('Date');

from src/index.ts in open-balena-api and restarting it.
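
For anyone who wants to check whether their own server is affected, here is a minimal shell sketch that looks for a Date header in the response. It assumes the connectivity-check endpoint is served at /connectivity-check on the API host, which may not hold for every setup; has_date_header and check_connectivity_date are names made up for illustration.

```shell
# Return 0 if the HTTP response headers on stdin include a Date header.
# The match is case-insensitive because HTTP/2 responses use lowercase names.
has_date_header() {
  grep -qi '^date:'
}

# Hypothetical helper: fetch a URL, print only its response headers, and
# report whether a Date header is present. -k skips TLS verification, as in
# the curl commands earlier in this thread. A failed request is also
# reported as "missing", since no headers arrive at all.
check_connectivity_date() {
  if curl -sk -o /dev/null -D - "$1" | has_date_header; then
    echo "Date header present"
  else
    echo "Date header missing"
  fi
}
```

On a device this could be called as check_connectivity_date "$(jq -r .apiEndpoint /mnt/boot/config.json)/connectivity-check".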

The commit to balena-os noted above is flagged as a “minor” change but it’s a breaking change to anyone running openbalena, so in addition to making the above change to openbalena it might be worth notifying users running openbalena of this (perhaps this posting serves as that?) and also modifying the balena-os script meta-balena-common/recipes-core/systemd/timeinit/timesync-https.sh to not get stuck in an infinite loop waiting for the date header.
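
To illustrate the kind of change I mean, here is a rough sketch of a bounded retry loop. This is not the actual timesync-https.sh code: get_date_with_retries, fetch_headers and RETRY_DELAY are hypothetical names, and fetch_headers stands in for whatever command prints the endpoint's response headers.

```shell
# Hypothetical sketch of a bounded wait for a Date header.
# The caller must define fetch_headers so that it prints the response
# headers of the connectivity-check endpoint on stdout.
get_date_with_retries() {
  max_attempts=$1
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Strip carriage returns, then keep the value of the first Date header.
    date_value=$(fetch_headers | tr -d '\r' \
      | sed -n 's/^[Dd][Aa][Tt][Ee]: *//p' | head -n 1)
    if [ -n "$date_value" ]; then
      printf '%s\n' "$date_value"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "${RETRY_DELAY:-1}"
    fi
    attempt=$((attempt + 1))
  done
  # Give up without a date so the caller can continue instead of hanging.
  return 1
}
```

With this shape the caller gets a non-zero exit status after the last attempt and can decide to carry on without setting the time, rather than blocking the services that depend on it.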

But, unfortunately, while this did solve the issue as to why my devices were all offline, I’m still getting the LogBackend 504 error…

Hi, recent versions of balenaOS use the connectivity check endpoint to set an initial time based on its HTTP Date header. Unfortunately, the openBalena connectivity check endpoint does not currently set the Date header, so the operating system does not transition to its network-ready target.

We are going to discuss internally both providing a date header in the connectivity endpoint and modifying the OS to recognize when the endpoint has been reached without a date header, so it can transition to the network-ready state anyway.

For the time being, as a workaround, you could try setting .os.network.connectivity.uri in config.json to https://api.balena-cloud.com/connectivity-check, as described in the balena-os/meta-balena repository on GitHub. The balenaCloud API does currently include a date header.
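
As a sketch of what that might look like, config.json would gain the keys below, merged into the existing file rather than replacing it. The nesting is inferred from the dotted path above, so please check it against the meta-balena documentation before relying on it.

```json
{
  "os": {
    "network": {
      "connectivity": {
        "uri": "https://api.balena-cloud.com/connectivity-check"
      }
    }
  }
}
```

All other keys already in config.json (apiEndpoint, deviceApiKey, uuid and so on) would stay as they are.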