Service name in journald fields for datadog agent log collection

Hello,

I am trying to configure the datadog docker agent to collect logs from my containers running on BalenaOS. My version information is:

  • datadog docker agent 7.34.0 (the docker image tag is datadog/agent:7.34.0 from dockerhub)
  • balenaOS 2.68.1+rev1 (production version)

With an older version of the datadog agent, I was able to collect container logs through the docker socket. I believe this is the default behaviour for datadog docker agent log collection. This collection method does not work with the new version of the agent. Even with the older version, it required the agent to be restarted to detect new containers, and it led to duplicate logs.

The balena-engine is configured to use the journald docker logging driver. The datadog agent has a configuration to allow it to collect logs from journald. By adding the io.balena.features.journal-logs: "1" label to the agent container, I am able to configure the datadog agent to collect logs from journald. The only problem is that these journald logs don’t contain a clean service name field. The fields present are:

  • CONTAINER_NAME: has format myservice_xxxx_xxxx where the xxxx are some numbers. This is almost what I want, just without the _xxxx_xxxx part.
  • CONTAINER_ID, CONTAINER_TAG, CONTAINER_ID_FULL, SYSLOG_IDENTIFIER: all variations of the docker container id
  • _SYSTEMD_UNIT: always balena.service
  • various other fields like run command, machine id, etc
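
As a stopgap, the service name can be recovered from CONTAINER_NAME by stripping the two trailing numeric groups wherever the logs are post-processed. A minimal shell sketch, assuming the suffix is always two underscore-separated runs of digits (the function name service_from_container_name is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the trailing _<digits>_<digits> suffix seen in
# the CONTAINER_NAME journald field on balenaOS.
# Assumption: the suffix is always exactly two numeric groups.
service_from_container_name() {
  printf '%s\n' "$1" | sed -E 's/_[0-9]+_[0-9]+$//'
}

service_from_container_name "myservice_1234_5678"   # prints: myservice
service_from_container_name "my_service_1234_5678"  # prints: my_service
```

Names without the numeric suffix pass through unchanged, so the helper is safe to apply to every record.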

I have tried modifying my services in the docker-compose file like this:

services:
  myservice:
    logging:
      driver: journald
      options:
        tag: myservice
        labels: customname
    labels:
      customname: myservice

but the custom tags and labels don’t end up in the journald log fields. (These tags and labels do work if I run on docker on my local computer.)

For reference, my datadog agent configuration looks like:

#  /etc/datadog-agent/datadog-docker.yaml
log_level: error

listeners:
  - name: docker
config_providers:
  - name: docker
    polling: true

jmx_use_cgroup_memory_limit: true

logs_enabled: true

docker_env_as_tags:
  BALENA_DEVICE_UUID: balena_device_uuid
  ENV: env

docker_labels_as_tags:
  com.docker.compose.service: service_name

ac_exclude: []
ac_include: []

use_dogstatsd: true
dogstatsd_port: 8125

# /etc/datadog-agent/conf.d/docker-journald-logs.d/conf.yaml
logs:
  - type: journald
    container_mode: true
    path: /run/log/journal
    source: mysource
    include_units:
      - balena.service

Thanks for any help!

Hello @seb2, did you follow the tutorial "IoT fleet monitoring with Datadog and balenaCloud: How small agent containers make a big impact"?

Hi mpous, I have not tried the datadog iot agent. I’ll try out the tutorial in the link you provided.

Hi mpous, does this tutorial work for you guys? I tried running the sample dockerfile with the iot agent from the tutorial. No logs make it to datadog, and I can see logs being produced by my other services in journald. My service in the docker-compose file looks like this:

  datadog:
    build:
      context: datadog-iot
    privileged: true
    pid: host
    network_mode: host
    volumes:
      - datadog-data:/data
    tmpfs:
      - /tmp-logs
    labels:
      io.balena.features.dbus: "1"
      io.balena.features.supervisor-api: "1"
      io.balena.features.balena-socket: "1"
      io.balena.features.journal-logs: "1"

I also don’t understand the difference between the datadog iot agent that is installed in the tutorial and the regular datadog agent in the official docker image. Both produce the same version string, namely "Agent 7.34.0 - Commit: 7861858 - Serialization version: v5.0.9 - Go version: go1.16.12", so they look like the same program to me.

Hey
I followed the instructions in the blog post, and it seems that the Dockerfile is not working. I updated it, and it builds, but I can’t make it work. Could you share the DD’s Dockerfile so I can check?

I’m thinking that this is maybe what’s happening to you… Did you check the container’s log to see if the service is running or giving you an error?

Hey @seb2, I discovered a newer project made with balena and datadog. Did you try balena-io-playground/datadog-kerberos on GitHub?

Let me know if that works on your device.

Hello @seb2, did you try the new repo that I shared?

For our application, the datadog part of the kerberos example did not work either.

The IoT example does not even get any metrics into datadog, let alone logs. The older non-IoT-agent example at least got some data into datadog, but receiving logs there is sporadic at best.

The example with “regular” datadog agent fails to build ending in a build error:

[Build]   [datadog] github.com/fzipp/gocyclo (download)
[Build]   
[Build]   [datadog] package io/fs: unrecognized import path "io/fs": import path does not begin with hostname
[Build]   
[Build]   [datadog] processing checkout tool gotest.tools/gotestsum
[Build]   processing checkout tool github.com/fzipp/gocyclo
[Build]   [datadog] Removing intermediate container e25e33d11427
Some services failed to build:
        datadog: The command '/bin/sh -c export PATH=$PATH:$GOPATH/bin GODEBUG=netdns=go &&   cd /usr/app/src/github.com/DataDog/datadog-agent &&   invoke deps -v' returned a non-zero code: 1


Additional information may be available with the `--debug` flag.

For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting

Other tests with the pre-built agent also fail to forward the logs.

Update:
I’ve now tested by pinning the version of the datadog agent to 7.25.1, and then it starts working (somewhat). I used the Dockerfile from the IoT example as a starting point, but instead of installing the IoT agent, the regular agent is installed. The IoT agent does not have a docker integration, so it cannot monitor docker’s metrics either.

FROM balenalib/%%BALENA_ARCH%%-ubuntu
WORKDIR /usr/app

# New Datadog install steps
RUN sudo apt-get update && apt-get install -y conntrack nano wget curl sudo apt-transport-https sudo gnupg2 net-tools && rm -rf /var/lib/apt/lists/*
RUN sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable 7' > /etc/apt/sources.list.d/datadog.list"
RUN sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_382E94DE.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN sudo apt-get update && apt-get install -y datadog-agent=1:7.25.1-1 datadog-signing-keys && rm -rf /var/lib/apt/lists/*

COPY files /usr/app/files

RUN cp /usr/app/files/disk.yaml /etc/datadog-agent/conf.d/disk.d/conf.yaml.default
RUN cp /usr/app/files/network.yaml /etc/datadog-agent/conf.d/network.d/conf.yaml.default

RUN chmod +x files/start.sh
CMD ["bash","./files/start.sh"]

I’ve also modified start.sh a bit to produce a more human-readable hostname:

#!/usr/bin/env bash
set -euo pipefail

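# ${DD_API_KEY+x} expands to "x" only when the variable is set, so this check is safe under `set -u`
if [ -z "${DD_API_KEY+x}" ]
then
  echo "ERROR: DD_API_KEY IS NOT SET"
  exit 1
fi

#ln -sf /var/run/balena.sock /var/run/docker.sock
# Replace every non-alphanumeric character in the device name with a hyphen
DD_HOSTNAME=$(echo "$BALENA_DEVICE_NAME_AT_INIT" | sed 's|[^a-zA-Z0-9]|-|g')
export DD_HOSTNAME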

datadog-agent -c files/datadog.yaml run

I do not think it is a good final solution because I would prefer to keep track of version updates but it might give us some clues on why it is not working with newer versions of the datadog agent anymore.
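
To see what the DD_HOSTNAME substitution in start.sh produces, the sed expression can be run on its own. A small sketch with made-up device names (the function name sanitize_hostname is illustrative only):

```shell
#!/usr/bin/env bash
# Same substitution as in start.sh: every character that is not a letter
# or a digit becomes a hyphen.
sanitize_hostname() {
  printf '%s\n' "$1" | sed 's|[^a-zA-Z0-9]|-|g'
}

sanitize_hostname "My Device #12"   # prints: My-Device--12
sanitize_hostname "gateway_01.lab"  # prints: gateway-01-lab
```

Note that consecutive special characters each become their own hyphen, so a name can end up with runs of hyphens.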

Update 2:
When pinning the IoT agent to the same version as the regular agent, it still does not collect the logs from docker.

Update 3:
Now that it is working somewhat, I run into the same issue as the original author of this thread, where the service name and source do not make any sense in the datadog logs.

For what it’s worth, the IoT version of the Datadog agent does not support docker metrics (at least it did not a few months ago; they may have updated it since).

Here’s the install script I’m using for the IoT agent on a Raspberry Pi CM4 and it works perfectly.

Dockerfile

FROM balenalib/aarch64-ubuntu

WORKDIR /app
RUN sudo apt-get update && apt-get install -y conntrack nano wget curl sudo apt-transport-https sudo gnupg2 net-tools jq
RUN sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable 7' > /etc/apt/sources.list.d/datadog.list"
RUN sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_382E94DE.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN sudo apt-get update && apt-get install -y datadog-iot-agent datadog-signing-keys


# RUN apt update && apt install -y nano wget curl sudo apt-transport-https sudo gnupg2
# RUN sudo sh -c "echo 'deb https://apt.datadoghq.com/ stable 7' > /etc/apt/sources.list.d/datadog.list"
# RUN sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 A2923DFF56EDA6E76E55E492D3A80E30382E94DE
# RUN sudo apt-get update && sudo apt-get install datadog-iot-agent

COPY files /app/files

# Move the standard datadog configs
RUN cp /app/files/datadog.yaml /etc/datadog-agent/datadog.yaml
RUN cp /app/files/system-probe.yaml /etc/datadog-agent/system-probe.yaml

RUN cp /app/files/disk.yaml /etc/datadog-agent/conf.d/disk.d/conf.yaml
RUN cp /app/files/network.yaml /etc/datadog-agent/conf.d/network.d/conf.yaml

# # Add Python integration & logs
# RUN mkdir /etc/datadog-agent/conf.d/python.d
# RUN cp /app/files/python.yaml /etc/datadog-agent/conf.d/python.d/conf.yaml.default

# Add Python integration & logs
RUN mkdir /etc/datadog-agent/conf.d/python.d
RUN cp /app/files/python.yaml /etc/datadog-agent/conf.d/python.d/conf.yaml.default

# # Add custom Basicstation logs
RUN mkdir /etc/datadog-agent/conf.d/basicstation.d
RUN cp /app/files/basicstation.yaml /etc/datadog-agent/conf.d/basicstation.d/conf.yaml

# Add custom losant-edge-agent logs
RUN mkdir /etc/datadog-agent/conf.d/losant.d
RUN cp /app/files/losant.yaml /etc/datadog-agent/conf.d/losant.d/conf.yaml

RUN chmod +x files/start.sh
CMD ["bash","./files/start.sh"]

start.sh

#!/bin/bash

###############################
# COLOR SETUP 
###############################
export INFO_COLOR="\033[96m"
export ERROR_COLOR="\033[91m"
export WARN_COLOR="\033[93m"
export CLEAR_COLOR="\033[0m"

###############################
# LOGS SETUP
###############################
LOGS_LOCATION=/persistent-data/datadog-start.log

function timestamp(){
  date "+%s" # Here we're using unix timestamp
}

# Shared helper: append one JSON log line to $LOGS_LOCATION and echo a
# colored, human-readable line to stdout.
# Arguments: $1 = level, $2 = color escape, $3 = message
function log(){
    level="$1"
    color="$2"
    message="$3"
    echo '{}' | \
    jq  --monochrome-output \
        --compact-output \
        --raw-output \
        --arg timestamp "$(timestamp)" \
        --arg level "$level" \
        --arg message "$message" \
        --arg user "$USER" \
        --arg file "$(basename "$BASH_SOURCE")" \
        '.timestamp=$timestamp|.level=$level|.message=$message|.user=$user|.file=$file' >> "$LOGS_LOCATION"
    echo -e "${color}$(timestamp) [$level] $message${CLEAR_COLOR}"
}

function info(){ log INFO "$INFO_COLOR" "$1"; }
function warn(){ log WARN "$WARN_COLOR" "$1"; }
function error(){ log ERROR "$ERROR_COLOR" "$1"; }


if [ -z "${DD_API_KEY+x}" ]
then
  warn "DD_API_KEY variable is missing or misconfigured."
  balena-idle
else
  info "DD_API_KEY configured, setting tags for datadog-agent..."
fi

ln -sf /var/run/balena.sock /var/run/docker.sock


# Quote the tr character classes so the shell cannot glob-expand them
GATEWAY_MAC=$(cat /sys/class/net/eth0/address | sed -r 's/[:]+//g' | tr '[:lower:]' '[:upper:]')
GATEWAY_EUI=$(cat /sys/class/net/eth0/address | sed -r 's/[:]+//g' | sed -e 's#\(.\{6\}\)\(.*\)#\1fffe\2#g' | tr '[:lower:]' '[:upper:]')
# MODEM_MODEL=$(mmcli -m 0 --output-json | jq '.modem.generic.model')
# MODEM_IMEI=$(mmcli -m 0 --output-json | jq '.modem["3gpp"].imei|tonumber')

# Add all variables to the datadog.yaml config file
# BE SURE TO SET ENV value in balena application.  Options are: env:play env:test env:stag env:prod

echo -e "api_key: $DD_API_KEY\nenv: $ENV\ntags:\n  - availability-zone:wilderness\n  - gateway_eui:$GATEWAY_EUI\n  - balena_app_id:$BALENA_APP_ID\n  - balena_app_name:$BALENA_APP_NAME\n  - balena_device_aarch:$BALENA_DEVICE_ARCH\n  - balena_host_os_version:$BALENA_HOST_OS_VERSION\n  - balena_device_name_at_init:$BALENA_DEVICE_NAME_AT_INIT\n  - host_aliases:$BALENA_DEVICE_NAME_AT_INIT" | cat - files/datadog.yaml > temp && mv temp /etc/datadog-agent/datadog.yaml


# Run this only if you copy datadog.yaml to /etc/datadog-agent/datadog.yaml
info "Tags set. Starting agent..."
datadog-agent run

Notice the long echo -e "api_key: $DD_API_KEY\n..." line near the end of start.sh, the one piped into cat - files/datadog.yaml. It prepends the API key, the env, and a bunch of device tags to the agent config, so in Datadog you can keep track of your device and search for it based on tags.
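
The prepend step done by the echo -e line in start.sh may be easier to read as a heredoc. This is only a sketch under assumptions: the function name write_datadog_config and its two path arguments are hypothetical, only a few of the tags are shown, and the variables are expected to be set already, as they are in start.sh.

```shell
#!/usr/bin/env bash
# Hypothetical rewrite of the echo -e line: emit the generated header,
# append the static config after it, then move the result into place.
# $1 = static config to append, $2 = output path (both illustrative).
write_datadog_config() {
  local static_cfg="$1" out="$2"
  {
    cat <<EOF
api_key: ${DD_API_KEY}
env: ${ENV}
tags:
  - gateway_eui:${GATEWAY_EUI}
  - balena_app_name:${BALENA_APP_NAME}
  - balena_device_name_at_init:${BALENA_DEVICE_NAME_AT_INIT}
EOF
    cat "$static_cfg"
  } > "${out}.tmp" && mv "${out}.tmp" "$out"
}

# Usage matching start.sh:
# write_datadog_config files/datadog.yaml /etc/datadog-agent/datadog.yaml
```

Writing to a temporary file first and then moving it keeps the behaviour of the original line, which also never leaves a half-written config at the final path.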

Keep in mind I’m not using the journald method to get Balena logs into datadog. I wanted more manual control over which logs are sent, to keep log volume low. Instead I’m sending specific application logs to a log file and tailing those with the datadog agent.
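
For the file-tailing approach, each tailed file gets its own logs entry, and the service and source are set explicitly there, which sidesteps the journald naming problem. A hedged sketch that generates such a config: the helper name write_tail_config and the example paths and names are placeholders, not taken from my actual setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Datadog custom log config that tails one file.
# $1 = conf.yaml to write, $2 = log file to tail, $3 = service, $4 = source.
write_tail_config() {
  cat > "$1" <<EOF
logs:
  - type: file
    path: $2
    service: $3
    source: $4
EOF
}

# Example (placeholder paths and names):
# write_tail_config /etc/datadog-agent/conf.d/myapp.d/conf.yaml /persistent-data/myapp.log myapp python
```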

Thanks for sharing, @barryjump!

@cees.koolen @barryjump feel free to PR the repo to enable other developers from the community to run datadog on their projects.

@mpous I’m not fully satisfied with the way it is working right now. Once it is, I will issue a PR to the repo such that others can benefit from it as well. I’m in a discussion with Datadog support to figure out why things are not working as intended.

Thanks @cees.koolen

Feel free to open an issue if that makes sense, so we can follow up from there.

Let’s stay connected.