With an older version of the datadog agent, I was able to collect container logs through the docker socket. I believe this is the default behaviour for datadog docker agent log collection. This collection method does not work with the new version of the agent. With the older version, it also required the agent to be restarted to detect new containers, and it led to duplicate logs.
The balena-engine is configured to use the journald docker logging driver. The datadog agent has a configuration option that allows it to collect logs from journald. By adding the io.balena.features.journal-logs: "1" label to the agent container, I am able to configure the datadog agent to collect logs from journald. The only problem is that these journald logs don’t contain a clean service name field. The fields present are:
CONTAINER_NAME: has format myservice_xxxx_xxxx where the xxxx are some numbers. This is almost what I want, just without the _xxxx_xxxx part.
CONTAINER_ID, CONTAINER_TAG, CONTAINER_ID_FULL, SYSLOG_IDENTIFIER: all variations of the docker container id
_SYSTEMD_UNIT: always balena.service
various other fields like run command, machine id, etc
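One way to get a clean service name out of CONTAINER_NAME is to strip the trailing numeric groups. The snippet below is a minimal sketch of that idea in plain shell, assuming the suffix is always two underscore-separated runs of digits. The function name service_from_container_name is made up for illustration and is not part of balena or the datadog agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a service name from a balena CONTAINER_NAME.
# Assumes the name always ends in two underscore-separated groups of digits.
service_from_container_name() {
  printf '%s\n' "$1" | sed -E 's/_[0-9]+_[0-9]+$//'
}

# Example with a made-up container name
service_from_container_name "myservice_1234567_7654321"
```

Because the pattern is anchored to the end of the string, underscores inside the service name itself are left alone.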
I have tried modifying my services in the docker-compose file like this:
Hi mpous, does this tutorial work for you guys? I tried running the sample dockerfile with the iot agent from the tutorial. No logs make it to datadog, and I can see logs being produced by my other services in journald. My service in the docker-compose file looks like this:
I also don’t understand the difference between the datadog iot agent that is installed in the tutorial and the regular datadog agent in the official docker image. Both produce the same version string, namely "Agent 7.34.0 - Commit: 7861858 - Serialization version: v5.0.9 - Go version: go1.16.12", so they look like the same program to me.
Hey
I followed the instructions in the blog post, and it seems that the Dockerfile is not working. I updated it, and it builds, but I can’t make it work. Could you share the DD’s Dockerfile so I can check?
I’m thinking that this is maybe what’s happening to you… Did you check the container’s log to see if the service is running or giving you an error?
The IoT example does not even get any metrics into datadog, let alone logs. The older non-IoT-agent example at least got some data into datadog, but receiving logs there is sporadic at best.
The example with “regular” datadog agent fails to build ending in a build error:
[Build] [datadog] github.com/fzipp/gocyclo (download)
[Build]
[Build] [datadog] package io/fs: unrecognized import path "io/fs": import path does not begin with hostname
[Build]
[Build] [datadog] processing checkout tool gotest.tools/gotestsum
[Build] processing checkout tool github.com/fzipp/gocyclo
[Build] [datadog] Removing intermediate container e25e33d11427
Some services failed to build:
datadog: The command '/bin/sh -c export PATH=$PATH:$GOPATH/bin GODEBUG=netdns=go && cd /usr/app/src/github.com/DataDog/datadog-agent && invoke deps -v' returned a non-zero code: 1
Additional information may be available with the `--debug` flag.
For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting
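A possible explanation for the io/fs error above: the io/fs package was added to the Go standard library in Go 1.16, so an older toolchain treats it as a remote import path and fails. Which Go version the build image actually ships is not shown in the log, so this is an assumption. The sketch below only illustrates the version comparison; the function name go_supports_io_fs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a Go version string is at least 1.16,
# the release that introduced the io/fs standard-library package.
go_supports_io_fs() {
  version="${1#go}"        # accept "go1.16.12" or "1.16.12"
  major="${version%%.*}"   # text before the first dot
  minor="${version#*.}"    # text after the first dot
  minor="${minor%%.*}"     # keep only the minor number
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 16 ]; }
}

if go_supports_io_fs "go1.16.12"; then echo "go1.16.12: io/fs available"; fi
if ! go_supports_io_fs "go1.13.8"; then echo "go1.13.8: io/fs missing"; fi
```

Inside the build container, the string to check would come from the third field of the `go version` output.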
Other tests with the pre-built agent also fail to forward the logs.
Update:
I’ve now tested by pinning the version of the datadog agent to 7.25.1 (apt version string 1:7.25.1-1) and then it starts working (somewhat). I’ve used the dockerfile from the IoT example as a starting point, but instead of installing the agent-iot, the regular agent is installed. The IoT agent does not have a docker integration, so it cannot monitor docker’s metrics either.
FROM balenalib/%%BALENA_ARCH%%-ubuntu
WORKDIR /usr/app
# New Datadog install steps
RUN sudo apt-get update && apt-get install -y conntrack nano wget curl sudo apt-transport-https sudo gnupg2 net-tools && rm -rf /var/lib/apt/lists/*
RUN sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable 7' > /etc/apt/sources.list.d/datadog.list"
RUN sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_382E94DE.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN sudo apt-get update && apt-get install -y datadog-agent=1:7.25.1-1 datadog-signing-keys && rm -rf /var/lib/apt/lists/*
COPY files /usr/app/files
RUN cp /usr/app/files/disk.yaml /etc/datadog-agent/conf.d/disk.d/conf.yaml.default
RUN cp /usr/app/files/network.yaml /etc/datadog-agent/conf.d/network.d/conf.yaml.default
RUN chmod +x files/start.sh
CMD ["bash","./files/start.sh"]
I’ve also modified start.sh a bit to set a more human-readable hostname:
#!/usr/bin/env bash
set -euo pipefail
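# Abort early if DD_API_KEY is not set at all
if [ -z "${DD_API_KEY+x}" ]
then
echo "ERROR: DD_API_KEY IS NOT SET"
exit 1
fi
#ln -sf /var/run/balena.sock /var/run/docker.sock
# Replace every non-alphanumeric character in the device name with "-"
export DD_HOSTNAME=$(echo "$BALENA_DEVICE_NAME_AT_INIT" | sed 's|[^a-zA-Z0-9]|-|g')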
datadog-agent -c files/datadog.yaml run
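To make the hostname substitution in start.sh easier to see in isolation, here is a small sketch that applies the same sed expression to a sample value. The function name sanitize_hostname and the sample device name are invented for the example.

```shell
#!/usr/bin/env bash
# Same substitution as in start.sh: every character outside [a-zA-Z0-9]
# is replaced with "-".
sanitize_hostname() {
  printf '%s\n' "$1" | sed 's|[^a-zA-Z0-9]|-|g'
}

sanitize_hostname "My Device #1"   # prints My-Device--1
```

Note that consecutive special characters each become their own dash, so names with runs of punctuation end up with repeated dashes.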
I do not think this is a good final solution, because I would prefer to keep track of version updates, but it might give us some clues as to why it no longer works with newer versions of the datadog agent.
Update 2:
When I pin the IoT agent to the same version as the regular agent, it still does not collect the logs from docker.
Update 3:
Now that it is working somewhat, I run into the same issue as the original author of this thread, where the service name and source do not make any sense in the datadog logs:
For what it’s worth, the IoT version of the Datadog agent does not support docker metrics (at least it did not a few months ago; they may have updated it since).
Here’s the install script I’m using for the IoT agent on a Raspberry Pi CM4 and it works perfectly.
Dockerfile
FROM balenalib/aarch64-ubuntu
WORKDIR /app
RUN sudo apt-get update && apt-get install -y conntrack nano wget curl sudo apt-transport-https sudo gnupg2 net-tools jq
RUN sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable 7' > /etc/apt/sources.list.d/datadog.list"
RUN sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_382E94DE.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
RUN sudo apt-get update && apt-get install -y datadog-iot-agent datadog-signing-keys
# RUN apt update && apt install -y nano wget curl sudo apt-transport-https sudo gnupg2
# RUN sudo sh -c "echo 'deb https://apt.datadoghq.com/ stable 7' > /etc/apt/sources.list.d/datadog.list"
# RUN sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 A2923DFF56EDA6E76E55E492D3A80E30382E94DE
# RUN sudo apt-get update && sudo apt-get install datadog-iot-agent
COPY files /app/files
# Move the standard datadog configs
RUN cp /app/files/datadog.yaml /etc/datadog-agent/datadog.yaml
RUN cp /app/files/system-probe.yaml /etc/datadog-agent/system-probe.yaml
RUN cp /app/files/disk.yaml /etc/datadog-agent/conf.d/disk.d/conf.yaml
RUN cp /app/files/network.yaml /etc/datadog-agent/conf.d/network.d/conf.yaml
# # Add Python integration & logs
# RUN mkdir /etc/datadog-agent/conf.d/python.d
# RUN cp /app/files/python.yaml /etc/datadog-agent/conf.d/python.d/conf.yaml.default
# Add Python integration & logs
RUN mkdir /etc/datadog-agent/conf.d/python.d
RUN cp /app/files/python.yaml /etc/datadog-agent/conf.d/python.d/conf.yaml.default
# Add custom Basicstation logs
RUN mkdir /etc/datadog-agent/conf.d/basicstation.d
RUN cp /app/files/basicstation.yaml /etc/datadog-agent/conf.d/basicstation.d/conf.yaml
# Add custom losant-edge-agent logs
RUN mkdir /etc/datadog-agent/conf.d/losant.d
RUN cp /app/files/losant.yaml /etc/datadog-agent/conf.d/losant.d/conf.yaml
RUN chmod +x files/start.sh
CMD ["bash","./files/start.sh"]
start.sh
#!/bin/bash
###############################
# COLOR SETUP
###############################
export INFO_COLOR="\033[96m"
export ERROR_COLOR="\033[91m"
export WARN_COLOR="\033[93m"
export CLEAR_COLOR="\033[0m"
###############################
# LOGS SETUP
###############################
LOGS_LOCATION=/persistent-data/datadog-start.log
function timestamp(){
date "+%s" # Here we're using unix timestamp
}
function info(){
message="$1"
level=INFO
echo '{}' | \
jq --monochrome-output \
--compact-output \
--raw-output \
--arg timestamp "$(timestamp)" \
--arg level "$level" \
--arg message "$message" \
--arg user "$USER" \
--arg file "$(basename "$BASH_SOURCE")" \
'.timestamp=$timestamp|.level=$level|.message=$message|.user=$user|.file=$file' >> $LOGS_LOCATION
echo -e "${INFO_COLOR}$(timestamp) [$level] $message${CLEAR_COLOR}"
}
function warn(){
message="$1"
level=WARN
echo '{}' | \
jq --monochrome-output \
--compact-output \
--raw-output \
--arg timestamp "$(timestamp)" \
--arg level "$level" \
--arg message "$message" \
--arg user "$USER" \
--arg file "$(basename "$BASH_SOURCE")" \
'.timestamp=$timestamp|.level=$level|.message=$message|.user=$user|.file=$file' >> $LOGS_LOCATION
echo -e "${WARN_COLOR}$(timestamp) [$level] $message${CLEAR_COLOR}"
}
function error(){
message="$1"
level=ERROR
echo '{}' | \
jq --monochrome-output \
--compact-output \
--raw-output \
--arg timestamp "$(timestamp)" \
--arg level "$level" \
--arg message "$message" \
--arg user "$USER" \
--arg file "$(basename "$BASH_SOURCE")" \
'.timestamp=$timestamp|.level=$level|.message=$message|.user=$user|.file=$file' >> $LOGS_LOCATION
echo -e "${ERROR_COLOR}$(timestamp) [$level] $message${CLEAR_COLOR}"
}
if [ -z "${DD_API_KEY+x}" ]
then
warn "DD_API_KEY variable is missing or misconfigured."
balena-idle
else
info "DD_API_KEY configured, setting tags for datadog-agent..."
fi
ln -sf /var/run/balena.sock /var/run/docker.sock
# Quote the tr character classes so the shell cannot glob-expand them
GATEWAY_MAC=$(sed -r 's/[:]+//g' /sys/class/net/eth0/address | tr '[:lower:]' '[:upper:]')
GATEWAY_EUI=$(sed -r 's/[:]+//g' /sys/class/net/eth0/address | sed -e 's#\(.\{6\}\)\(.*\)#\1fffe\2#g' | tr '[:lower:]' '[:upper:]')
# MODEM_MODEL=$(mmcli -m 0 --output-json | jq '.modem.generic.model')
# MODEM_IMEI=$(mmcli -m 0 --output-json | jq '.modem["3gpp"].imei|tonumber')
# Add all variables to the datadog.yaml config file
# BE SURE TO SET ENV value in balena application. Options are: env:play env:test env:stag env:prod
echo -e "api_key: $DD_API_KEY\nenv: $ENV\ntags:\n - availability-zone:wilderness\n - gateway_eui:$GATEWAY_EUI\n - balena_app_id:$BALENA_APP_ID\n - balena_app_name:$BALENA_APP_NAME\n - balena_device_aarch:$BALENA_DEVICE_ARCH\n - balena_host_os_version:$BALENA_HOST_OS_VERSION\n - balena_device_name_at_init:$BALENA_DEVICE_NAME_AT_INIT\n - host_aliases:$BALENA_DEVICE_NAME_AT_INIT" | cat - files/datadog.yaml > temp && mv temp /etc/datadog-agent/datadog.yaml
# Run this only if you copy datadog.yaml to /etc/datadog-agent/datadog.yaml
info "Tags set. Starting agent..."
datadog-agent run
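The GATEWAY_EUI line in start.sh builds a 64-bit gateway EUI from the 48-bit MAC address by inserting fffe after the first three bytes and upper-casing the result. Here is the same transformation as a standalone sketch that takes the MAC as an argument instead of reading /sys/class/net/eth0/address. The function name mac_to_eui and the sample MAC are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch of the MAC -> EUI step from start.sh:
#   1. drop the colons
#   2. insert "fffe" after the first 6 hex digits
#   3. upper-case the result
mac_to_eui() {
  printf '%s\n' "$1" \
    | sed -E 's/://g' \
    | sed -E 's/^(.{6})(.*)$/\1fffe\2/' \
    | tr '[:lower:]' '[:upper:]'
}

mac_to_eui "b8:27:eb:12:34:56"   # prints B827EBFFFE123456
```

Like the original line, this does not flip the universal/local bit the way the modified EUI-64 format used for IPv6 interface identifiers does.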
Notice the echo -e line near the end of start.sh. It prepends api_key, env, and a bunch of device tags to files/datadog.yaml and writes the result to /etc/datadog-agent/datadog.yaml, so in Datadog you can keep track of your device and search for it based on tags.
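If the long echo -e string is hard to maintain, the same header can be produced with a here-document. This is only a sketch of an alternative, not what the script above does. The variable names are the ones used in start.sh, the function name render_datadog_header is invented, and the values in the demo call are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build the api_key/env/tags header with a here-document
# instead of one long echo -e string. Unset variables render as empty.
render_datadog_header() {
  cat <<EOF
api_key: ${DD_API_KEY}
env: ${ENV}
tags:
 - availability-zone:wilderness
 - gateway_eui:${GATEWAY_EUI}
 - balena_app_id:${BALENA_APP_ID}
 - balena_app_name:${BALENA_APP_NAME}
 - balena_device_aarch:${BALENA_DEVICE_ARCH}
 - balena_host_os_version:${BALENA_HOST_OS_VERSION}
 - balena_device_name_at_init:${BALENA_DEVICE_NAME_AT_INIT}
 - host_aliases:${BALENA_DEVICE_NAME_AT_INIT}
EOF
}

# Demo with placeholder values; on a device these come from the environment.
DD_API_KEY="example-key" ENV="env:test" GATEWAY_EUI="B827EBFFFE123456" render_datadog_header

# On the device, the header would be combined with the base config like this:
#   { render_datadog_header; cat files/datadog.yaml; } > temp && mv temp /etc/datadog-agent/datadog.yaml
```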
Keep in mind I’m not using the journald method to get Balena logs into datadog (I wanted more manual control over what logs are being sent, to keep log volume low). Instead I’m sending specific application logs to a log file and tailing those with the datadog agent.
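For anyone wanting to try the file-tailing approach, the agent picks up a logs section from a conf.yaml under conf.d/<name>.d/, and logs_enabled: true must be set in datadog.yaml. The sketch below writes such a file. The contents of the python.yaml, basicstation.yaml and losant.yaml files from the Dockerfile are not shown in this thread, so the service name myapp and the path /persistent-data/myapp.log here are placeholders, and write_tail_config is a made-up helper.

```shell
#!/usr/bin/env bash
# Sketch: write a Datadog custom log config that tails a single file.
# Arguments: conf.d directory, integration/service name, log file path.
write_tail_config() {
  conf_dir="$1"
  name="$2"
  log_path="$3"
  mkdir -p "$conf_dir/$name.d"
  cat > "$conf_dir/$name.d/conf.yaml" <<EOF
logs:
  - type: file
    path: $log_path
    service: $name
    source: $name
EOF
}

# Demo in a temporary directory; on a device the first argument
# would be /etc/datadog-agent/conf.d.
demo_dir="$(mktemp -d)"
write_tail_config "$demo_dir" "myapp" "/persistent-data/myapp.log"
cat "$demo_dir/myapp.d/conf.yaml"
```

Setting service and source explicitly in the file config also avoids the unclear service names that come with the container-based collection discussed earlier in the thread.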
@mpous I’m not fully satisfied with the way it is working right now. Once it is, I will issue a PR to the repo such that others can benefit from it as well. I’m in a discussion with Datadog support to figure out why things are not working as intended.