Observability solutions

zukoo · March 17, 2024, 4:17am

Hello,

What are the recommendations for observability?

Some of the things I want to achieve:

centralised searchable logs (ELK style)
granular metrics (host & container)
alarms on the above

I found that Sumo Logic has a good pricing model for me, but are there any already-integrated options?

mpous · March 18, 2024, 12:56pm

Hello @zukoo, I don’t have any recommendations, but did you read this?

In the meantime, I will ask internally if anyone from the team can help you!

kb2ma · March 18, 2024, 1:37pm

Here are a couple more blog posts:

Jason Dixon writes a weekly newsletter on monitoring. See the issue archive for lots of use cases and tools.

zukoo · March 20, 2024, 2:09am

Thanks for the responses,

I’m now more seriously considering Grafana Cloud and Datadog.

I’ve tried to set up both with the help of the above blog posts. Datadog works for the system metrics, but two things are missing: no logs and no containers (I can see the images, but none of them shows as “Running”).

logs_enabled: true
listeners:
  - name: docker
config_providers:
  - name: docker
    polling: true
logs_config:
  container_collect_all: true
process_config:
  process_collection:
    enabled: true
apm_config:
  enabled: false # disable APM
site: us3.datadoghq.com

Not sure if it’s related, but I see these errors:

mpous · March 20, 2024, 10:17am

Hello @zukoo, I’m not sure whether the Datadog blog post is too old. Did you try this?

It’s old as well, so I’m checking the existing pull requests on the associated repo → Pull requests · balena-io-examples/balena-datadog · GitHub

What did you try?

zukoo · March 20, 2024, 12:20pm

@mpous I tried all three blog posts above. For Grafana, I couldn’t make it work. For Datadog, both the IoT and the normal client (I had to modify the Dockerfile to get it to work and use the latest version) had the same issue.

This is an example on the IoT client where metrics work (even the Docker ones) but there are no logs and no containers:

mpous · March 20, 2024, 12:34pm

@zukoo I’ve never worked with Datadog! Maybe @kb2ma has some ideas?

On the other hand, what issues do you have with Grafana?

zukoo · March 20, 2024, 12:49pm

It’s not sending any data. The collector logs are filled with ‘Permanent error: Permanent error: Post "https://1480699:***@https//prometheus-prod-37-prod-ap-southeast-1.grafana.net/api/prom/push\": dial tcp: lookup https on 10.114.102.1:53: no such host’

EDIT: Sorry, after writing this I realized the issue is with the URL: the host I configured shouldn’t include the https prefix. I’m trying without it now.
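For anyone hitting the same error, here is a minimal Python sketch of why the lookup fails and one way to guard against it. It assumes the configured host value gets embedded into a URL that already carries the scheme and credentials, as the error message suggests. `normalize_host` and `build_push_url` are hypothetical helper names, not part of any Grafana or collector tooling; the path `/api/prom/push` is taken from the error message.

```python
from urllib.parse import urlsplit

PUSH_PATH = "/api/prom/push"  # path as it appears in the error message


def normalize_host(value: str) -> str:
    """Return a bare host name, dropping any scheme or path that was pasted in."""
    value = value.strip()
    if "//" in value:
        # Covers "https://host/path" and the broken "https//host" form alike.
        value = value.split("//", 1)[1]
    return value.split("/", 1)[0]


def build_push_url(user: str, token: str, host: str) -> str:
    """Assemble the push URL from credentials and a (possibly messy) host value."""
    return f"https://{user}:{token}@{normalize_host(host)}{PUSH_PATH}"


# The URL from the error message: the part after "@" up to the next "/" is
# what gets resolved, and here that is the literal word "https".
broken = "https://1480699:***@https//prometheus-prod-37-prod-ap-southeast-1.grafana.net/api/prom/push"
print(urlsplit(broken).hostname)

fixed = build_push_url(
    "1480699", "***", "https://prometheus-prod-37-prod-ap-southeast-1.grafana.net"
)
print(urlsplit(fixed).hostname)
```

The first `print` shows `https`, which matches the `lookup https ... no such host` part of the error; the second shows the real Grafana host name.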

zukoo · March 20, 2024, 1:18pm

Thanks @mpous, my Grafana setup is now on par with the Datadog one. I don’t see anything in the blog post about logs. Do you recommend any way/exporter to upload the stdout/stderr of my containers to Grafana?

zukoo · March 22, 2024, 5:47am

So I didn’t manage to get the agent to tail the containers’ output, but at least temporarily I got the logs out to Datadog directly from my application code using: GitHub - DataDog/datadog-api-client-python: Python client for the Datadog API
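A rough sketch of what sending logs from application code with that client can look like, wired up as a standard `logging` handler. This is an illustration, not the exact code used in this thread: `DatadogLogHandler` and `record_to_fields` are hypothetical names, the tag and source values are arbitrary, and it assumes `datadog-api-client` is installed and `DD_API_KEY` is set in the container environment. The site default matches the `site` value from the config earlier in the thread.

```python
import logging
import socket


def record_to_fields(record: logging.LogRecord, service: str) -> dict:
    """Map a stdlib log record to the fields of a Datadog HTTP log item."""
    return {
        "message": record.getMessage(),
        "service": service,
        "hostname": socket.gethostname(),
        "ddsource": "python",
        "ddtags": f"level:{record.levelname.lower()}",
    }


class DatadogLogHandler(logging.Handler):
    """Forward each record to Datadog (hypothetical helper, not part of the client)."""

    def __init__(self, service: str, site: str = "us3.datadoghq.com"):
        super().__init__()
        self.service = service
        self.site = site

    def emit(self, record: logging.LogRecord) -> None:
        try:
            # Imported lazily so the module still loads where the client is absent.
            from datadog_api_client import ApiClient, Configuration
            from datadog_api_client.v2.api.logs_api import LogsApi
            from datadog_api_client.v2.model.http_log import HTTPLog
            from datadog_api_client.v2.model.http_log_item import HTTPLogItem

            configuration = Configuration()  # picks up DD_API_KEY from the environment
            configuration.server_variables["site"] = self.site
            item = HTTPLogItem(**record_to_fields(record, self.service))
            with ApiClient(configuration) as api_client:
                LogsApi(api_client).submit_log(body=HTTPLog([item]))
        except Exception:
            self.handleError(record)  # never let logging crash the application


# Usage (not executed here): attach to the application's own logger only, so
# the HTTP client's internal log messages are not fed back into the handler.
# logging.getLogger("myapp").addHandler(DatadogLogHandler(service="myapp"))
```

This sends one HTTP request per record, which is fine for low volumes; batching several items into one `HTTPLog` would be the obvious next step.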

mpous · March 22, 2024, 11:19am

That sounds good! Let us know if this is your latest configuration!

Let us know if we can help you more!

philletourneau · April 20, 2024, 5:16pm

I’m heading down this path myself and need to set something up. @zukoo any updates to your setup and testing?

zukoo · April 22, 2024, 1:24am

Hey @philletourneau,

Currently I use both the setup from “IoT fleet monitoring with Datadog and balenaCloud: How small agent containers make a big impact - balena Blog” to get host metrics, and the Datadog SDK (GitHub - DataDog/datadog-api-client-python: Python client for the Datadog API) to stream logs.

If you find a way to use the agent to stream logs, I’d be interested to know how.

philletourneau · April 22, 2024, 2:31pm

Thanks for the update. I’d love to get logs and host metrics all-in-one too! I haven’t set anything up yet, but Grafana Cloud looks very tempting because of the pricing, though I don’t know if I can figure out how to set that up. They’re moving to something new now?

ada · May 17, 2024, 12:03pm

@philletourneau @zukoo We started using GCP Monitoring to collect both (logs and metrics), published by an OTEL collector: opentelemetry-collector-contrib/exporter/googlecloudexporter/README.md at main · open-telemetry/opentelemetry-collector-contrib · GitHub

So as long as you get stuff into the collector, everything ends up in a GCP dashboard.

@mpous To get logs and metrics up, having an OTEL collector as a service is obviously easy. For metrics you have to deal with each service, which makes sense because metrics are very service-specific. But what would be a tremendous simplification is to have Balena export the console logs of all services to the collector automatically. In that case we wouldn’t have to instrument each service independently. Our devices run 5 to 10 different services, and some are third-party.
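To illustrate what "getting stuff into the collector" means for logs today, here is a minimal stdlib-only sketch that posts log lines to a collector over OTLP/HTTP with JSON encoding. It is not the setup described in this post: the endpoint host `otel-collector` is a hypothetical service name, port 4318 and the `/v1/logs` path are the OTLP/HTTP defaults, and the collector is assumed to have an OTLP HTTP receiver enabled. A real service would more likely use an OpenTelemetry SDK.

```python
import json
import time
import urllib.request


def build_otlp_logs(service: str, lines: list[str]) -> dict:
    """Wrap plain log lines in the OTLP JSON structure for a logs export request."""
    now = str(time.time_ns())  # 64-bit integers are string-encoded in OTLP JSON
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeLogs": [{
                "logRecords": [
                    {"timeUnixNano": now, "body": {"stringValue": line}}
                    for line in lines
                ],
            }],
        }]
    }


def send_logs(payload: dict, endpoint: str = "http://otel-collector:4318/v1/logs") -> int:
    """POST the payload to the collector and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_otlp_logs("sensor-reader", ["started", "reading failed: timeout"])
print(json.dumps(payload, indent=2))
# send_logs(payload)  # needs a reachable collector, so it is left commented out
```

Every service has to do something like this (or ship its output some other way), which is exactly the per-service work an automatic export of console logs would remove.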

mpous · May 17, 2024, 12:33pm

@ada this is really interesting feedback!

Could you add this to our public roadmap tool so our team can think about it?

Thanks

ada · May 17, 2024, 12:49pm

Done: Publish all console logs to OpenTelemetry Collector · Balena Roadmap

philletourneau · May 17, 2024, 2:27pm

Thanks for the additional ideas and feedback, folks!

I just started experimenting with a new solution in beta, Pydantic Logfire | Uncomplicated observability. So far it’s really great if you’re running Python services.
