Hey @Tristan107 @imrehg,
I’ve been looking at monitoring again lately, and while we may add some alerting on device status etc., it’s more likely we will focus on making resin integrate well with existing monitoring solutions like Datadog, Prometheus, the TICK stack, etc.
My current recommendation is to use the TICK stack, which is the combination of InfluxData’s open source projects, namely Telegraf, InfluxDB, Chronograf and Kapacitor. The projects work well together and are free to use. Many SaaS products have pricing models based on cloud infrastructure, which makes them way too expensive for the IoT use case. This may change soon.
Brief explanation of how the TICK stack works:
- Telegraf is the agent that will run on the device. It has a series of input and output plugins, making it really easy to collect data from anything (machine metrics, statsd, cAdvisor, Docker) or to push stats directly from your app. Its output plugins allow you to use the push or the pull model. In our previous blog post we used Prometheus, which relies on the pull model; in that model you have to know each target’s IP address, and therefore use the resin-sdk to get those addresses and scrape their metrics endpoints. With Telegraf we can easily switch to a push model, making the stack a lot simpler.
- The cloud portion will be InfluxDB, a time series database; Kapacitor, which allows you to set alerts on queries; and Chronograf, which allows you to create graphs to visualise the queries.
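To make "pushing stats directly from your app" a bit more concrete, here is a minimal sketch in Python. It assumes Telegraf is running on the device with its statsd input enabled and listening on UDP port 8125 on localhost; the metric name `app.queue_depth` and both helper functions are made up for illustration, so adjust them to your own setup.

```python
import socket


def format_statsd_gauge(name: str, value: float) -> bytes:
    """Encode a gauge in the plain statsd wire format: <name>:<value>|g"""
    return f"{name}:{value}|g".encode("ascii")


def push_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> int:
    """Send one metric over UDP (fire and forget).

    host/port are assumptions: they should point at wherever the
    Telegraf statsd input is listening on the device.
    Returns the number of bytes handed to the socket.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, purely for illustration.
    push_metric(format_statsd_gauge("app.queue_depth", 12))
```

Because this is UDP, the app never needs to know anything about the cloud side; Telegraf picks the metric up locally and forwards it through whichever output plugin is configured.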
We are still missing a few key pieces of functionality to make monitoring first class:
- In most cases you’d want to monitor host metrics, which requires mounting specific volumes from the host. This is something resin doesn’t currently support.
- In many cases you’d want to monitor Docker stats, for which you’d need access to the Docker socket. This is something resin doesn’t support in production mode.
- Multi-container support is needed to make monitoring solutions a drop-in replacement.
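To illustrate why the Docker socket matters for the second point, here is a rough Python sketch of what a monitoring agent does under the hood: it talks HTTP to the Docker Engine API over a Unix socket. The socket path `/var/run/docker.sock` is an assumption (it is the usual default, but it may differ on your image), and the class and function names are my own, not part of any library.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def parse_container_names(body: bytes) -> list:
    """Extract container names from a /containers/json response body."""
    containers = json.loads(body)
    return [name.lstrip("/") for c in containers for name in c.get("Names", [])]


def list_container_names(socket_path: str = "/var/run/docker.sock") -> list:
    """List running containers; fails if the socket is not accessible."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        body = conn.getresponse().read()
    finally:
        conn.close()
    return parse_container_names(body)


if __name__ == "__main__":
    print(list_container_names())
```

Without access to that socket from inside the container, the `connect` call simply fails, which is the limitation described above for production mode.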
Here’s a screenshot from Chronograf. I’m using the resin dev image so I can monitor Docker; it looks pretty slick.
I’ll probably put together another blog post once we have a clearer idea on everything above.