Balena and Telegraf - Container not running

Hi all, I am quite new to Docker and balena; however, I am quite knowledgeable in Telegraf and InfluxDB.

I have a deployment of 100 Raspberry Pis in my network.

I want to use balena for management. Before buying a license, I want to make it work with fewer than 10 devices.

I created the following Dockerfile:

#FROM balenalib/rpi-raspbian
FROM balenalib/raspberry-pi-debian:latest
# replace this with your application
RUN install_packages wget git

RUN curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
RUN echo "deb https://repos.influxdata.com/debian buster stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
RUN sudo apt-get update
RUN sudo apt-get install -y telegraf

COPY telegraf.conf /etc/telegraf/telegraf.conf
ENV UDEV=1

#RUN sudo telegraf service start

CMD ["service", "telegraf", "start"]

I then do a balena push to my fleet and the image is uploaded successfully.

The problem is that no data arrives in my InfluxDB and, obviously, nothing shows up in my Grafana.

The logs in my balena dashboard show the following:

Service exited ‘main sha256:4e0a76e0b56598c253724a913baaa0a98e61acd39f9724d53f667720c3b0667f’
main Starting the process telegraf [ OK ]
main telegraf process was started [ OK ]
Service exited ‘main sha256:4e0a76e0b56598c253724a913baaa0a98e61acd39f9724d53f667720c3b0667f’

Now in my terminal I have two tabs, Host OS and Main. I can access the Host OS, but I cannot access Main since I get this error:

Error response from daemon: Container c25055ab4b5ef3938a8c0f873b20baf25512e4cf81717eee2da6df3ed399b87c is restarting, wait until the container is running

I know I am missing something since my container is not running, but I do not understand what.

Please help

Hello @7ser23 welcome to the balena community!

It looks like your container gets stopped and not restarted. Could you please share your telegraf.conf to understand if there is any issue from there?

Having said this, did you check the balenaBlocks? We have the Connector block, which runs Telegraf, so you don’t need to deal with this yourself. If you also want to run Grafana, you can run the Dashboard block to visualize the data coming from Telegraf.

Find here an example running Telegraf + InfluxDB + Grafana → balena-sense/docker-compose.yml at master · balenalabs/balena-sense · GitHub
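
For reference, here is a rough sketch of the kind of docker-compose.yml I mean. Treat it as a starting point only; I am quoting the service and image names from memory of the balena-sense example, so please double-check them against the linked file before using it:

version: "2.1"

volumes:
  influxdb-data:

services:
  # local time-series database that the connector writes into
  influxdb:
    image: influxdb:1.8
    volumes:
      - influxdb-data:/var/lib/influxdb

  # Connector block: runs Telegraf and pushes metrics to InfluxDB
  connector:
    image: balenablocks/connector

  # Dashboard block: runs Grafana for visualization
  dashboard:
    image: balenablocks/dashboard
    ports:
      - "80:80"   # Grafana UI; the port may differ, check the block's README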

Let me know if that works :slight_smile:

Sure. Here is my telegraf.conf (unfortunately I could not upload it), so I am copying only the first lines and not the inputs. FYI, the conf file works fine in my original deployment.

I made some changes to the user, password, and database URL for security reasons, and everything else is commented out until my output plugins.

I am checking the links you shared, but like I said, I am quite new to Docker and balena, so it takes me a while to understand since I need to google several things :slight_smile:

I will also be waiting for your reply.

I really appreciate your help

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})


# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "300s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## Maximum number of unwritten metrics per output.  Increasing this value
  ## allows for longer periods of output downtime without dropping metrics at the
  ## cost of higher maximum memory usage.
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "180s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "500s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "50s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = "0s"

  ## Log at debug level.
  debug = true
  ## Log only error level messages.
  # quiet = false

  ## Log target controls the destination for logs and can be one of "file",
  ## "stderr" or, on Windows, "eventlog".  When set to "file", the output file
  ## is determined by the "logfile" setting.
  logtarget = "file"

  ## Name of the file to be logged to when using the "file" logtarget.  If set to
  ## the empty string then logs are written to stderr.
  logfile = "/var/log/telegraf/telegraf.log"

  ## The logfile will be rotated after the time interval specified.  When set
  ## to 0 no time based rotation is performed.  Logs are rotated only when
  ## written to, if there is no log activity rotation may be delayed.
  # logfile_rotation_interval = "0d"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size.  When set to 0 no size based rotation is performed.
  logfile_rotation_max_size = "100MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  logfile_rotation_max_archives = 5

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false




###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################


# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
  urls = ["change the URL"]

  ## The target database for metrics; will be created as needed.
  ## For UDP url endpoint database needs to be configured on server side.
  database = "sondas"

  ## The value of this tag will be used to determine the database.  If this
  ## tag is not set the 'database' option is used as the default.
  # database_tag = ""

  ## If true, the 'database_tag' will not be included in the written metric.
  # exclude_database_tag = false

  ## If true, no CREATE DATABASE queries will be sent.  Set to true when using
  ## Telegraf with a user without permissions to create databases or when the
  ## database already exists.
  skip_database_creation = true

  ## Name of existing retention policy to write to.  Empty string writes to
  ## the default retention policy.  Only takes effect when using HTTP.
  # retention_policy = ""

  ## The value of this tag will be used to determine the retention policy.  If this
  ## tag is not set the 'retention_policy' option is used as the default.
  # retention_policy_tag = ""

  ## If true, the 'retention_policy_tag' will not be included in the written metric.
  # exclude_retention_policy_tag = false

  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
  ## Only takes effect when using HTTP.
  # write_consistency = "any"

  ## Timeout for HTTP messages.
  # timeout = "5s"

  ## HTTP Basic Auth
  username = "ERASED"
  password = "PRIVATE"

@7ser23 if you are new, I would strongly recommend using the balenaBlocks with the example that I shared before; you can have the 3 services running (Telegraf + InfluxDB + Grafana) on a fleet in a few minutes and start pushing data.

BTW, I didn’t see any [inputs] in your telegraf.conf. Out of curiosity, how are you going to send the data? Are you storing data in InfluxDB and then sending it to Grafana over HTTP?

I did not include the inputs. I am sharing a snippet, since I have around 200 inputs between iperf, http_response, ping, mtr, etc.

The thing is that I only need to run Telegraf; I already have Grafana and InfluxDB configured and running with 100 sensors. I just need to send the data to my DB, since Grafana is already reading the data from InfluxDB.

# # Ping given url(s) and return statistics

################## Puerta de Enlace ###############
[[inputs.ping]]
  urls = ["192.168.0.1"]
  interval = "90s"
  count = 10
  [inputs.ping.tags]
      name = "Puerta de Enlace"

#################### International Web ####################
[[inputs.ping]]
  urls = ["www.facebook.com"]
  interval = "90s"
  count = 10
  [inputs.ping.tags]
      name = "www.facebook.com"
      type = "ExCliente"

[[inputs.ping]]
  urls = ["www.google.com"]
  interval = "90s"
  count = 10
  [inputs.ping.tags]
      name = "www.google.com"
      type = "ExCliente"
[[inputs.ping]]
  urls = ["www.apple.com"]
  interval = "90s"
  count = 10
  [inputs.ping.tags]
      name = "www.apple.com"
      type = "ExCliente"
[[inputs.ping]]
  urls = ["www.riotgames.com"]
  interval = "90s"
  count = 10
  [inputs.ping.tags]
      name = "www.riotgames.com"
      type = "ExCliente"
[[inputs.exec]]
commands = ["mtr -C -n www.facebook.com",
  "mtr -C -n www.google.com",
  "mtr -C -n www.apple.com",
  "mtr -C -n www.riotgames.com",
  "mtr -C -n www.amazon.com",
  "mtr -C -n www.twitter.com",
  "mtr -C -n www.youtube.com",   
  
  "mtr -C -n 68.142.113.32",
  "mtr -C -n 13.249.105.79",
  "mtr -C -n 172.217.192.128",
  
  "mtr -C -n 64.233.190.128",
  "mtr -C -n 64.233.186.128",
  "mtr -C -n 13.226.50.95",
  "mtr -C -n 13.226.50.53",
  "mtr -C -n 13.226.50.5",
  "mtr -C -n 13.226.50.123",

.....

And I am sharing an image of my Grafana.

@7ser23 are you running the project without docker-compose?

On the other hand, the only project I know of from the community running Telegraf directly is this one → GitHub - hferentschik/balena-weather: Balena weather station

Let me know if that helps!

Yes, I install Telegraf and all the necessary dependencies directly on my Raspberry Pis. I deploy them all around my country and use AnyDesk and crontab to manage them.

I want to get rid of AnyDesk and create my own images in balena, so that our engineers can download an image, flash it with Etcher, and just connect the device to the internet.

I also want to use balena to create different fleets so I can use different conf files according to my needs.
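
For example, I imagine I could set a fleet-level environment variable in the balena dashboard (say, a hypothetical INFLUX_URL) and reference it from telegraf.conf, since Telegraf supports environment-variable substitution. Something like this (untested on my side):

# INFLUX_URL would be a fleet variable defined in the balena dashboard
[[outputs.influxdb]]
  urls = ["${INFLUX_URL}"]
  database = "sondas"

That way the same image could be pushed to several fleets with different configuration.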

I plan to expand my network to 150-200 sensors in 2022, but I need to have this running first with fewer than 10 sensors. Managing 100 sensors the way I do it now is not good; that is why we need balena, to demo it and approve the purchase.

And yes, I have looked at that project several times, but I still do not understand it completely. From my point of view, what I am doing here:

#FROM balenalib/rpi-raspbian
FROM balenalib/raspberry-pi-debian:latest
# replace this with your application
RUN install_packages wget git

RUN curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
RUN echo "deb https://repos.influxdata.com/debian buster stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
RUN sudo apt-get update
RUN sudo apt-get install -y telegraf

COPY telegraf.conf /etc/telegraf/telegraf.conf
ENV UDEV=1

#RUN sudo telegraf service start

CMD ["service", "telegraf", "start"]

should work; it is basically a copy and paste from the Get Started tutorial in balena for the Raspberry Pi 4. I know Telegraf is installed, I know it runs, I just don't understand why the container stops.

Hi all, and especially mpous, I was able to make it work.

I added the following line in my Dockerfile :slight_smile:

CMD ["balena-idle"]

Now I am trying to change the name so that my Grafana shows the correct entry.

I think we can mark this as solved. Thank you for the help.


Thanks @7ser23 for the confirmation!

Let me know if we can help you more :slight_smile:
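
By the way, the most likely reason the original container kept restarting is that CMD ["service", "telegraf", "start"] returns as soon as Telegraf is launched in the background, so the container's main process exits and the supervisor restarts it; balena-idle fixes that because it keeps a process running in the foreground. Another option, just a sketch in case you want to avoid the idle process, is to run Telegraf itself as the container's main process, so the container stays up for as long as Telegraf runs:

# run Telegraf in the foreground as the container's main process
CMD ["telegraf", "--config", "/etc/telegraf/telegraf.conf"]

Note that with the telegraf.conf you shared, logs go to /var/log/telegraf/telegraf.log (logtarget = "file"); switching logtarget to "stderr" would make Telegraf's output visible directly in the balena dashboard logs.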