Balena and Telegraf - Bad Performance in inputs

Hi, I have been using Balena and Telegraf for quite some time. After my deployment has been running for a few hours, performance degrades, and it only recovers after a reboot.

My Dockerfile looks like this:

FROM telegraf:latest

RUN apt-get update && apt-get install -y --no-install-recommends dnsutils mtr git iperf3 telnet tcpdump traceroute wvdial usb-modeswitch ppp nano vim lftp nmap cron && \
    rm -rf /var/lib/apt/lists/*

COPY ./telegraf.conf /etc/telegraf/telegraf.conf
COPY ./ /etc/telegraf
COPY ./ /usr/local/bin/
RUN chmod 775 /usr/local/bin/

After the reboot, the tests look like this:

But at 19:12 they start to look worse and my nping stops working:

You can clearly see the problem.

Does anyone know what I might be doing wrong, or why it gets fixed when I reboot the Raspberry Pi? It would also help if I could schedule a reboot at a set time, as with crontab, an option that I think is not available in Balena. Please help.

Has this started happening recently? With newer iterations of balenaOS?
I don’t have a lot of experience working with Telegraf. To help you with scheduling reboots, we now have a cron block on balenaHub that you can use to schedule that reboot.

What are you pinging, for how long, and how frequently?
Could it be that after a certain number of requests the other server treats your pings as spam and replies differently?

Judging by the behavior itself (fails after some time, works after a reboot), my first guess would be that logs or other data needed for normal operation fill up the storage or RAM and cause it to malfunction.
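One way to test the storage-or-RAM guess is to record usage right after a reboot and again once the tests degrade. The following is only a minimal sketch: the function names `disk_used_percent` and `mem_available_kb` are made up for illustration, and it assumes a Linux container where `df`, `awk`, and `/proc/meminfo` are available.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of balena or Telegraf).

# Print the used-space percentage (digits only) of the filesystem holding $1.
# Defaults to the root filesystem when no path is given.
disk_used_percent() {
    df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print the memory the kernel reports as available, in kB (Linux only).
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

# Example use inside the container:
#   echo "disk used: $(disk_used_percent /)%  mem available: $(mem_available_kb) kB"
```

If the disk percentage climbs or the available memory shrinks steadily between the two measurements, that would support the guess; flat numbers would point elsewhere.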

@vipulgupta2048's suggestion of the cron block could be a quick workaround, or, since you already install cron in your Dockerfile, you can configure it to call the supervisor to restart.

Hi there, in addition, could you also enable support access for this device? I can look at the device logs to see what happened at 19:12. However, if the device has been rebooted since then, these logs will be lost, so let me know if that's the case.

Well, I do not have this problem when I use Raspberry Pi OS (both use the same Telegraf config). I have around 60 sensors (about 20 of them run Balena, in two fleets). The whole deployment involves DNS queries, HTTP responses, speed tests, MTR, ping, etc.

I am pinging around 100 sites, 5 counts, 0.25 intervals every 3 min. All this data is dumped to the database every 20 min.

This is in production already and I have to reboot the devices myself. I am checking the health and behavior of a Telco HFC, Remote PHY, FTTH network.

I have tried to restart the device by calling the supervisor:

$ curl -X POST --header "Content-Type:application/json" \
    "$BALENA_SUPERVISOR_ADDRESS/v1/reboot?apikey=$BALENA_SUPERVISOR_API_KEY"

But you have to understand that I would need to echo the API key for each sensor. On top of that, I cannot make it work: for some reason I can GET information with the command, but I cannot POST anything.
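A hedged sketch of the same reboot call wrapped in a small script. As far as the balena documentation describes it, the supervisor injects `BALENA_SUPERVISOR_ADDRESS` and `BALENA_SUPERVISOR_API_KEY` into the container on each device (in multi-container fleets the service needs the `io.balena.features.supervisor-api` label), so the key should not have to be copied per sensor. The function names `reboot_url` and `request_reboot` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; only the two BALENA_SUPERVISOR_* variables and the
# /v1/reboot endpoint come from the thread above.

# Build the reboot URL from the variables available inside the container.
reboot_url() {
    printf '%s/v1/reboot?apikey=%s' \
        "$BALENA_SUPERVISOR_ADDRESS" "$BALENA_SUPERVISOR_API_KEY"
}

# Send the POST request. --fail makes curl exit non-zero on an HTTP error,
# so a failed request is visible to the caller instead of passing silently.
request_reboot() {
    curl --fail --silent --show-error -X POST \
        --header "Content-Type:application/json" \
        "$(reboot_url)"
}

# Example use: request_reboot || echo "reboot request failed" >&2
```

Jobs started by cron usually do not inherit the container's environment, so the two variables may need to be exported into the job's environment before `request_reboot` is called from a crontab entry.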

The devices were rebooted because I lost tests at 3 am again. The whole project is in production.

I can enable support access to my devices. Should I share the fleet and UUID via support chat?

Right now I have a sensor at 100% CPU usage, and when I try to open a terminal this is what I get:

I can grant access to this sensor as well.

Hi @7ser23,

We just thought to reach out again and see if you’d be willing to enable support access so we can review device logs with you. Let us know if that’s possible. Thank you!

Hi, I was able to fix the problems doing the following:

I added Tini to my Dockerfile so that it runs as PID 1 and reaps zombie processes. It fixed the issue, but it caused problems with my inputs.exec. Because of this, we checked the Telegraf documentation and found:

We added the line USER telegraf to our Dockerfile and we are not seeing any problems now.

The dockerfile looks like this now:

FROM telegraf:latest
ADD${TINI_VERSION}/tini-static-arm64 /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--"]
RUN apt-get update && apt-get install -y --no-install-recommends dnsutils mtr git iperf3 telnet tcpdump traceroute nano vim lftp nmap cron && \
    rm -rf /var/lib/apt/lists/*
COPY ./telegraf.conf /etc/telegraf/telegraf.conf
COPY ./ /usr/local/bin/
RUN chmod 775 /usr/local/bin/
USER telegraf
CMD telegraf
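To check whether zombie processes are actually being reaped after a change like this, something along these lines could be run inside the container. It is a sketch that assumes a Linux `/proc` filesystem; `count_zombies` and `pid1_name` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the container's process table.

# Count processes whose /proc/<pid>/status reports state "Z" (zombie).
# Errors from processes that exit during the scan are discarded.
count_zombies() {
    grep -l '^State:[[:space:]]*Z' /proc/[0-9]*/status 2>/dev/null \
        | wc -l | tr -d ' '
}

# Print the command name of PID 1, to see which process is acting as init.
pid1_name() {
    cat /proc/1/comm
}

# Example use:
#   echo "PID 1 is $(pid1_name); zombies: $(count_zombies)"
```

A zombie count that keeps growing over several hours would match the symptoms described earlier in the thread, while a count that stays near zero suggests the init process is reaping exited children.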

Great, thank you for reporting back with your solution! We’ll make sure this is logged so future users might benefit. All the best with it.