Hi, I have been using Balena and Telegraf for quite some time, and I see that after my deployment has been running for a few hours, performance degrades and only recovers after a reboot.
Does anyone know what I might be doing wrong, or why it gets fixed when I reboot the Raspberry Pi? It would also help if I could schedule a reboot at a set time, as with crontab, an option that I think is not available in Balena. Please help.
What are you pinging, for how long, and how frequently?
Could it be that after a certain number of requests the other server treats your pings as spam and replies differently?
Given the behavior itself (fails after some time, works after a reboot), my first guess would be that logs or other data needed for the functionality fill up the storage or RAM and cause it to malfunction.
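One way to test the storage/RAM guess is to log usage from inside the container at intervals and see whether it climbs before performance drops. The following is only a minimal sketch: the function names `disk_used_pct` and `mem_used_pct` are made up for illustration, and it assumes a Linux container with `df`, `awk`, and `/proc/meminfo` available.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Balena or Telegraf.

# Print the used percentage (integer, no % sign) of the filesystem holding path $1.
disk_used_pct() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Print used memory as an integer percentage.
# Reads /proc/meminfo-style text on stdin so it can be fed saved samples too.
mem_used_pct() {
    awk '/^MemTotal:/ { t = $2 }
         /^MemAvailable:/ { a = $2 }
         END { if (t > 0) printf "%d\n", (t - a) * 100 / t }'
}
```

For example, `mem_used_pct < /proc/meminfo` and `disk_used_pct /` could be run every few minutes and appended to a log with a timestamp. A steady climb over a few hours would support the "something fills up" theory, while flat numbers would point elsewhere.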
@vipulgupta2048's suggestion of the cron block could be a quick workaround, or, since you already install cron in your Dockerfile, you can configure it to call the supervisor to restart.
Hi there. In addition, could you enable support access for this device? I can look at the device logs to see what happened at 19:12. However, if the device has been rebooted since then, those logs will be lost, so let me know if that's the case.
Well, I do not have this problem when I use Raspberry Pi OS (with the same Telegraf config). I have around 60 sensors (about 20 run on Balena, in two fleets). The whole deployment involves DNS queries, HTTP responses, speed tests, MTR, ping, etc.
I am pinging around 100 sites, 5 counts each at 0.25 s intervals, every 3 min. All this data is written to the database every 20 min.
This is in production already and I have to reboot the devices myself. I am checking the health and behavior of a Telco HFC, Remote PHY, FTTH network.
I have tried to restart the device by calling the supervisor:
$ curl -X POST --header "Content-Type:application/json" \
    "$BALENA_SUPERVISOR_ADDRESS/v1/reboot?apikey=$BALENA_SUPERVISOR_API_KEY"
But you have to understand that I would need to echo the API key for each sensor. On top of that, I cannot make it work: for some reason I can GET info with the command, but I cannot POST anything.
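A possible way around echoing a key per sensor is to have the script read both values from the container environment at run time, so the same file can ship to every device. This is a rough sketch only: `reboot_url` and `request_reboot` are hypothetical names, and it assumes the supervisor API is exposed to the container (as far as I recall, multicontainer services need the `io.balena.features.supervisor-api` label for the two variables to be injected).

```shell
#!/bin/sh
# Hypothetical helpers wrapping the curl call from the post above.

# Build the reboot URL from the supervisor address ($1) and API key ($2).
reboot_url() {
    printf '%s/v1/reboot?apikey=%s\n' "$1" "$2"
}

# Ask the supervisor to reboot this device.
# Both variables come from the container environment, so nothing is hard-coded.
request_reboot() {
    : "${BALENA_SUPERVISOR_ADDRESS:?not set; is the supervisor API exposed to this container?}"
    : "${BALENA_SUPERVISOR_API_KEY:?not set}"
    curl -sS -X POST --header "Content-Type:application/json" \
        "$(reboot_url "$BALENA_SUPERVISOR_ADDRESS" "$BALENA_SUPERVISOR_API_KEY")"
}
```

Calling `request_reboot` from a cron entry would give a scheduled reboot. One thing to watch for: cron usually starts jobs with a minimal environment, so the two variables may need to be exported into the job explicitly, otherwise the guard at the top will stop the script before curl runs.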
We just thought to reach out again and see if you’d be willing to enable support access so we can review device logs with you. Let us know if that’s possible. Thank you!
Hi, I was able to fix the problems by doing the following:
I added tini to my Dockerfile to reap zombie processes and run as PID 1. It fixed the issue, but it caused problems with my inputs.exec, so we checked the Telegraf documentation and found: