Notifications for production errors?

Hi again @mpark,

If you have the development bandwidth, it is always good practice to set up monitoring for your services and devices. Some common stacks include the Telegraf/TICK stack, Prometheus, or Datadog. I hope to publish an updated Prometheus guide soon, so stay tuned for that as well. A log forwarding service can also be useful if you have a robust logging setup, though I find pattern matching in logs to be a little brittle for catching arbitrary errors in production.

Additionally, there are some things you can do on-device to make your application more resilient to failure. We always recommend configuring a HEALTHCHECK in your Dockerfile, and making sure you have tested some common failure cases for your app.
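
As a rough illustration of the HEALTHCHECK idea, here is a minimal sketch of a probe script the Dockerfile could call. It assumes your app exposes an HTTP health route; the URL, port, file path, and function name below are hypothetical placeholders, not something your app or our platform provides, so adjust them to your setup.

```python
# healthcheck.py: a minimal probe for a Dockerfile HEALTHCHECK (sketch only).
# Assumed Dockerfile wiring (script path and timings are placeholders):
#   HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
#     CMD python /usr/src/app/healthcheck.py
import sys
import urllib.error
import urllib.request

# Hypothetical endpoint: assumes the app serves a health route on this port.
HEALTH_URL = "http://127.0.0.1:8080/health"


def check_health(url: str = HEALTH_URL, timeout: float = 3.0) -> int:
    """Return 0 if the app answers with HTTP 200, otherwise 1.

    Docker treats exit code 0 as healthy and 1 as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return 1


if __name__ == "__main__":
    sys.exit(check_health())
```

A probe like this only tells you the process is answering requests. If your app has deeper failure modes (a stuck worker, a lost sensor connection), the health route itself would need to check for those.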

Again, we are working on many of these problems now internally, so please let us know what issues you run into or what would make your life as a fleet owner easier!
