Forum Discussion

Jamie
4 years ago

Dynamic Instance Group Alert Tuning

This is not an advertisement by any means, just offering to help anyone who struggles with this as well.

As an MSP, we have struggled with how to handle alert tuning in bulk when it comes to things like interfaces (instances). Some interfaces you want to alarm as critical, some as error, and others you don't care about at all. LM provided a partial fix for that with their Groovy-based "Status" alarm keyed on the interface description, but it didn't take it far enough. We started creating manual interface groups called "Critical" and performing Alert Tuning on that "parent", only to find out that it doesn't work as interfaces move in and out of the group. I was beyond disappointed, but it said so right at the top of the page: "Changes made to Alerting or Thresholds will only affect existing instances currently in this Instance Group. Instances added later will not be subject to the changes."
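
To make the idea concrete, here is a minimal Python sketch of sorting interfaces into alert levels by description. This is not our app and not anything LM ships; the patterns, the severity names, and the `severity_for` helper are all made up for illustration. The point is only that the rules get re-evaluated on every run, so an interface that shows up later is classified the same way as the ones that were already there.

```python
import re

# Hypothetical rules: (pattern matched against the interface description, severity).
# The first matching rule wins. None means "do not alert on this interface".
RULES = [
    (re.compile(r"uplink|core|wan", re.IGNORECASE), "critical"),
    (re.compile(r"server|storage", re.IGNORECASE), "error"),
    (re.compile(r"workstation", re.IGNORECASE), None),
]

# Assumed fallback for descriptions that match no rule.
DEFAULT_SEVERITY = "warn"


def severity_for(description):
    """Return the alert severity for an interface description, or None to suppress alerts."""
    for pattern, severity in RULES:
        if pattern.search(description):
            return severity
    return DEFAULT_SEVERITY


if __name__ == "__main__":
    for desc in ["WAN uplink to ISP", "Storage array port 3", "Workstations floor 2", ""]:
        print(repr(desc), "->", severity_for(desc))
```

Rule order matters here: put the most specific or most important patterns first, since the first match decides the result.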

Anyway, long story short, we finally decided to write our own application to do it and built it in Azure. We built it to handle multiple data sources so we could group other instances (like VMware vDisks) and make the same bulk changes. It was written to run as a data source in your environment, so you can apply it to whatever devices you want and it just calls out to the API with the device name. If you have any interest in using it, let me know. There are costs associated, since Azure bills based on usage, but they are pretty small for us (< $200/mo).

Trust me, I wish LM solved this without having to write the app!

1 Reply

  • Sounds great, will check it out! I wrote something similar for interfaces way back for similar reasons: LM limitations and no clear focus on nuts-and-bolts stuff like this. My largest remaining annoyance there is that we use patterns, set via properties, that examine the interface description so we can disable monitoring and/or alerting for interfaces based on description patterns (e.g., "Workstations"). However, because Active Discovery (AD) skips interfaces that are operDown, you cannot change the description after an interface goes down and have the script make the necessary change. You could change the DS to always discover operDown interfaces, but that would be ridiculous in general. I filed a feature request to allow updating some fields (like the description) for instances already discovered, but AFAIK it has received no attention. Just added to our repo under GPL: