Here was my solution:
1. Add a custom property to each device containing the cluster ID (or an arbitrary GUID or unique naming-convention value). We used HyperV.Cluster.GUID = <GUID generated in PowerShell with New-GUID>.
2. Use that property to build a dynamic group for each cluster.
3. Use that group to enable cluster-level alerting for whichever sources you want visibility into, then configure those sources not to trigger alerts for the individual members of the group.
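The idea behind steps 1 and 2 can be modeled outside the monitoring tool. The sketch below is only an illustration under stated assumptions: the host and cluster names are hypothetical, Python's `uuid.uuid4()` stands in for PowerShell's New-GUID, and grouping devices by their `HyperV.Cluster.GUID` value mimics what a dynamic group's membership rule does. It does not call any monitoring API.

```python
import uuid
from collections import defaultdict

# Hypothetical inventory: each Hyper-V node mapped to the cluster it belongs to.
nodes = {"hv-node-01": "cluster-a", "hv-node-02": "cluster-a", "hv-node-03": "cluster-b"}


def assign_cluster_guids(node_to_cluster: dict[str, str]) -> dict[str, dict[str, str]]:
    """Generate one GUID per cluster and stamp it on every member node."""
    cluster_guid = {c: str(uuid.uuid4()) for c in set(node_to_cluster.values())}
    return {
        node: {"HyperV.Cluster.GUID": cluster_guid[cluster]}
        for node, cluster in node_to_cluster.items()
    }


def dynamic_groups(props: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Group devices by their HyperV.Cluster.GUID value, one group per cluster."""
    groups: dict[str, list[str]] = defaultdict(list)
    for node, p in props.items():
        groups[p["HyperV.Cluster.GUID"]].append(node)
    return dict(groups)


props = assign_cluster_guids(nodes)
groups = dynamic_groups(props)
for guid, members in groups.items():
    print(guid, members)
```

Because membership is derived from the property value rather than a hand-maintained list, a new node joins the right group as soon as it carries its cluster's GUID.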
To reduce the number of repeated emails generated by escalation chains, add a blank entry to the end of the escalation chain you're using. The escalation advances to the next step each time the escalation interval passes, and once it reaches the end it keeps repeating the last step. That can produce far more emailed alerts than necessary. With a blank last step, the escalation repeats that empty step instead, so nothing more is sent after the chain has been worked through. You can also add timing pauses to your escalation chain by inserting blank steps between the real ones. If the alert clears during a pause, the escalation ends and the following steps don't fire.
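A small simulation makes the repeat-last-step behavior concrete. This is a sketch, assuming that step k fires k intervals after the alert opens, that the last step repeats every interval once the chain is exhausted, and that nothing fires once the alert has cleared; the recipient names are hypothetical.

```python
from typing import Optional

# A step is a recipient name, or None for a blank step that notifies nobody.
Step = Optional[str]


def notifications(chain: list[Step], interval_min: int, open_min: int) -> list[tuple[int, str]]:
    """Return (minute, recipient) pairs sent while the alert stays open.

    Assumed semantics: step k fires at k * interval_min; past the end of the
    chain, the last step repeats every interval_min until the alert clears.
    """
    sent: list[tuple[int, str]] = []
    t, k = 0, 0
    while t < open_min:
        step = chain[min(k, len(chain) - 1)]  # past the end, reuse the last step
        if step is not None:
            sent.append((t, step))
        t += interval_min
        k += 1
    return sent


# One-step chain: the ticketing address is emailed every 15 minutes for an hour.
print(notifications(["ticketing"], 15, 60))
# Same chain with a trailing blank: a single email, then silence.
print(notifications(["ticketing", None], 15, 60))
```

The first call yields four emails (minutes 0, 15, 30 and 45); the second yields only the one at minute 0, which is what keeps an email-driven ticketing system from opening duplicates.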
For instance, in Dynamics AX, the AOS servers can take a long time to come up. Our team needs to know that the service has gone down; our customer needs to know if it doesn't come back up. Rather than creating a pair of sources with separate alerts, we make the first step the contact for our team, followed by enough blank steps to cover the time the service normally takes to restart. After that, we add the customer's team, then a final blank step.
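How many blank steps to insert is simple arithmetic on the restart time and the escalation interval. The sketch below is hypothetical (the function names and the 45-minute restart figure are illustrative, not from the post) and assumes step k fires k intervals after the alert opens, so the customer step should fire no earlier than the usual restart time.

```python
import math
from typing import Optional


def build_chain(team: str, customer: str, restart_min: int, interval_min: int) -> list[Optional[str]]:
    """Team first, blanks covering the usual restart time, then customer, then a blank."""
    # The customer step sits at index blanks + 1, i.e. it fires at
    # (blanks + 1) * interval_min; choose blanks so that time is >= restart_min.
    blanks = max(0, math.ceil(restart_min / interval_min) - 1)
    return [team] + [None] * blanks + [customer, None]


def recipients_reached(chain: list[Optional[str]], interval_min: int, open_min: int) -> list[str]:
    """Recipients whose step fires before the alert clears (the trailing blank adds no repeats)."""
    return [s for i, s in enumerate(chain) if s is not None and i * interval_min < open_min]


chain = build_chain("ops-team", "customer", restart_min=45, interval_min=15)
print(chain)
print(recipients_reached(chain, 15, open_min=40))  # AOS recovered in time
print(recipients_reached(chain, 15, open_min=60))  # AOS still down after the window
```

With a 45-minute restart window and a 15-minute interval, two blanks place the customer step at minute 45, so an outage that clears at minute 40 only ever reaches our own team.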
Our current ticketing system only accepts automated tickets via email. These small tricks let us generate only one ticket per alert and make sure we aren't panicking our customers over failovers or failures that are recovering as expected.