Forum Discussion

grantae
4 years ago

Alerts that monitor other alerts

Example: I have one router connected to the network's other router by 2 links (interfaces, tunnels, etc.). If one of the links goes down, the normal alert rule that emails me is fine. However, if BOTH links go down, I want a page.

Cluster alerts came close to what I needed, but they seem to support only "if ANY 2 links go down, do this" rather than "if THESE 2 links go down, do this." I care about the relationship between 2 specific links on a device, not about other ports going to random servers that happen to go down. (I have different alerts for those.)

Has anyone dealt with a similar issue and found a workaround or solution? Maybe an eventsource (or something similar) could check whether Alert A and Alert B exist at the same time?

3 Replies

  • Anonymous

    You could create a service out of those two links. The service metric would be interface status. You would choose to aggregate the status data by "mean". If both links are up, they'd both return 1, so the average would be 1. If one link is down, you'd get the average of 1 and 2 (1.5). If both links are down, you'd average 2 and 2 (2). Set your threshold to >=2 and you should be good to go.

    The only tedious part is setting this up for each pair of links you have.
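
For anyone following along, the arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration of the mean-and-threshold logic, not the product's actual service configuration. It assumes the status encoding described above (1 = up, 2 = down); the pair names and the `should_page` helper are hypothetical.

```python
from statistics import mean

UP, DOWN = 1, 2  # assumed status encoding, taken from the reply above


def should_page(statuses, threshold=2.0):
    """Return True when the mean interface status reaches the threshold.

    With only the values 1 (up) and 2 (down), the mean equals 2 exactly
    when every link in the group is down.
    """
    return mean(statuses) >= threshold


# Hypothetical link pairs, one entry per "service" you would define.
link_pairs = {
    "router-a <-> router-b": [UP, UP],
    "router-a <-> router-c": [UP, DOWN],
    "router-a <-> router-d": [DOWN, DOWN],
}

for name, statuses in link_pairs.items():
    print(name, "mean:", mean(statuses), "page:", should_page(statuses))
```

Only the pair with both links down reaches the threshold. A single link down gives a mean of 1.5 and stays below it, so the regular per-interface email alert remains the only notification in that case.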

  • Sure, you can use Service Insight for this, but it is a premium feature, and that means using an expensive mallet for something that should be available at no extra cost. Alternatively, there should be a "Service Insight Light" for this kind of thing, leaving the costly part for Service Insight's intended enhanced features (like Kubernetes).

    My recommendation was to extend cluster alerts so you could at least match up instances. My use case at the time was detecting an offline AP on a controller cluster. There is no way to do this without SI, which, as you say, is complex and costs extra. We need features like this in the base product.

  • 1 hour ago, Stuart Weenig said:

    You could create a service out of those two links. [...]

    This sounds perfect! Thank you for the suggestion!

    I put the 2 instances I want to monitor together and made a rule for it. Just need to test it and see if it works as intended.