Forum Discussion

Gary_Dewrell
5 years ago

Alert Severity based on time in alert.

We monitor lots of remote customers' systems. It is not uncommon for remote systems to go down for some period of time.
I know I can use alert rules to adjust notification times. What I would like is a way for the alert severity to increase automatically based on time in alert.

For example, suppose I am monitoring a server at a remote location and that server goes down:

From start of alert to 30 minutes it would alert as a warning.

At 31 minutes it would automatically change to an error. 

At 46 minutes it would automatically change to a critical. 
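
The schedule above can be sketched as a small lookup. This is only an illustration in Python, not a LogicMonitor feature or API call; the names `ESCALATION_STEPS` and `severity_for` are hypothetical, and the sketch assumes anything under 31 minutes counts as a warning.

```python
from datetime import timedelta

# Hypothetical escalation schedule taken from the example in this post:
# (minimum time in alert, severity), ordered from longest to shortest.
ESCALATION_STEPS = [
    (timedelta(minutes=46), "critical"),
    (timedelta(minutes=31), "error"),
    (timedelta(0), "warning"),
]


def severity_for(time_in_alert: timedelta) -> str:
    """Return the severity for an alert that has been active this long."""
    if time_in_alert < timedelta(0):
        raise ValueError("time_in_alert must not be negative")
    # The first step whose minimum has been reached wins, because the
    # list is ordered from the longest duration down to zero.
    for minimum, severity in ESCALATION_STEPS:
        if time_in_alert >= minimum:
            return severity
    raise AssertionError("unreachable: the last step starts at zero")
```

Keeping the thresholds in one ordered list means a different schedule (say, critical at 60 minutes) is a data change rather than a logic change.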

 

4 Replies

  • This goes along with earlier requests to enable detection of thresholds over time. It may be possible to do something like this with the API, but writing DataSources that use the API is a bit cumbersome. Some resources go in and out of alert, resetting the counter each time, but if time in alert were calculated over a longer period you would clearly see a problem. The upcoming alert capability for anomaly detection may help here, but I have no idea when it will be available or what features it will have. Another related issue is events, which need correlation to determine whether they are actually ongoing; that is not currently possible within LM.
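
The point about resources that go in and out of alert can be illustrated with a short sketch that totals time in alert over a window instead of relying on a counter that resets. This is a plain Python illustration under stated assumptions, not LogicMonitor API code: the function name `time_in_alert` and the input shape (a list of non-overlapping `(start, end)` pairs, one per alert occurrence) are hypothetical.

```python
from datetime import datetime, timedelta


def time_in_alert(intervals, window_start, window_end):
    """Sum the time spent in alert inside [window_start, window_end].

    intervals: (start, end) datetime pairs, one per alert occurrence.
    This input shape is an assumption for illustration; the pairs are
    assumed not to overlap each other.
    """
    total = timedelta(0)
    for start, end in intervals:
        # Clip each occurrence to the window before adding it up.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
    return total
```

With this approach, three separate 10-minute outages inside one hour add up to 30 minutes in alert, even though no single occurrence lasted longer than 10 minutes.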

  • Sarah_Terry
    Product Manager

    @Gary Dewrell @mnagel thanks for bringing this up! The second phase of Dynamic Thresholds (which we are currently working on) will use a combination of duration and percentage of values that exceed a given proximity from "normal", as calculated by anomaly detection, to set severity for alerts that are triggered. I think this will help with what you are asking for, and I can add you to the beta test list if you'd like to receive an invite to participate in beta testing. 

    In the meantime, your best bet is to set the Alert Trigger Interval (within the DataSource definition) to a value large enough that alerts trigger only for the uncommon issues and not for routine events.

  • Yes, please add us to the beta test list. Thank you.

  • Me too, though perhaps in sandbox only, unless it is unlikely to cause any issues in production.

    Thanks,
    Mark