Alert Severity based on time in alert.

Question

We monitor a lot of remote customer systems, and it is not uncommon for a remote system to go down for some period of time.
I know I can use alert rules to adjust notification times, but what I would like is a way to set things up so that the alert severity increases automatically based on how long the alert has been open.

For example, suppose I am monitoring a server at a remote location and that server goes down:

From the start of the alert through 30 minutes, it would alert as a warning.

At 31 minutes, it would automatically change to an error.

At 46 minutes, it would automatically change to critical.
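
The schedule above is a simple step function from time in alert to severity. Below is a minimal Python sketch of that mapping, assuming whole-minute thresholds at 31 and 46 minutes as written (so an alert open for 30.5 minutes would still be a warning under this reading). `ESCALATION_SCHEDULE` and `severity_for` are hypothetical names, not LogicMonitor settings or API calls; something would still have to re-evaluate the function as the alert ages.

```python
from datetime import timedelta

# Hypothetical escalation schedule taken from the question; these names are
# illustrative only and are not LogicMonitor settings or API fields.
# Entries are ordered from the longest threshold to the shortest.
ESCALATION_SCHEDULE = [
    (timedelta(minutes=46), "critical"),
    (timedelta(minutes=31), "error"),
    (timedelta(minutes=0), "warning"),
]


def severity_for(time_in_alert: timedelta) -> str:
    """Return the severity for an alert that has been open for time_in_alert."""
    if time_in_alert < timedelta(0):
        raise ValueError("time_in_alert must be non-negative")
    # The first (largest) threshold the alert has reached determines the severity.
    for threshold, severity in ESCALATION_SCHEDULE:
        if time_in_alert >= threshold:
            return severity
    # Unreachable while the schedule contains a 0-minute entry.
    raise AssertionError("escalation schedule must include a 0-minute entry")


if __name__ == "__main__":
    for minutes in (0, 30, 31, 45, 46, 120):
        print(minutes, severity_for(timedelta(minutes=minutes)))
```

Ordering the thresholds from largest to smallest keeps the lookup a single pass; adding another tier (for example, paging someone after two hours) is just one more entry in the list.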


mnagel · Answer

This goes along with previous requests to enable detection of thresholds over time (it may be possible to do something like this with the API, but writing datasources that use the API is a bit cumbersome). Some resources show patterns where they go in and out of alert, resetting the counter, but if the time in alert were calculated over a window you would clearly see a problem. The upcoming alert capability for anomaly detection may help here, but I have no idea when it will be available or what features it will have. A related issue is events, which need correlation to determine whether they are actually ongoing; that is not currently possible within LM.
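
The flapping problem can be made concrete: instead of timing each alert separately (a timer that resets whenever the alert clears), sum the time spent in alert over a sliding window. Below is a minimal Python sketch under stated assumptions: alert history is available as non-overlapping `(start, end)` intervals, obtained however you like (for example through the API mentioned above). The interval shape and the `time_in_alert` function are hypothetical, not a LogicMonitor feature or API.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical shape for alert history: (start, end), with end = None while the
# alert is still open. This is an assumption, not a LogicMonitor data structure.
AlertInterval = Tuple[datetime, Optional[datetime]]


def time_in_alert(intervals: Iterable[AlertInterval], now: datetime,
                  window: timedelta) -> timedelta:
    """Total time spent in alert within the window [now - window, now].

    Assumes the intervals do not overlap one another.
    """
    window_start = now - window
    total = timedelta(0)
    for start, end in intervals:
        end = now if end is None else end
        # Clip each interval to the window before adding it to the total.
        overlap_start = max(start, window_start)
        overlap_end = min(end, now)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
    return total


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    # A flapping resource: three separate 10-minute alerts in the last hour.
    flapping = [
        (datetime(2024, 1, 1, 11, 5), datetime(2024, 1, 1, 11, 15)),
        (datetime(2024, 1, 1, 11, 25), datetime(2024, 1, 1, 11, 35)),
        (datetime(2024, 1, 1, 11, 45), datetime(2024, 1, 1, 11, 55)),
    ]
    print(time_in_alert(flapping, now, timedelta(hours=1)))  # 0:30:00
```

Each of those alerts lasts only 10 minutes, so a per-alert timer would never pass 30 minutes, yet the resource spent half of the hour in alert.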

sarah_terry · Answer

@Gary Dewrell @mnagel Thanks for bringing this up! The second phase of Dynamic Thresholds, which we are currently working on, will set the severity of triggered alerts using a combination of duration and the percentage of values that exceed a given distance from "normal", as calculated by anomaly detection. I think this will help with what you are asking for, and I can add you to the beta test list if you'd like to receive an invite to participate in beta testing.

In the meantime, your best bet is to set the Alert Trigger Interval (in the DataSource definition) high enough that triggered alerts catch the uncommon issues rather than routine events.

gary_dewrell · Answer

Yes, please add us to the beta test list. Thank you.

mnagel · Answer

Me too, though perhaps only in our sandbox unless it is unlikely to cause any issues in production.

Thanks,
Mark
