Alerts if there are X failures over Y time

Question

It is currently impossible to detect certain conditions without being bombarded by noisy alerts, which I am told goes against the philosophy of LogicMonitor. Consider a few cases:

* An interface flaps a few times versus much more frequently: how do you tell the difference? Right now, you have no choice other than perhaps to build an API script (not tested). A better solution in this example would be to count the number of flaps over a period of time and use that count as the alert trigger. As it stands, there is not even a way to select the top 10 most unstable interfaces, since the status is literally a yes-or-no value and a top 10 makes no sense.

* Resource utilization (bandwidth, CPU, etc.) is sometimes much better evaluated over a period of time than over a single interval. The answer I have received on that is "require N checks to fail", which works if the resource is pegged, but not if it is spiky. As it stands, the longer the period you try to simulate via "N checks", the higher the chance that a single good check resets the alert, even though the overall result is clearly bad on inspection.
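To make the flap-counting idea from the first case concrete, here is a minimal sketch, assuming the interface status is polled as a simple up/down boolean with a timestamp. `FlapCounter` and `top_unstable` are hypothetical names for illustration, not a LogicMonitor API; the window length and the X-flaps threshold are example values.

```python
from collections import deque


class FlapCounter:
    """Hypothetical helper: counts up/down transitions of one interface
    inside a sliding time window (not a LogicMonitor API)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.last_state = None      # last polled status, True = up
        self.flap_times = deque()   # timestamps of observed state changes

    def count(self, now: float) -> int:
        """Drop transitions older than the window and return how many remain."""
        while self.flap_times and self.flap_times[0] <= now - self.window_seconds:
            self.flap_times.popleft()
        return len(self.flap_times)

    def observe(self, timestamp: float, is_up: bool) -> int:
        """Record one poll result and return the flap count for the window."""
        if self.last_state is not None and is_up != self.last_state:
            self.flap_times.append(timestamp)
        self.last_state = is_up
        return self.count(timestamp)


def top_unstable(counters: dict, now: float, n: int = 10) -> list:
    """Rank interfaces by flap count in the current window, highest first."""
    counts = [(name, c.count(now)) for name, c in counters.items()]
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts[:n]


if __name__ == "__main__":
    X_FLAPS, WINDOW = 3, 3600  # example: alert on 3+ flaps within 1 hour
    eth0 = FlapCounter(WINDOW)
    for t, up in [(0, True), (60, False), (120, True), (180, False), (240, True)]:
        flaps = eth0.observe(t, up)
        if flaps >= X_FLAPS:
            print(f"ALERT eth0 at t={t}s: {flaps} flaps in the last {WINDOW}s")
```

Because the flap count is a number rather than a yes/no status, it can be thresholded ("X flaps over Y time") and ranked, which is what a top-10 view of unstable interfaces would need.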
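The difference between "require N checks to fail" and evaluating a window, described in the second case, can be shown with a small sketch. It assumes a fixed polling interval, so "the last Y samples" corresponds to a fixed span of time; the function names and the CPU series are made up for illustration.

```python
from statistics import fmean


def n_consecutive_breach(samples, threshold, n):
    """'Require N checks to fail': alert only if each of the last n samples is over threshold."""
    recent = samples[-n:]
    return len(recent) == n and all(v > threshold for v in recent)


def window_average_breach(samples, threshold, window):
    """Alert if the mean of the last `window` samples is over threshold."""
    recent = samples[-window:]
    return len(recent) == window and fmean(recent) > threshold


def x_of_y_breach(samples, threshold, x, y):
    """Alert if at least x of the last y samples are over threshold ("X failures over Y")."""
    recent = samples[-y:]
    return len(recent) == y and sum(v > threshold for v in recent) >= x


if __name__ == "__main__":
    # Made-up spiky CPU % series: mostly high, with regular dips below 80.
    cpu = [95, 70, 95, 95, 70, 95, 95, 70]
    for i in range(1, len(cpu) + 1):
        seen = cpu[:i]
        print(i, n_consecutive_breach(seen, 80, 3),
              window_average_breach(seen, 80, 6),
              x_of_y_breach(seen, 80, 4, 6))
```

With this series, no run of three consecutive polls is above 80, so the N-checks rule never fires, while the 6-sample average (about 86.7) and the "4 of the last 6" rule both flag the sustained load.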

Please note this problem was solved long ago by other tools, such as Zabbix (https://www.zabbix.com/documentation/3.4/manual/config/triggers/expression), so hopefully it can be added to LM in the near future as well.

nrichards · Answer

I second this request. The ability to incorporate a time-based/duration-based metric for datasources such as CPU and memory usage (especially) would be really useful. We would like to implement this for scenarios such as: if Device A breaches the configured CPU threshold for more than 1 hour, generate an alert; if it breaches for less than an hour, do nothing.

We have different application teams that would want the duration to be customisable in line with their applications' behaviour.

Is this something that is in the development team's backlog?
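A duration-based rule like the one requested here could look roughly like the sketch below, assuming each poll delivers a timestamp and a value. The breach clock starts at the first poll seen above the threshold and resets on any poll back under it. `DurationRule`, `TEAM_DURATIONS`, and the team names are hypothetical placeholders, not LogicMonitor settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DurationRule:
    threshold: float                      # e.g. CPU % threshold
    min_duration_s: float                 # how long a breach must last before alerting
    breach_start: Optional[float] = None  # timestamp of first poll in the current breach

    def evaluate(self, timestamp: float, value: float) -> bool:
        """Return True once value has stayed above threshold for more than min_duration_s."""
        if value > self.threshold:
            if self.breach_start is None:
                self.breach_start = timestamp
            return timestamp - self.breach_start > self.min_duration_s
        self.breach_start = None  # any poll back under threshold resets the clock
        return False


# Hypothetical per-team durations in seconds, so each team can tune its own rule.
TEAM_DURATIONS = {"team-a": 3600, "team-b": 900}


def rules_for(threshold: float) -> dict:
    """Build one DurationRule per team, sharing a threshold but not a duration."""
    return {team: DurationRule(threshold, secs) for team, secs in TEAM_DURATIONS.items()}
```

Note that a single reset-on-recovery clock has the same weakness with spiky data as "require N checks to fail"; a windowed average or "X of Y" count avoids that.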

mosh · Answer

I also need this. I'd like to be able to have it trigger if the value breaches, for n consecutive polls, a threshold equal to the average of the values from the past day/week/month, sampled at a given interval over the selected period. So, for example: is CPU% greater than the average CPU% from the past 7 days of readings, taken at 1-hour intervals?
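A rough sketch of that baseline comparison follows, assuming history is available as (timestamp, value) pairs and interpreting "at 1-hour intervals" as averaging each hour's readings first and then averaging those hourly means over the last 7 days. `hourly_baseline` and `breaches_baseline` are hypothetical names for illustration only.

```python
from statistics import fmean

HOUR = 3600


def hourly_baseline(history, now, lookback_s=7 * 24 * HOUR, interval_s=HOUR):
    """Average of per-interval means over the lookback period before `now`."""
    buckets = {}
    for ts, value in history:
        if now - lookback_s <= ts < now:
            buckets.setdefault(int(ts // interval_s), []).append(value)
    if not buckets:
        return None  # no history yet, so no baseline to compare against
    return fmean([fmean(vals) for vals in buckets.values()])


def breaches_baseline(recent_values, baseline, n):
    """True if each of the last n polled values is above the baseline."""
    last = recent_values[-n:]
    return baseline is not None and len(last) == n and all(v > baseline for v in last)
```

Changing `lookback_s` to one day or roughly a month gives the other periods mentioned, and `n` controls how many consecutive polls must exceed the baseline.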

eugene_c · Answer

This feature is needed. I have the same scenarios as described in previous posts.

matt_gauthier · Answer

upvote!

paul_armenakis1 · Answer

Agreed, this would be nice to have.
