22 July 2020 - Dynamic Thresholds

Anonymous

6 years ago

Q&A Transcript:

Q: How can the Dynamic Threshold minimize false positive deltas caused by:
- Memory utilization of a system over time (normal increases in utilization vs. memory leakage), and sudden increases in memory utilization after a firmware/OS upgrade that vary widely from the normalized trend/reference point of utilization?
- CPU jumps caused by turning on features and services, or by upgrades in a system?
- Interface utilization jumps caused by our customers' overnight backups?
A: We will touch on how this is done over the next slide or so. Please let us know if it's not clear afterward and we can cover it in more detail or with specific context. Live answered at 17:33.

Q: Would Dynamic Thresholds also have seasonality buckets that are a week, two weeks, or a month long? For example, a client having a VERY busy Christmas week, or a VERY busy period from mid-January to mid-February.
A: Live answered at 32:58
Followup pending from Chris in Product management

Q: Is this a licensed feature or included with what we already have?
A: Dynamic thresholds are available to LogicMonitor Enterprise accounts

Q: What happens then if that's a pattern? Let's say we have clients doing backups during the weekend (which fills the disks temporarily over the weekend, or an interface that gets huge usage over the weekend). Would the AI learn that behavior and expect it over the weekend?

A: Yes, that should fall under the “seasonality” data training: “Daily and weekly trends also factor into dynamic threshold calculations. For example, a load balancer with high traffic volumes Monday through Friday, but significantly decreased volumes on Saturdays and Sundays, will have expected data ranges that adjust accordingly between the workweek and weekends.”
https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints
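To make the seasonality idea concrete, here is a minimal Python sketch of an expected range that differs between the workweek and the weekend. It is an illustration only, not LogicMonitor's actual algorithm: the (weekend, hour) bucketing, the mean plus or minus three standard deviations band, and every function name below are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def bucket_key(ts):
    # Hypothetical bucketing: (is it a weekend day?, hour of day).
    return (ts.weekday() >= 5, ts.hour)


def expected_bands(samples, width=3.0):
    """Build {bucket: (low, high)} from (datetime, value) samples.

    Assumed band: bucket mean +/- width * population standard deviation.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_key(ts)].append(value)
    bands = {}
    for key, values in buckets.items():
        centre, spread = mean(values), pstdev(values)
        bands[key] = (centre - width * spread, centre + width * spread)
    return bands


def is_anomalous(bands, ts, value):
    # A value is anomalous when it falls outside the band for its bucket.
    low, high = bands[bucket_key(ts)]
    return value < low or value > high


def synthetic_history(start=datetime(2020, 6, 1), days=28):
    # Made-up hourly data: about 100 on weekdays, about 10 on weekends,
    # with a small deterministic jitter so the bands have non-zero width.
    history = []
    for i in range(days * 24):
        ts = start + timedelta(hours=i)
        base = 10 if ts.weekday() >= 5 else 100
        history.append((ts, base + (i % 5) - 2))
    return history


bands = expected_bands(synthetic_history())
saturday_noon = datetime(2020, 7, 4, 12)
wednesday_noon = datetime(2020, 7, 1, 12)

# Weekday-level traffic arriving on a Saturday, then on a Wednesday.
print(is_anomalous(bands, saturday_noon, 100))
print(is_anomalous(bands, wednesday_noon, 100))
```

With this made-up data, the same reading of 100 is judged against a different range depending on the day, which is the behavior the quoted documentation describes for the load balancer example.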

Q: Can we specify/Configure Dynamic Thresholds by Device Groups? For example when a Group is a Client company for MSP
A: At the moment I believe they can only be configured at the global or instance level. Each resource would determine its own bounds for “normal” data. https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints
Chris Sternberg has informed me that device-group dynamic threshold configuration is planned for release in the near future.

Q: Our portal does not have Dynamic Thresholds in the Datapoint config pages. What is needed to enable this?
A: Dynamic Thresholds are available to LogicMonitor Enterprise accounts. If you DM me your portal name, I can check your subscription level if you’re unsure of it. https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints

Q: When setting the dynamic thresholds, does the alert come across to the resolver groups clearly listing that this is delivered as a dynamic alert vs. a static one? Or would we need to add a value to be traded with the API, such as ServiceNow, etc.? Thanks!
A: The alert description will include a note that it is triggered via a dynamic threshold. Check out the Viewing Alerts for Dynamic Thresholds section here for some more detail. https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints

Q: How often is the "normal band" recalculated? I.e., if a disk is filling up VERY slowly, 1.5 of "normal" might never be triggered, depending on when (or if) normal is recalculated.
A: Live answered at 35:24.

Q: Thanks!
A: Glad that we can help! Please feel free to reach out to our support team any time if you have additional questions.

Q: Using reports was mentioned on how to tune- would that be done in the same way as static thresholds? I.e. turn thresholds on and run alert reports to view number of alerts?
A: Live answered at 37:30.

Q: Do static thresholds remain in place when dynamic ones are enabled? Use case: I've set a 75% POE utilization for a switch, and I would still want to know if that is exceeded but would the dynamic threshold be able to operate concurrently and trip an alert if the utilization dropped to 0?
A: Yes, with the initial release of Dynamic Thresholds, alerts still trigger at static thresholds, but the notification is suppressed unless the data is determined to be anomalous. There’s a more in-depth explanation under “Assigning both Static and Dynamic Thresholds to a Datapoint” here: https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints
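As a rough illustration of the interaction described in this answer, the Python sketch below models only that one sentence: the static threshold decides whether an alert triggers, and the dynamic band decides whether the notification goes out. The names (Decision, evaluate), the band values, and the comparison operators are assumptions for the example, not LogicMonitor's implementation, and the sketch does not cover dynamic thresholds generating alerts on their own.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    alert_triggered: bool    # static threshold crossed
    notification_sent: bool  # alert triggered AND value outside the band


def evaluate(value, static_threshold, band):
    """Combine a static threshold with a dynamic (low, high) band.

    Assumption: the alert triggers at or above the static threshold, and
    the notification is suppressed while the value stays inside the band.
    """
    low, high = band
    alert = value >= static_threshold
    anomalous = value < low or value > high
    return Decision(alert_triggered=alert, notification_sent=alert and anomalous)


# Hypothetical PoE utilization example: static threshold 75%,
# learned "normal" band of 70% to 90%.
band = (70, 90)
print(evaluate(80, 75, band))  # above 75% but inside the band
print(evaluate(95, 75, band))  # above 75% and outside the band
print(evaluate(0, 75, band))   # outside the band, static threshold not crossed
```

In this simplified model, a drop to 0 sends nothing because no static threshold is crossed, so the two cases in the question would need to be confirmed against the linked documentation section.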

Q: To follow up on the baseline creep question, how often is the normal band recalculated? Is it continually, daily, weekly, etc.? Maybe I missed (or misunderstood) the explanation.
A: Live answered at 39:20.

Q: How do we determine the number of alerts that were suppressed by the dynamic threshold historically? That data does not get captured in the alerts report.
A: Live answered at 40:41.

Q: Thanks. This is a problematic situation because I need to know if the top value is exceeded, but I simultaneously need to know if the value drops to 0 (this consistently indicates POE HW failure in the switch). I haven't figured out how to accomplish this duality with existing static thresholds, considering that not all switches deliver POE normally; 0 is the expected value for them. Hope that explanation is understandable.
A: Live answered at 42:53.

Q: You mentioned that v2 learns faster. How much faster? Before it needed 3 days to figure out what is normal.
A: Live answered at 42:23.

Q: For the alert frequency report where do I find it - it does not appear in my Start a new report. Can you show us on screen perhaps?
A: Live answered at 44:36.

Q: As an MSP, we've had Dynamic Thresholds enabled on thousands of endpoints across various datapoints since Phase One, which worked very well and as expected – basically for tuning alerts out on datasets that were not considered anomalous. As Phase Two of Dynamic Thresholds went live, we encountered a large increase in alerts thrown that were still considered anomalies based off historical data, but well underneath our previously configured static thresholds. The “upper bound” selection was selected by default on this move from Phase One to Phase Two.
Question is – is there a best practice for handling these previous datapoints that had dynamic thresholds moved from Phase One to Phase Two?
A: Live answered at 46:16.
Followup pending from Chris in Product management

Q: Thanks. Do we need to enable the dynamic thresholds first before being able to see the offsets options?
A: Those options should be available once the data has trained (e.g. after enabling them). At that point there is a Dynamic Threshold Advanced Config option.

Q: How long would a datapoint have to be at an “abnormal” level before it would become the new “normal” level? Would an existing alert clear when this happens?
A: Live answered at 48:09.

Q: Followup: What determines "as little as" 13 hours?
A: Live answered at 48:55

Q: Does it look at Historical data for the instance...say we just turn it on
A: Live answered at 51:21

Q: These are very helpful and I find them valuable. Continue these webinars...
A: We’re glad you enjoy them! Thanks for joining us and feel free to reach out to our support team with any other questions you might have.

Q: Thanks! Great webinar.
A: Thanks for joining! Feel free to reach out to our support team anytime if you have additional questions.
