Do Static and Dynamic thresholds work together?
Hi, I don't understand Dynamic Thresholds or the UI for setting them up. The whole business with band factors makes no sense to me. Here's my specific scenario; maybe someone can tell me the best way to handle it.

We have a server that spikes its CPU pretty high every weekday morning. It generally starts around 5 AM and ends around 9-10 AM, and some days (Fridays) it seems to go longer. We don't want to get any alerts when this happens. Here's the graph for December (even though it looks like the spikes only go to 60, they actually all go to 99 or 100 when zoomed in further):

We tried setting a daily, recurring SDT, but that still records the alerts; it just doesn't notify us about them. We want LM to consider the morning spikes "normal" and ignore them. We set up a Dynamic Threshold to see if that would help. Here's a screenshot of what that looks like:

As you can see by the arrow, it doesn't seem to have "learned" what normal is. It seems to just wait for the CPU to spike and then adjust the "Expected Range" to compensate. If it were actually learning, it should have expected the spike, since it happens every weekday, and adjusted BEFORE it happened. Right?

We also have the standard Static Thresholds enabled, so we alert at 90/95/98 for this server, and we get alerts for it all the time. We aren't sure how to set this up properly. If we use the Dynamic alerts, should we turn off the Static ones, since one doesn't seem to override the other? Should the Dynamic expected range know that the morning spike is going to happen, or is that not how dynamic thresholds work? We rarely use them because we just don't get how to use them properly, even after reading all the KBs and such. Any ideas, opinions, etc. would be great.
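For illustration only (this is not LogicMonitor's actual expected-range algorithm, and all names below are hypothetical): a band built purely from recent samples will always lag a predictable spike, while a baseline keyed to day-of-week and hour-of-day could "expect" it. A toy contrast of the two approaches:

```python
import statistics

def rolling_band(history, band_factor=2.0):
    """Expected range from recent samples only: mean +/- band_factor * stdev.
    It reacts to a spike after it happens, so a predictable weekday-morning
    spike still starts outside the band every time."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0
    return mean - band_factor * stdev, mean + band_factor * stdev

def seasonal_band(history_by_slot, when, band_factor=2.0):
    """Expected range keyed by (weekday, hour): the 5-9 AM weekday spike is
    compared only against previous weekday mornings, so it is 'expected'."""
    samples = history_by_slot[(when.weekday(), when.hour)]
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples) or 1.0
    return mean - band_factor * stdev, mean + band_factor * stdev
```

With the rolling band, the 5 AM spike sits outside the expected range the moment it starts; with the seasonal band, it would be compared against previous weekday mornings and, given a few weeks of history, fall inside it.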
Best Practices For Practitioners: Dynamic Thresholds

The modern IT infrastructure is a complex ecosystem of interconnected systems spanning cloud and on-premises environments, generating unprecedented volumes of monitoring data. Static thresholds often cannot capture the nuanced performance characteristics of these dynamic environments, leading to alert fatigue, missed critical events, and inefficient resource allocation. Dynamic thresholds are an evolutionary step in monitoring technology, leveraging advanced algorithms to create intelligent, adaptive monitoring strategies that distinguish between normal performance variations and genuine anomalies.

Key Principles

Dynamic thresholds transform monitoring by introducing adaptive mechanisms that intelligently interpret performance data. By analyzing historical performance data, they move beyond rigid, predefined alert triggers and instead create context-aware monitoring that understands the unique behavioral patterns of each monitored resource. This approach addresses two critical challenges in modern IT operations at once: reducing unnecessary alert noise while ensuring that significant performance deviations are immediately identified and communicated.

Recommended Implementation Strategies

When to Use Dynamic Thresholds

Recommended for:
- Metrics with varying performance patterns across instances
- Complex environments with diverse resource utilization
- Metrics where static thresholds are difficult to establish

Not Recommended for:
- Status datapoints (up/down)
- Discrete value metrics (e.g., HTTP error codes)
- Metrics with consistently defined good/bad ranges

Configuration Levels

Global Level
- Best when most instances have different performance patterns
- Ideal for metrics such as CPU utilization, number of connections/requests, and network latency

Resource Group Level
- Useful for applying consistent dynamic thresholds across similar resources
- Cascades settings to all instances in the group

Instance Level
- Suited to experimenting or handling outlier instances
- Recommended when you want to reduce noise for specific instances or test dynamic thresholds on a limited subset of infrastructure

Technical Considerations

Minimum Training Data
- 5 hours required for initial configuration
- Up to 15 days of historical data used for refinement
- Detects daily and weekly trends

Alert Configuration
- Configure dynamic thresholds to both trigger and suppress alerts
- Adjust advanced settings such as the percentage of anomalous values, band factor sensitivity, and deviation direction (upper/lower/both); a simplified sketch of how these settings interact follows below
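The exact algorithm behind the product's expected range isn't spelled out here, so the snippet below is only a toy illustration of how the advanced settings named above could interact: a band derived from training data, widened or narrowed by the band factor, checked in one or both directions, and triggering only when enough recent values fall outside it. Function and parameter names are assumptions for illustration:

```python
import statistics

def evaluate_dynamic_threshold(training, recent, band_factor=2.0,
                               anomalous_pct=60, direction="both"):
    """Toy evaluation of the advanced settings described above.

    training      -- historical samples used to build the expected range
    recent        -- the most recent poll values being evaluated
    band_factor   -- width of the band in standard deviations
    anomalous_pct -- % of recent values that must fall outside the band
    direction     -- 'upper', 'lower', or 'both'
    """
    mean = statistics.mean(training)
    stdev = statistics.pstdev(training) or 1.0
    low, high = mean - band_factor * stdev, mean + band_factor * stdev

    def outside(v):
        above = v > high
        below = v < low
        return {"upper": above, "lower": below, "both": above or below}[direction]

    pct = 100 * sum(outside(v) for v in recent) / len(recent)
    return pct >= anomalous_pct, (low, high)

# A wider band tolerates values that a narrower band would flag:
training = [5] * 300 + [7] * 300      # quiet metric hovering around 6
recent = [9, 9, 9, 6, 7]              # slightly elevated values
print(evaluate_dynamic_threshold(training, recent, band_factor=2))  # anomalous
print(evaluate_dynamic_threshold(training, recent, band_factor=4))  # within band
```

Raising the band factor is effectively a sensitivity knob: the same data can be anomalous at factor 2 and normal at factor 4, which is why very wide bands can end up "blanketing" everything.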
Pro Tip: Combining Static and Dynamic Thresholds

Static and dynamic thresholds are not mutually exclusive; they can be powerful allies in your monitoring strategy. By implementing both, you can:
- Use dynamic thresholds to reduce noise and catch subtle performance variations
- Maintain static thresholds for critical, well-defined alert conditions
- Create a multi-layered alerting approach that provides both granular insights and critical fail-safes

Example:
- Dynamic thresholds for warning/error levels to adapt to performance variations
- Static thresholds for critical alerts to ensure immediate notification of severe issues

Recommended Configuration Strategy
- Enable dynamic thresholds for warning/error severity levels
- Maintain static thresholds for critical alerts
- Use the "Value" comparison method when possible
- A minimal sketch of this layered approach appears at the end of this article

Best Practices Checklist
✅ Analyze existing alert trends before implementation
✅ Start with a small, representative subset of infrastructure
✅ Monitor and adjust threshold sensitivity
✅ Combine with static thresholds for comprehensive coverage
✅ Regularly review and refine dynamic threshold configurations

Monitoring and Validation
- Utilize the Alert Thresholds Report to track configuration
- Use the Anomaly filter to review alerts triggered by dynamic thresholds
- Compare alert volumes before and after implementation

Conclusion

Dynamic thresholds represent a paradigm shift in performance monitoring, bridging the gap between traditional alerting mechanisms and the complex, fluid nature of modern IT infrastructures. By leveraging machine learning and statistical analysis, these advanced monitoring techniques give IT operations teams a more nuanced, intelligent, and efficient approach to detecting and responding to performance anomalies.

As IT environments continue to grow in complexity and scale, dynamic thresholds will become an essential tool for maintaining system reliability, optimizing resource utilization, and enabling proactive operational management. The true power of dynamic thresholds lies not just in their technological sophistication but in their ability to transform how organizations approach system monitoring: shifting from a culture of constant reaction to one of strategic, data-driven performance management.

Additional Resources
- Enabling Dynamic Thresholds
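To make the layered approach above concrete, here is a minimal sketch under stated assumptions: the static critical value (98 here) and the band are illustrative, not defaults, and the function is a hypothetical stand-in rather than how the product evaluates severities.

```python
def classify(value, band, static_critical=98):
    """Layered alerting: the fixed critical threshold acts as a fail-safe,
    while the dynamic band catches unusual-but-not-yet-critical behaviour."""
    low, high = band
    if value >= static_critical:
        return "critical"              # static threshold: always fires
    if value > high or value < low:
        return "warning"               # dynamic threshold: fires only outside the expected range
    return "ok"

# During a known busy period the band is wide, so 85% CPU is "ok",
# but 99% still trips the static critical threshold.
print(classify(85, band=(20, 95)))     # ok
print(classify(99, band=(20, 95)))     # critical
```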
Dynamic vs Static Thresholds?

Hi, I've been using LM for years and have never really understood Dynamic Thresholds: when to use them and when not to. How do they work in conjunction with Static Thresholds, or should you only use one or the other?

Here's the specific issue I'm working on right now. I have a server that spikes the CPU all the time during the day. Here's the current graph for this device:

What we want LM to do is "learn" that from 5 AM to 5 PM (or whatever the window is), the CPU is normally high and it shouldn't alert us, but that outside those hours, and on weekends, this isn't normal. We have a Dynamic Threshold set up, but because the CPU spikes from 0 to 100 very quickly, LM doesn't seem to like this. If we change the band numbers to 3 or 4, it just blankets everything from 1-100 in the band and we'd never get any alerts at all.

As you can see here, the band doesn't start to "grow" until after the CPU spikes for the first time; then it goes up and keeps going up after the CPU has come back down. The next time the CPU spikes, the band grows again. As you can see by the purple arrow, the band was coming down, but then the CPU spiked again and the band didn't grow in time, so we got an alert.

Maybe this isn't a good place to use Dynamic Thresholds. Would we be better off just setting a Static Threshold with the normal numbers, but limiting it to 17:00-5:00 so it only alerts overnight?
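A toy sketch of the time-restricted static approach mentioned at the end of the post, assuming the busy window is weekdays 05:00-17:00; the function name, threshold, and window are all illustrative, not a LogicMonitor setting:

```python
from datetime import datetime

def should_alert(value, now=None, threshold=90):
    """Only alert outside the known busy window (weekdays 05:00-17:00).
    A stand-in for a static threshold restricted to overnight hours."""
    now = now or datetime.now()
    busy = now.weekday() < 5 and 5 <= now.hour < 17
    return value >= threshold and not busy

print(should_alert(99, datetime(2024, 12, 6, 7, 0)))    # Friday 07:00  -> False (expected spike)
print(should_alert(99, datetime(2024, 12, 7, 7, 0)))    # Saturday 07:00 -> True (weekend spike is abnormal)
print(should_alert(99, datetime(2024, 12, 6, 22, 0)))   # Friday 22:00  -> True (overnight)
```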
6 polls, 5 poll cycles, or 12 minutes

Alright, take a look at this. The poll rate for this DS is 2 minutes. What is the trigger window? Assume all the criteria are met starting at 1:59 PM and 30 seconds (between polls). When will the alert be triggered? It's actually 10 minutes (2:10 PM, plus or minus a couple of seconds for collector task queue delay, and assuming the original scheduling was at the top of the hour). Do you know why? Why does it say 12 minutes? Why not 10? Is this behavior different from regular alert trigger intervals?
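One hedged reading of the arithmetic, assuming polls land on the even minute (scheduled from the top of the hour) and the trigger interval requires 6 consecutive breaching polls: 6 polls span only 5 poll cycles, so the elapsed time from the first breaching poll to the alert is 10 minutes, even though 6 polls multiplied by the 2-minute poll rate is displayed as 12.

```python
from datetime import datetime, timedelta

poll_rate = timedelta(minutes=2)
consecutive_polls = 6                      # what gets multiplied out to "12 minutes"
condition_start = datetime(2024, 1, 1, 13, 59, 30)

# Polls land on the even minute, scheduled from the top of the hour.
first_poll = datetime(2024, 1, 1, 14, 0, 0)
polls = [first_poll + i * poll_rate for i in range(consecutive_polls)]

alert_time = polls[-1]                     # 6th consecutive breaching poll
print(alert_time.time())                   # 14:10:00
print(alert_time - condition_start)        # 0:10:30 -- roughly 10 minutes, not 12
```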