2 days ago

Dynamic vs Static Thresholds?


I've been using LM for years and have never really understood Dynamic Thresholds, or when to use them and when not to.  How do they work in conjunction with Static thresholds, or should you only use one or the other?

Here's the specific issue I'm working on right now.

I have a server that spikes the CPU all the time during the day.  Here's the current graph for this device:

What we want LM to do is "learn" that from 5AM to 5PM (or whatever it is) the CPU is normally high, and to not alert us during that range.  But outside those times, and on weekends, it's not normal for this to happen.

We have a Dynamic threshold set up, but because the CPU spikes from 0 to 100 very quickly, LM doesn't seem to handle it well.  If we change the Band numbers to 3 or 4, then it just blankets everything from 1-100 in the band and we'd never get any alerts on anything.
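
To show what I mean by the band blanketing everything, here's a toy sketch in Python. I don't know how LM actually computes its band, so this assumes a plain mean plus or minus N standard deviations, and the CPU samples and the `band` function are made up for illustration.

```python
from statistics import mean, pstdev

def band(samples, width):
    """Return (low, high) as mean +/- width * stdev, clamped to 0-100 percent.

    This is an assumed toy model, not LM's real algorithm.
    """
    mu = mean(samples)
    sigma = pstdev(samples)
    low = max(0.0, mu - width * sigma)
    high = min(100.0, mu + width * sigma)
    return low, high

# Made-up CPU samples: idle at 5% most of the time, pegged at 100% a fifth of the time.
samples = [5] * 80 + [100] * 20

for width in (1, 3, 4):
    low, high = band(samples, width)
    would_alert = 100 > high  # does a spike to 100% land outside the band?
    print(f"width={width}: band {low:.0f}-{high:.0f}, spike to 100 alerts: {would_alert}")
```

With data this spiky, the narrow band flags every spike and the wide bands cover the whole range, which matches what we're seeing.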

As you can see here, the band doesn't start to "grow" until after the CPU spikes for the first time; then it goes up and continues to go up after the CPU has come back down.  The next time the CPU spikes, the band grows again.  As you can see by the purple arrow, the band was coming down when the CPU spiked again, and it didn't grow in time, so we got an alert.

Maybe this example is not a good place to try to use Dynamic thresholds.  Would we be better off just setting a Static threshold to alert at the normal numbers, but limiting it to 17:00 - 5:00 so it only alerts overnight?
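
Here's a rough sketch in Python of the logic I have in mind for the Static option. This is not LM configuration, just an illustration: the 90% threshold, the constants, and the function names are all made up, and I've assumed weekends should alert all day since high CPU isn't normal then.

```python
from datetime import datetime

BUSY_START_HOUR = 5    # 05:00, start of the expected-high window
BUSY_END_HOUR = 17     # 17:00, end of the expected-high window
CPU_THRESHOLD = 90.0   # hypothetical static threshold, in percent

def in_busy_window(ts: datetime) -> bool:
    """True on weekdays between 05:00 (inclusive) and 17:00 (exclusive)."""
    is_weekday = ts.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and BUSY_START_HOUR <= ts.hour < BUSY_END_HOUR

def should_alert(ts: datetime, cpu_percent: float) -> bool:
    """Alert only when CPU is over the threshold outside the busy window."""
    return cpu_percent > CPU_THRESHOLD and not in_busy_window(ts)
```

So a spike at 10:00 on a Wednesday would be ignored, while the same spike at 22:00 or on a Saturday would alert.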

