Forum Discussion

Kelemvor
2 months ago

Do Static and Dynamic thresholds work together?

Hi,

I don't understand Dynamic thresholds or the odd UI used to set them up.  Band factors and the related settings make no sense to me.

Here's my specific scenario and maybe someone can tell me the best way to handle this.

We have a server whose CPU spikes pretty high every weekday morning.  It generally starts around 5AM and ends around 9-10AM.  Some days (Fridays) it seems to go longer.  We don't want to get any alerts when it does this.

Here's the graph for December (even though it looks like the spikes only go to 60, they actually all go to 99 or 100 when zoomed in more):

We tried setting a daily, recurring SDT, but that still shows the errors; it just doesn't notify us about them.  We want LM to consider the morning spikes as "normal" and to ignore them.  We set up a Dynamic Threshold to see if that would help.  Here's a screenshot of what that looks like:

As you can see by the arrow, this doesn't seem to have "Learned" what normal is.  It seems like it just waits for the CPU to spike and then adjusts the "Expected Range" to compensate.  If it was actually Learning, it should have expected the spike, since it happens every weekday, and adjusted BEFORE it happened.  Right?

We also have the standard Static Thresholds enabled, so we alert at 90/95/98 for this server.  We get alerts for it all the time and aren't sure how to properly set this up.

If we use the Dynamic alerts, should we turn off the Static ones since one doesn't seem to override the other?  Should the Dynamic expected range know that the morning spike is going to happen or is that not how dynamic thresholds work?  We rarely use them because we just don't get how to use them properly even after reading all the KBs and such.

Any ideas, opinions, etc would be great.

  • Rather than use a scheduled SDT, you can set up static thresholds that change depending on the time of day. You can override the 24-hour standard threshold so that between ~5-10am the thresholds are higher. For example (untested):

    I haven't used dynamic thresholds much myself either.

    • Kelemvor

      Yeah.  That would basically turn off alerting instead of just suppressing the notifications so that'd probably be a better idea.  It always starts at the same time.  Too bad it doesn't always end at the same time. ;)
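
The time-of-day override suggested above can be sketched roughly as follows. This is only an illustration in Python, not LogicMonitor configuration syntax: the names `DEFAULT`, `MORNING`, `thresholds_for` and `severity` are hypothetical, the 5-10am weekday window is taken from the thread, and the morning values are assumed to be set above 100 so that a CPU reading can never reach them.

```python
from datetime import datetime, time

# Static thresholds from the thread: warn / error / critical.
DEFAULT = (90, 95, 98)
# Assumed morning override: above 100, so a CPU percentage never reaches it.
MORNING = (101, 101, 101)
# Approximate window from the thread (weekdays, ~5-10am).
MORNING_START, MORNING_END = time(5, 0), time(10, 0)


def thresholds_for(ts: datetime):
    """Return the (warn, error, critical) limits in effect at time ts."""
    is_weekday = ts.weekday() < 5  # Monday=0 ... Friday=4
    in_window = MORNING_START <= ts.time() < MORNING_END
    return MORNING if (is_weekday and in_window) else DEFAULT


def severity(cpu: float, ts: datetime):
    """Return the highest severity whose limit cpu meets, or None."""
    result = None
    for name, limit in zip(("warn", "error", "critical"), thresholds_for(ts)):
        if cpu >= limit:
            result = name
    return result
```

With these assumed values, a 100% reading on a Monday at 6AM raises nothing, while the same reading at noon or on a Saturday morning is still critical. A fixed end time cannot cover the days when the job runs long, which is the limitation noted in the reply above.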

  • We have used both together. For something like CPU Utilization, I would have the dynamic threshold ignore the bottom band. Generally you don't care if something is using "less" in this case (though sometimes you do, of course).

    We then have our static threshold set, but a little higher than we normally would. That way, if the utilization slowly creeps up, it will still trigger an alarm after a bit.
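
A rough sketch of how the two checks in this reply could sit side by side, written in Python purely for illustration. The function name `alert_reasons` and its parameters are hypothetical, and nothing here reflects how LogicMonitor evaluates thresholds internally.

```python
def alert_reasons(value: float, band_upper: float, static_limit: float):
    """List which checks fire for one CPU sample.

    Only the upper edge of the expected band is checked; readings below
    the band are deliberately ignored, as suggested in the reply.
    """
    reasons = []
    if value > band_upper:
        reasons.append("dynamic")
    if value >= static_limit:
        reasons.append("static")
    return reasons


# A slow creep: the band has drifted up with the data, so only the
# static ceiling catches the sample.
print(alert_reasons(97, band_upper=98, static_limit=96))
# A sudden jump well above the band but below the static ceiling.
print(alert_reasons(60, band_upper=40, static_limit=96))
```

The point of keeping both is that each covers a case the other misses: the band reacts to sudden deviations, and the static ceiling still fires when the band has drifted upward along with the data.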

    • Kelemvor

      Yes, we don't care about anything below the band, just above.  The issue I have is that every weekday, at 5AM, the CPU spikes up to 100%.  We know it's going to happen.  Since it happens every weekday, if LM really "Learned" what was normal for a server, it should put that gray area at 100% right at 5AM.  According to the screenshot above, that's not happening.  It's waiting for the CPU to spike up and is then moving the bar up to compensate.  That doesn't help us because this isn't a gradual increase.  It's 0% one minute and 99% the next.

      We thought the whole point of Dynamic thresholds is that they are supposed to look at the server and learn its behavior over time.  This doesn't seem to actually work, or something is wrong.
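
The difference between a band that follows recent data and one that anticipates a weekly pattern can be illustrated with a small Python sketch. This is not LogicMonitor's algorithm, and the thread does not establish which approach it uses. The data is synthetic (5% idle, 99% on weekdays from 5AM to 10AM), and `trailing_upper`, `seasonal_upper`, the 12-sample window and the margin of 20 are all made-up names and values.

```python
from statistics import mean


def synthetic_cpu(weekday: int, hour: int) -> int:
    """Made-up load: 99% on weekdays from 5AM up to 10AM, 5% otherwise."""
    return 99 if weekday < 5 and 5 <= hour < 10 else 5


def trailing_upper(history, window=12, margin=20):
    """Upper bound from the mean of the most recent samples plus a margin."""
    return mean(history[-window:]) + margin


def seasonal_upper(samples, weekday, hour, margin=20):
    """Upper bound from past samples taken at the same weekday and hour."""
    same_slot = [v for (d, h, v) in samples if d == weekday and h == hour]
    return mean(same_slot) + margin


# Three weeks of hourly samples as (weekday, hour, value), Monday = 0.
samples = [(d, h, synthetic_cpu(d, h))
           for _week in range(3) for d in range(7) for h in range(24)]

# Everything seen up to the next Monday at 4AM, just before the spike.
history = [v for (_d, _h, v) in samples] + [synthetic_cpu(0, h) for h in range(5)]

print(trailing_upper(history))          # bound based on the quiet night
print(seasonal_upper(samples, 0, 5))    # bound based on previous Mondays at 5AM
```

On this data the trailing bound sits near the overnight idle level, so the 5AM jump lands far outside it and the bound only rises after the spike has started. That is the behavior described in the screenshot. The per-slot bound is already high at Monday 5AM because it was built from earlier Mondays.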