Forum Discussion

mnagel
9 years ago

Alerts on Longer Periods within Datasources

For a datasource, we would like to be able to set an alert threshold that is evaluated over a time range rather than against a single sample. You can already set the number of threshold violations needed for an alert, but this is quite different in nature from setting a threshold over a time range: for example, 60% CPU averaged over 2 hours versus 60% CPU on each of 10 samples. CPU may fluctuate above and below the threshold within that period, which prevents the alert from firing, yet the average over the longer period is still valuable. Similarly, we would like to get alerts not just on the average over a time period, but also on the slope over a time period, though perhaps the latter should be a separate request.
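To make the difference concrete, here is a rough Python sketch of the three checks being compared. None of this is LogicMonitor functionality; the function names, the sample values, and the assumption that "violation" means a sample strictly above the threshold are all made up for illustration.

```python
from statistics import mean


def consecutive_violation_alert(samples, threshold, count):
    """True if `count` samples in a row exceed `threshold` (assumed semantics)."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= count:
            return True
    return False


def window_average_alert(samples, threshold):
    """True if the mean of all samples in the window exceeds `threshold`."""
    return mean(samples) > threshold


def window_slope(samples, interval_minutes):
    """Least-squares slope of the samples, in units per minute.

    Assumes evenly spaced samples, `interval_minutes` apart, and at
    least two samples in the window.
    """
    xs = [i * interval_minutes for i in range(len(samples))]
    x_bar = mean(xs)
    y_bar = mean(samples)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical CPU% samples that bounce around a 60% threshold.
cpu = [80, 40, 85, 45, 90, 50, 80, 45, 85, 50, 90, 40]

fires_on_consecutive = consecutive_violation_alert(cpu, threshold=60, count=3)
fires_on_average = window_average_alert(cpu, threshold=60)
```

With these made-up samples the window average is above 60, so the average-based check fires, while the consecutive-violation check never does because every other sample dips below the threshold. The slope function is the same idea applied to trend rather than level.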

Thanks,

Mark

  • The number of consecutive violations multiplied by the DS polling interval gives roughly how many minutes a condition must persist before an alert fires. If you specify multiple severities in the dataPoint threshold, be advised: if the severities are "too close" to each other, the count is reset whenever a polled value jumps into a new severity (this is true in both directions: warn->error->critical and critical->error->warn).

    For myself, I'd love to see thresholds updated to use a more structured scripting language, so I could evaluate the last X values to determine when to fire the alert. Zabbix NMS has this (https://www.zabbix.com/documentation/2.4/manual/config/triggers/expression).
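The arithmetic and the reset behavior described in this reply can be sketched roughly as follows. This is only a model of the behavior as described above, not the product's actual implementation: the function names, the threshold values, and the assumption that the first sample in a new severity counts as violation number one are all hypothetical.

```python
import math


def violations_for_duration(duration_minutes, poll_interval_minutes):
    """Consecutive violations needed to cover roughly `duration_minutes`."""
    return math.ceil(duration_minutes / poll_interval_minutes)


def severity(value, thresholds):
    """Highest severity whose floor `value` meets, or None.

    `thresholds` is a list of (name, floor) pairs in ascending order.
    """
    level = None
    for name, floor in thresholds:
        if value >= floor:
            level = name
    return level


def first_alert(samples, thresholds, count):
    """Return (index, severity) of the first alert, or None.

    Models the described behavior: the run of consecutive violations
    restarts whenever a sample lands in a different severity.
    """
    run, current = 0, None
    for i, value in enumerate(samples):
        level = severity(value, thresholds)
        if level is None:
            run, current = 0, None
        elif level == current:
            run += 1
        else:
            run, current = 1, level  # assumption: new severity starts at 1
        if current is not None and run >= count:
            return i, current
    return None


# Hypothetical: alert after about 2 hours at a 5-minute polling interval.
needed = violations_for_duration(120, 5)

# Severities that are "too close": values hop between them and keep resetting.
close = [("warn", 60), ("error", 65), ("critical", 70)]
samples = [62, 66, 63, 71, 64, 67]
with_close_severities = first_alert(samples, close, count=3)
with_single_severity = first_alert(samples, [("warn", 60)], count=3)
```

In this toy run every sample is above 60, yet no alert fires with three closely spaced severities because the run never reaches 3 in any one of them. With a single warn threshold the same samples alert on the third poll.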