Forum Discussion
There's a fine line between sending too many alerts and not enough.
When an alert increases in severity over the state it was acknowledged in, we treat the new level as un-acknowledged.
But in this case, the alert has been acknowledged at the critical level. If it drops to error but doesn't clear entirely, then returns to critical, we regard it as the same alert session, so the same critical acknowledgement applies. If the alert clears entirely, future increases to critical are not treated as acknowledged (sketched in code after the list below).
We do this for a few use cases:
- a metric that is oscillating over a threshold (e.g. a disk volume that is at 97%, then 98%, then 97%, then 98%, and so on). You probably do not want fresh escalations each time it bursts back over 98%.
- philosophically, the system treats the acknowledgement of the alert at the critical level as someone saying "I will assume ownership of this issue, up to this severity." In this case, it's the maximum severity (critical), so it amounts to ownership of the issue until it clears.
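Here's a minimal sketch of that behaviour in Python. The `Severity` and `AlertSession` names are purely illustrative (not anything from the product); it only models the session and acknowledgement rules described above.

```python
from enum import IntEnum


class Severity(IntEnum):
    CLEAR = 0
    WARN = 1
    ERROR = 2
    CRITICAL = 3


class AlertSession:
    """Models one alert session: from the first non-clear severity until the alert clears."""

    def __init__(self):
        # Highest severity acknowledged in this session (None = not acknowledged).
        self.acked_severity = None

    def acknowledge(self, severity):
        """Acknowledge the alert at the given severity ("ownership up to this severity")."""
        self.acked_severity = severity

    def should_escalate(self, new_severity):
        """Return True if a change to new_severity should trigger fresh escalation."""
        if new_severity == Severity.CLEAR:
            # The alert cleared entirely: the session ends and the ack is discarded,
            # so a later return to critical starts a new, unacknowledged session.
            self.acked_severity = None
            return False
        if self.acked_severity is None:
            return True  # nothing acknowledged yet in this session
        # Within the same session the ack covers everything up to the acked level,
        # so oscillating below and back up to that level does not re-escalate.
        return new_severity > self.acked_severity


# The oscillating-disk case: acknowledged at critical, bouncing between error and critical.
session = AlertSession()
print(session.should_escalate(Severity.CRITICAL))  # True  (first time, not acked)
session.acknowledge(Severity.CRITICAL)
print(session.should_escalate(Severity.ERROR))     # False (covered by the ack)
print(session.should_escalate(Severity.CRITICAL))  # False (same session, same ack)
print(session.should_escalate(Severity.CLEAR))     # False (session ends, ack discarded)
print(session.should_escalate(Severity.CRITICAL))  # True  (new session, escalates again)
```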
If you want to prevent escalation of the criticals but are unable to clear the issue, I'd suggest not acknowledging the alert immediately. Instead, put the instance in scheduled downtime for 1 hour (which will stop escalation), work on the issue, and bring it down to error (or warn). If you are unable to improve beyond that, acknowledge that state (or adjust thresholds). Then a future increase to critical will be escalated.
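Continuing the sketch above, that workflow plays out like this: because the acknowledgement only covers severities up to the level it was made at, acknowledging at error leaves a later jump back to critical free to escalate.

```python
# Continuing the AlertSession sketch above. The downtime step isn't modelled here;
# it simply suppresses escalation while you work the issue down to error.
session = AlertSession()
session.should_escalate(Severity.CRITICAL)         # goes critical; use downtime instead of acking
session.should_escalate(Severity.ERROR)            # you improve it to error...
session.acknowledge(Severity.ERROR)                # ...and acknowledge that state
print(session.should_escalate(Severity.CRITICAL))  # True: a future rise to critical escalates again
```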