Forum Discussion
Quote: It would seem that if LM had logic like the below, we could approximate a solution to the flapping issue:
if threshold crossed and $datapoint.$alerthasbeenraised = false then raise alert and set $datapoint.$alerthasbeenraised = true
Hmm, how would you then reset $alerthasbeenraised back to false once it is set to true? If it's once the alert has cleared, then that is how the system works now. If you never reset it, the same alert will never occur again, even if the same problem occurs months later. If it's after the alert has been clear in the system for x number of checks, then you've just re-implemented Alert Clear Interval. :)
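To make the reset question concrete, here is a rough Python sketch of the proposed flag. It is only a toy simulation, not how LM is implemented: `count_alerts`, `reset_on_clear`, the sample values, and the threshold of 90 are all made up for illustration.

```python
def count_alerts(values, threshold, reset_on_clear):
    """Count alerts raised under the proposed 'alert has been raised' flag.

    Hypothetical model: one value per check; a value above `threshold`
    counts as a threshold crossing.
    """
    raised = False  # stands in for $datapoint.$alerthasbeenraised
    alerts = 0
    for value in values:
        if value > threshold:
            if not raised:
                alerts += 1
                raised = True
        elif reset_on_clear:
            # Resetting as soon as the value recovers is the same as
            # having no flag at all: every new crossing raises an alert.
            raised = False
    return alerts


flapping = [95, 50, 96, 40, 97, 30]
# Same flapping, then a quiet period, then a genuinely new incident.
later_incident = flapping + [20] * 5 + [99]

print(count_alerts(flapping, 90, reset_on_clear=True))         # 3
print(count_alerts(later_incident, 90, reset_on_clear=False))  # 1
```

Resetting on clear gives three alerts for the flapping series, which is today's behavior. Never resetting gives one alert in total, so the later incident at 99 is silently swallowed.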
You might want to look at the AI Ops stuff like Dynamic Thresholds, which seems to be more of their focus for limiting flapping. Or check whether the ticketing system itself can handle auto-merging tickets or the like.
Quote: Can you help me reconcile these two ideas?
Quote: If you increase Alert Clear Interval to say 10, it will wait 11 minutes before clearing the alert hence just considering it a single alert and not create a new ticket.
Sounds like you're saying that keeping an alert from clearing will keep a new ticket from being created, yet...
Quote: It's the sending of the Active message that is creating a new ticket, not the clear message.
Exactly. If you don't clear the alert until the cause of the alert is really fixed, it will not create extra tickets, because there aren't any new alert instances to create tickets for. So prevent the flapping from occurring rather than dealing with it afterwards. LM will not send another Active message until after the existing alert clears. And by "clears" I mean the alert is no longer active in the system, not that it sends a clear message.
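If it helps, here is a small Python sketch of that idea. It is a simplified model under my own assumptions, not LM's actual logic: `count_active_messages` and the sample data are hypothetical, an alert is treated as cleared only after `clear_interval` consecutive checks below the threshold, and the exact counting in LM may differ.

```python
def count_active_messages(values, threshold, clear_interval):
    """Count new alert instances (Active messages, hence tickets).

    Simplified model: an active alert clears only after `clear_interval`
    consecutive checks at or below `threshold`.
    """
    active = False
    ok_streak = 0  # consecutive healthy checks while the alert is active
    messages = 0
    for value in values:
        if value > threshold:
            ok_streak = 0  # any bad check restarts the clear countdown
            if not active:
                active = True
                messages += 1  # new alert instance, so a new ticket
        elif active:
            ok_streak += 1
            if ok_streak >= clear_interval:
                active = False
                ok_streak = 0
    return messages


flapping_checks = [95, 50, 96, 40, 97, 30, 20, 10, 5]

print(count_active_messages(flapping_checks, 90, clear_interval=1))  # 3
print(count_active_messages(flapping_checks, 90, clear_interval=3))  # 1
```

With a clear interval of 1 the alert clears on every dip, so each new spike sends another Active message. With a clear interval of 3 the alert stays active through the dips, so only one Active message is ever sent for the whole episode.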
If you're still not sure what I mean, perhaps you can let me know how you think/expect the integration works (with example) so I get a better idea where there might be confusion.