Forum Discussion
QuoteAs I read this, it says, "LM doesn't consider how long an alert has been cleared before it sends an Active message on re-occurring alert." and "you can modify this behavior by changing the Alert Clear Interval" I swear I'm not being argumentative! Can you help me square these two seeming conflicting ideas?
No problems. :)/emoticons/smile@2x.png 2x" title=":)" width="20">
So instead of trying to merge multiple alerts into one, you can just make just one long that doesn't clear until it's really fixed. DataSource alert basically would work like, this as an example:
You have a CPU check with a threshold of > 90 that runs every minute. Has "Alert Trigger Interval" configured for 3 and "Alert Clear Interval" for 2.
- 1:00pm: CPU is at 40%. No alerts
- 1:01pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 0)
- 1:02pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 1)
- 1:03pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 2)
- 1:04pm: CPU is at 100%. Warning alert created. Active message sent to integration creating new ticket. (at 3!)
- 1:05pm: CPU is at 100%. Alert ACK. ACK message sent to integration.
- 1:10pm: CPU is at 40%. LM notes it's under threshold but waits for Alert Clear Interval to hit 2 (at 0)
- 1:11pm: CPU is at 40%. LM notes it's under threshold but waits for Alert Clear Interval to hit 2 (at1)
- 1:12pm: CPU is at 40%. Alert Cleared. Clear message sent to integration (at 2!)
- 1:15pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 0)
- 1:16pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 1)
- 1:17pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 2)
- 1:18pm: CPU is at 100%. Warning alert created. Active message sent to integration creating new ticket. (at 3!)
So what is happening here is that the CPU was only ok for 5 minutes before it flapped the alert. And LM was told to only wait for 3 minutes before it should clear the alert, allowing a new ticket. If you increase Alert Clear Interval to say 10, it will wait 11 minutes before clearing the alert hence just considering it a single alert and not create a new ticket. So you can customize the "flapping timeout" (for lack of a better name) by changing the Alert Clear Interval. One possible problem is that this is a per-DataPoint option and not system wide.
Now perhaps you prefer LM to just keep using the same ticket, regardless of delay between flapping, until the ticket has been resolved. But LogicMonitor does not know the state of the ticket, it doesn't know if a ticket is still opened or not. To do this you would need to provide this within the ticket system itself or using some special system between the two.
QuoteFWIW, in an attempt to keep new, redundant (to me) tickets from being created, I've configured our CWM integration so that LM alerts that to to Cleared status sets/keeps the CWM ticket status to Ack, not Cleared, yet new tickets are being created while leaving the ticket generated by the original LM alert in Ack status.
It's the sending of the Active message that is creating a new ticket, not the clear message. LM doesn't track ticket so it's not going to only send Active alerts if it has also sent a clear alert. They are all independent and it's more straightforward:
- An alert has occurred = send Active message.
- An alert has escalated = send ACK message.
- An alert has been ACK = send ACK message.
- An alert has cleared = send Clear message.
Hope that helps clears things up a bit :)/emoticons/smile@2x.png 2x" title=":)" width="20">
Related Content
- 2 years ago
- 5 months ago