For more than a year, I've gone back and forth with LM support about LM's seeming inability to update CWM tickets instead of creating a new ticket every time a datapoint's status flaps.
LM support swears that this behavior is by design, but can't identify a use case for it. Moreover, this documentation says that it should be able to update tickets: https://www.logicmonitor.com/support/alerts/integrations/connectwise-integration/?_ga=2.193114093.1167024280.1578609865-61119326.1554263269
Does anyone else's LM update CWM tickets, or is their documentation wrong?
What do you mean by status flaps? Do you mean a datapoint that alerts, then clears, then alerts, then clears, etc or the literal "StatusFlap" datapoint (de)escalating that some interface checks have? I can't speak for ConnectWise Integration itself since I don't use it, but most of the integrations (without client-side addons) work the same way. I would expect the former to cause a new ticket each time but the later to update.
An alert that occurs would trigger the "Active" action in the integration which usually creates a new ticket. LM doesn't consider how long an alert has been cleared before it sends an Active message on re-occurring alert. It doesn't matter if the alert cleared 1 minute ago or 3 years ago before the alert occurred again. I believe the point is to change the Alert Clear Interval to make sure the condition has been really cleared and stable before clearing the alert.
Thank you for writing. You asked:
Yes, that's the behavior I'm referring to.
Help me reconcile these two ideas:
As I read this, it says, "LM doesn't consider how long an alert has been cleared before it sends an Active message on re-occurring alert." and "you can modify this behavior by changing the Alert Clear Interval"
I swear I'm not being argumentative! Can you help me square these two seeming conflicting ideas?
FWIW, in an attempt to keep new, redundant (to me) tickets from being created, I've configured our CWM integration so that LM alerts that to to Cleared status sets/keeps the CWM ticket status to Ack, not Cleared, yet new tickets are being created while leaving the ticket generated by the original LM alert in Ack status.
So instead of trying to merge multiple alerts into one, you can just make just one long that doesn't clear until it's really fixed. DataSource alert basically would work like, this as an example:
You have a CPU check with a threshold of > 90 that runs every minute. Has "Alert Trigger Interval" configured for 3 and "Alert Clear Interval" for 2.
So what is happening here is that the CPU was only ok for 5 minutes before it flapped the alert. And LM was told to only wait for 3 minutes before it should clear the alert, allowing a new ticket. If you increase Alert Clear Interval to say 10, it will wait 11 minutes before clearing the alert hence just considering it a single alert and not create a new ticket. So you can customize the "flapping timeout" (for lack of a better name) by changing the Alert Clear Interval. One possible problem is that this is a per-DataPoint option and not system wide.
Now perhaps you prefer LM to just keep using the same ticket, regardless of delay between flapping, until the ticket has been resolved. But LogicMonitor does not know the state of the ticket, it doesn't know if a ticket is still opened or not. To do this you would need to provide this within the ticket system itself or using some special system between the two.
It's the sending of the Active message that is creating a new ticket, not the clear message. LM doesn't track ticket so it's not going to only send Active alerts if it has also sent a clear alert. They are all independent and it's more straightforward:
Hope that helps clears things up a bit
Mike, thank you for taking the time to patiently illustrate what's going on. It certainly helped me and I think it will help others.
Man, I wish that was a system-wide option!
Yes! Yes, I do!
It would seem that if LM had logic like the below, we could approximate a solution to the flapping issue:
if threshold crossed and $datapoint.$alerthasbeenraised = false then raise alert and set $datapoint.$alerthasbeenraised = true
Can you help me reconcile these two ideas?
Sounds like you're saying that keeping an alert from clearing will keep a new ticket from being created, yet...
Hmm, how would you then reset $alerthasbeenraised back to false once set true? If it's once the alert has cleared, then that is how the system works now. If you never reset it, the same alert will never occur again, even if if the same problem occurs months later. If it's after the alert has cleared in the system for x number of checks, then you just re-implemented Alert Clear Interval.
You might want to look at the AI Ops stuff like Dynamic Thresholds which seems to be more of their focus to limit flapping. Or look if the ticketing system itself can handle auto-merging tickets or the like.
Exactly, if you don't clear the alert until the cause of alert is really fixed, it will not create extra tickets because there isn't any new alert instances to create tickets for. So prevent the flapping from occurring rather than deal with them afterwards. LM will not send an Active message until after the alert clears. And by "clears" I mean is no longer active in the system, not that it sends a clear message.
If you're still not sure what I mean, perhaps you can let me know how you think/expect the integration works (with example) so I get a better idea where there might be confusion.
I'm glad you asked! It would be set back to false when we mark it as resolved either in Connectwise Manage or in LM, a feature that doesn't currently exist. This way, the alerts generated by flapping would be ingested into the same ticket as updates, but wouldn't change the status of the ticket.
This is the workflow that I'd like to see with LM and Connectwise Manage which N-Central has accomplished with CWM:
Alert threshold for Datapoint A is crossed, alert raised, ticket 001 created, status New.
Alert status clears, ticket 001 is updated, status remains unchanged.
Tech investigates the issue, communicates with the client, makes valuable internal notes, and resolves it. Marks ticket as Solved in CWM. Alert in LM clears because the datapoint cleared, not because the ticket was marked as Solved in CWM.
Alert threshold for Datapoint A is crossed, alert raised, ticket 001 is reopened thereby retaining the earlier communication with the client, the internal notes, etc. Status changes from Solved to New, Ack, Re-opened - not terribly important.
The points are:
I hope this helps,
We have our setup working kinda like that. The same alert will update the same ticket and will only cause a new ticket once the original ticket has been closed. But we use ServiceNow and all of that process is completed within the ticketing systems itself (I'm not too involved with how it works) and independent of LM or any other system. We do still try to limit flapping situations though.
That is something that might be different in various ticketing system. For example, we never re-open an old ticket, it actually prevents us from doing that if been closed after a period of time. We can have tickets reference another older ticket but it's still a new ticket. I think that would throw off our reports and such to have old tickets come back that way.
But I think these are all valid options but something I personally feel is better suited for the ticketing system to handle (or middle-ware) since LM isn't all flexible with integrations. I just let LM do the monitoring and use other systems to deal with ticket routing, notifications, SLAs and such. Then again we're able to use a ($$$) ITIL/ticketing system :)
So, I don't think I have any further suggestions for this case. Unless LM implements some sort of Business Rules like feature where you can have conditional effects to integration messages. Does CWM have any workflow or ticket pre-processing options? If it does, this might be easy since you can directly tie the LMX###### directly to one ticket forever.