Forum Discussion

ntw2
6 years ago

Can LM update ConnectWise Manage tickets?

For more than a year, I've gone back and forth with LM support about LM's seeming inability to update CWM tickets instead of creating a new ticket every time a datapoint's status flaps.

LM support swears that this behavior is by design, but can't identify a use case for it. Moreover, this documentation says that it should be able to update tickets: https://www.logicmonitor.com/support/alerts/integrations/connectwise-integration/?_ga=2.193114093.1167024280.1578609865-61119326.1554263269

Does anyone else's LM update CWM tickets, or is their documentation wrong?

  • What do you mean by status flaps? Do you mean a datapoint that alerts, then clears, then alerts, then clears, etc., or the literal "StatusFlap" datapoint (de)escalating that some interface checks have? I can't speak for the ConnectWise integration itself since I don't use it, but most of the integrations (without client-side add-ons) work the same way. I would expect the former to cause a new ticket each time but the latter to update.

    An alert that occurs triggers the "Active" action in the integration, which usually creates a new ticket. LM doesn't consider how long an alert has been cleared before it sends an Active message on a re-occurring alert. It doesn't matter if the alert cleared 1 minute ago or 3 years ago before the alert occurred again. I believe the point is to change the Alert Clear Interval to make sure the condition has really cleared and is stable before clearing the alert.

     

  • Hi, Mike

    Thank you for writing. You asked:

    Quote

    What do you mean by status flaps? Do you mean a datapoint that alerts, then clears, then alerts, then clears, etc


    Yes, that's the behavior I'm referring to.

    Help me reconcile these two ideas:

    Quote

    LM doesn't consider how long an alert has been cleared before it sends an Active message on re-occurring alert. It doesn't matter if the alert cleared 1 minute ago or 3 years ago before the alert occurred again. I believe the point is to change the Alert Clear Interval to make sure the condition has been really cleared and stable before clearing the alert.


    As I read this, it says, "LM doesn't consider how long an alert has been cleared before it sends an Active message on re-occurring alert." and "you can modify this behavior by changing the Alert Clear Interval"

    I swear I'm not being argumentative! Can you help me square these two seemingly conflicting ideas?

    FWIW, in an attempt to keep new, redundant (to me) tickets from being created, I've configured our CWM integration so that LM alerts that go to Cleared status set/keep the CWM ticket status at Ack, not Cleared, yet new tickets are still being created while the ticket generated by the original LM alert is left in Ack status.

  • Quote

    As I read this, it says, "LM doesn't consider how long an alert has been cleared before it sends an Active message on re-occurring alert." and "you can modify this behavior by changing the Alert Clear Interval" I swear I'm not being argumentative! Can you help me square these two seemingly conflicting ideas?

    No problem. :)

    So instead of trying to merge multiple alerts into one, you can just make one long alert that doesn't clear until it's really fixed. A DataSource alert basically works like this, as an example:

    You have a CPU check with a threshold of > 90 that runs every minute. It has "Alert Trigger Interval" configured for 3 and "Alert Clear Interval" for 2.

    • 1:00pm: CPU is at 40%. No alerts
    • 1:01pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 0)
    • 1:02pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 1)
    • 1:03pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 2)
    • 1:04pm: CPU is at 100%. Warning alert created. Active message sent to integration creating new ticket. (at 3!)
    • 1:05pm: CPU is at 100%. Alert ACK. ACK message sent to integration.
    • 1:10pm: CPU is at 40%. LM notes it's under threshold but waits for Alert Clear Interval to hit 2 (at 0)
    • 1:11pm: CPU is at 40%. LM notes it's under threshold but waits for Alert Clear Interval to hit 2 (at 1)
    • 1:12pm: CPU is at 40%. Alert Cleared. Clear message sent to integration (at 2!)
    • 1:15pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 0)
    • 1:16pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 1)
    • 1:17pm: CPU is at 100%. LM notes it's over threshold but is waiting for Alert Trigger Interval to hit 3 (at 2)
    • 1:18pm: CPU is at 100%. Warning alert created. Active message sent to integration creating new ticket. (at 3!)

    So what is happening here is that the CPU was only OK for 5 minutes before the alert flapped, and LM was told to wait only 3 minutes before clearing the alert, which allowed a new ticket. If you increase Alert Clear Interval to, say, 10, it will wait 11 minutes before clearing the alert, so it treats the whole episode as a single alert and doesn't create a new ticket. So you can customize the "flapping timeout" (for lack of a better name) by changing the Alert Clear Interval. One possible problem is that this is a per-DataPoint option and not system-wide.
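    The timeline above can be sketched as a small simulation. This is only an illustration of the counting described in this post, not LogicMonitor's actual implementation; the function name `simulate`, its parameters, and the per-minute readings are assumptions made up for the example.

```python
def simulate(readings, threshold=90, trigger_interval=3, clear_interval=2):
    """Return (minute, message) tuples that would be sent to the integration.

    Counting follows the timeline in the post: the counter starts at 0 on the
    first poll that disagrees with the current alert state, and the state
    flips on the poll where the counter has reached the configured interval.
    """
    messages = []
    alert_active = False
    streak = 0  # consecutive polls that disagree with the current alert state
    for minute, value in enumerate(readings):
        breaching = value > threshold
        if breaching == alert_active:
            streak = 0  # reading agrees with the current state, reset counter
            continue
        needed = trigger_interval if breaching else clear_interval
        if streak == needed:
            alert_active = breaching
            messages.append((minute, "Active" if breaching else "Clear"))
            streak = 0
        else:
            streak += 1
    return messages


# Minute 0 is 1:00pm: OK, then 9 minutes at 100%, 5 minutes OK, 4 minutes at 100%.
readings = [40] + [100] * 9 + [40] * 5 + [100] * 4

print(simulate(readings, clear_interval=2))
print(simulate(readings, clear_interval=10))
```

    With a clear interval of 2, the sketch produces an Active at minute 4, a Clear at minute 12, and a second Active at minute 18 (two tickets). With a clear interval of 10, the five quiet minutes never clear the alert, so only the first Active is sent.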

    Now perhaps you'd prefer LM to just keep using the same ticket, regardless of the delay between flaps, until the ticket has been resolved. But LogicMonitor does not know the state of the ticket; it doesn't know whether a ticket is still open or not. To get that behavior, you would need to handle it within the ticket system itself or with some system sitting between the two.

    Quote

    FWIW, in an attempt to keep new, redundant (to me) tickets from being created, I've configured our CWM integration so that LM alerts that go to Cleared status set/keep the CWM ticket status at Ack, not Cleared, yet new tickets are still being created while the ticket generated by the original LM alert is left in Ack status.

    It's the sending of the Active message that is creating a new ticket, not the clear message. LM doesn't track tickets, so whether it sends an Active message doesn't depend on whether a Clear message was sent earlier. The messages are all independent, and it's more straightforward:

    • An alert has occurred = send Active message.
    • An alert has escalated = send ACK message.
    • An alert has been ACK = send ACK message.
    • An alert has cleared = send Clear message.

    Hope that helps clear things up a bit :)

  • Mike, thank you for taking the time to patiently illustrate what's going on. It certainly helped me and I think it will help others.

    Quote

    So you can customize the "flapping timeout" (for lack of a better name) by changing the Alert Clear Interval. One possible problem is that this is a per-DataPoint option and not system wide.

    Man, I wish that was a system-wide option!

    Quote

    Now perhaps you prefer LM to just keep using the same ticket, regardless of delay between flapping...

    Yes! Yes, I do! 

    It would seem that if LM had logic like the below, we could approximate a solution to the flapping issue:

    if threshold crossed and $datapoint.$alerthasbeenraised = false then raise alert and set $datapoint.$alerthasbeenraised = true 

    ____________________

    Can you help me reconcile these two ideas? 

    Quote

     If you increase Alert Clear Interval to say 10, it will wait 11 minutes before clearing the alert hence just considering it a single alert and not create a new ticket.

    Sounds like you're saying that keeping an alert from clearing will keep a new ticket from being created, yet...

    Quote

    It's the sending of the Active message that is creating a new ticket, not the clear message.

    Thanks again!

  • Quote

    It would seem that if LM had logic like the below, we could approximate a solution to the flapping issue:

    if threshold crossed and $datapoint.$alerthasbeenraised = false then raise alert and set $datapoint.$alerthasbeenraised = true

    Hmm, how would you then reset $alerthasbeenraised back to false once it's set to true? If it's once the alert has cleared, then that is how the system works now. If you never reset it, the same alert will never occur again, even if the same problem occurs months later. If it's after the alert has been cleared in the system for x number of checks, then you've just re-implemented Alert Clear Interval. :)

    You might want to look at the AI Ops stuff like Dynamic Thresholds which seems to be more of their focus to limit flapping. Or look if the ticketing system itself can handle auto-merging tickets or the like.

    Quote

    Can you help me reconcile these two ideas?

    Quote

     If you increase Alert Clear Interval to say 10, it will wait 11 minutes before clearing the alert hence just considering it a single alert and not create a new ticket.

    Sounds like you're saying that keeping an alert from clearing will keep a new ticket from being created, yet...

    Quote

    It's the sending of the Active message that is creating a new ticket, not the clear message.


    Exactly. If you don't clear the alert until the cause of the alert is really fixed, it will not create extra tickets because there aren't any new alert instances to create tickets for. So prevent the flapping from occurring rather than dealing with it afterwards. LM will not send another Active message until after the alert clears. And by "clears" I mean it is no longer active in the system, not that it sends a Clear message.

    If you're still not sure what I mean, perhaps you can let me know how you think/expect the integration works (with example) so I get a better idea where there might be confusion.

  • Quote

    Hmm, how would you then reset $alerthasbeenraised back to false once set true? If it's once the alert has cleared, then that is how the system works now. If you never reset it, the same alert will never occur again, even if the same problem occurs months later. If it's after the alert has cleared in the system for x number of checks, then you just re-implemented Alert Clear Interval. :)

    I'm glad you asked! It would be set back to false when we mark it as resolved either in Connectwise Manage or in LM, a feature that doesn't currently exist. This way, the alerts generated by flapping would be ingested into the same ticket as updates, but wouldn't change the status of the ticket.

    Quote

    If you're still not sure what I mean, perhaps you can let me know how you think/expect the integration works (with example) so I get a better idea where there might be confusion.

     

    This is the workflow that I'd like to see with LM and Connectwise Manage which N-Central has accomplished with CWM:

    Alert threshold for Datapoint A is crossed, alert raised, ticket 001 created, status New.

    Alert status clears, ticket 001 is updated, status remains unchanged.

    Tech investigates the issue, communicates with the client, makes valuable internal notes, and resolves it. Marks ticket as Solved in CWM. Alert in LM clears because the datapoint cleared, not because the ticket was marked as Solved in CWM.

    Days/weeks/months/eons pass.

    Alert threshold for Datapoint A is crossed, alert raised, ticket 001 is reopened thereby retaining the earlier communication with the client, the internal notes, etc. Status changes from Solved to New, Ack, Re-opened - not terribly important.

    The points are:

    1. There should be a one-to-one relationship between an alert raised by a datapoint and its ticket; flapping of a datapoint's status should update its ticket, not under any circumstances create a new ticket.
    2. A CWM ticket raised by LM should remain open until Solved in CWM.
    3. LM should be the truth; the status of an LM alert should not be impacted by the status of its ticket in CWM.

    I hope this helps,
    Nate
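    The three points above could be approximated by a small piece of middleware sitting between LM and the ticket system. The sketch below is hypothetical: `TicketBridge`, `Ticket`, the datapoint key, and the in-memory dictionaries are all invented for illustration and do not correspond to any LogicMonitor or ConnectWise Manage API.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    ticket_id: int
    status: str = "New"
    notes: list = field(default_factory=list)


class TicketBridge:
    """Hypothetical middleware: one ticket per datapoint, never a duplicate."""

    def __init__(self):
        self.tickets = {}       # ticket_id -> Ticket (stands in for the ticket system)
        self.by_datapoint = {}  # datapoint key -> ticket_id
        self._next_id = 1

    def handle(self, datapoint_key, message):
        """Apply an LM message ("Active", "ACK", "Clear") to the datapoint's ticket."""
        ticket_id = self.by_datapoint.get(datapoint_key)
        if ticket_id is None:
            if message != "Active":
                return None  # nothing to update yet
            ticket = Ticket(self._next_id)
            self._next_id += 1
            self.tickets[ticket.ticket_id] = ticket
            self.by_datapoint[datapoint_key] = ticket.ticket_id
        else:
            ticket = self.tickets[ticket_id]
            if message == "Active" and ticket.status == "Solved":
                ticket.status = "Re-opened"  # keep the history, reuse the ticket
        # Every message is recorded as a note; a Clear never changes the status.
        ticket.notes.append(message)
        return ticket

    def solve(self, ticket_id):
        """Only a technician action in the ticket system closes the ticket."""
        self.tickets[ticket_id].status = "Solved"


bridge = TicketBridge()
key = ("server01", "CPU", "CPUBusyPercent")  # made-up datapoint identifier
first = bridge.handle(key, "Active")
bridge.handle(key, "Clear")
bridge.handle(key, "Active")        # flap: same ticket, status unchanged
bridge.solve(first.ticket_id)
again = bridge.handle(key, "Active")  # recurrence later: ticket is re-opened
print(again.ticket_id, again.status, again.notes)
```

    The bridge never reports anything back to LM, so the alert state in LM stays independent of the ticket status, which is the third point in the list.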

  • Quote

    I'm glad you asked! It would be set back to false when we mark it as resolved either in Connectwise Manage or in LM, a feature that doesn't currently exist. This way, the alerts generated by flapping would be ingested into the same ticket as updates, but wouldn't change the status of the ticket... There should be a one-to-one relationship between an alert raised by a datapoint and its ticket; flapping of a datapoint's status should update its ticket, not under any circumstances create a new ticket. A CWM ticket raised by LM should remain open until Solved in CWM.

    We have our setup working kinda like that. The same alert will update the same ticket and will only cause a new ticket once the original ticket has been closed. But we use ServiceNow, and all of that process is completed within the ticketing system itself (I'm not too involved with how it works), independent of LM or any other system. We do still try to limit flapping situations though.

    Quote

    Days/weeks/months/eons pass.

    Alert threshold for Datapoint A is crossed, alert raised, ticket 001 is reopened thereby retaining the earlier communication with the client, the internal notes, etc. Status changes from Solved to New, Ack, Re-opened - not terribly important.

    That is something that might differ between ticketing systems. For example, we never re-open an old ticket; the system actually prevents us from doing that once a ticket has been closed for a period of time. We can have tickets reference another, older ticket, but it's still a new ticket. I think it would throw off our reports and such to have old tickets come back that way.

    I think these are all valid options, but they're something I personally feel is better suited for the ticketing system (or middleware) to handle, since LM isn't all that flexible with integrations. I just let LM do the monitoring and use other systems to deal with ticket routing, notifications, SLAs and such. Then again, we're able to use a ($$$) ITIL/ticketing system :)

    So, I don't think I have any further suggestions for this case, unless LM implements some sort of Business Rules-like feature where you can attach conditional effects to integration messages. Does CWM have any workflow or ticket pre-processing options? If it does, this might be easy since you could tie the LMX###### directly to one ticket forever.