Forum Discussion

Kelemvor
Expert
2 months ago

PSA: Changes in alert severity don't work logically with ticket closing. Be careful...

This is a PSA for everyone that integrates LM with a ticketing system.

We had a server fill up its drive and crash last week because of how LM handles alerts that change severity level, so I wanted to summarize it in case anyone needs to make changes to their system.  We use Zendesk and have things set up per the instructions on the LM website: https://www.logicmonitor.com/support/alerts/integrations/create-update-close-tickets-zendesk-response-alerts

Active opens a ticket, Escalated updates the ticket, Cleared closes the ticket.  Pretty standard setup and 99.9% of the time, it works just fine.  However, here's the scenario where it didn't.

We had a server whose disk usage was going up and down, but more up than down.  That server hit the Warning level for Free Space and created a ticket for us.  It went up and down a bunch and eventually went over the Error threshold and created an Error ticket for us.  After that, it dipped back into the Warning threshold which issues a Clear to the Error ticket and closed it.

The Free Space then went back to Error.  LM used the Escalate option to Update the ticket.  However, because it had previously closed the ticket, it was now updating a closed ticket and no one ever saw it.  It continued to rise and eventually got to the Critical threshold which created us a new ticket.  That one dipped back into the Error range as well which Cleared the ticket.  It then went back into Critical which led to another instance of a closed ticket getting updates that no one ever saw.  Eventually the server filled up and crashed.

If LM had issued the Active action, instead of the Escalate action, each time it went from a lower severity to a higher one, this wouldn't have been an issue.  However, because LM considers the alert active until it completely clears, it decided to update the Error ticket even though it had already cleared it.
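For anyone who wants to reason through the sequence, here is a minimal Python sketch of the flow described above. It is a toy model, not LM's actual logic: it assumes one ticket per severity level, that the first rise into a level sends Active, that any later rise into the same level sends Escalated against the already-tracked ticket, and that a drop sends Cleared for every level above the new one. The names `simulate`, `Ticket` and `FLAPPING` are made up for illustration.

```python
from dataclasses import dataclass

LEVELS = ["ok", "warning", "error", "critical"]


@dataclass
class Ticket:
    severity: str
    status: str = "open"
    unseen_updates: int = 0  # updates that landed while the ticket was closed


def simulate(severities):
    """Replay a sequence of severity levels and return the tickets by level."""
    tickets = {}
    current = "ok"
    for new in severities:
        old_rank, new_rank = LEVELS.index(current), LEVELS.index(new)
        if new_rank > old_rank:
            ticket = tickets.get(new)
            if ticket is None:
                # Assumed "Active" action: first time at this level, open a ticket.
                tickets[new] = Ticket(new)
            elif ticket.status == "closed":
                # Assumed "Escalated" action: the update goes to a ticket
                # that was already closed, so nobody sees it.
                ticket.unseen_updates += 1
        elif new_rank < old_rank:
            # Assumed "Cleared" action: close tickets for every level above the new one.
            for level in LEVELS[new_rank + 1:]:
                if level in tickets:
                    tickets[level].status = "closed"
        current = new
    return tickets


# The flapping pattern from the post: up, down one level, and back up again.
FLAPPING = ["warning", "error", "warning", "error", "critical", "error", "critical"]

for t in simulate(FLAPPING).values():
    print(t.severity, t.status, t.unseen_updates)
# warning open 0
# error closed 1
# critical closed 1
```

Under those assumptions, the alert ends at Critical while both the Error and Critical tickets are closed, which matches what we saw.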

To work around this, we are updating our Escalated step to hard-code the Status:Open setting, so any time LM decides to update a ticket, it will force it to Open status.  I don't think we should have to do this because LM shouldn't be updating tickets it had previously closed, but it is what it is.
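As a rough illustration of the workaround, the Python sketch below builds the kind of JSON body the Escalated step could send. It assumes the standard Zendesk ticket-update shape (a top-level `ticket` object with `comment` and `status`); the helper name `escalated_payload` and the `##MESSAGE##` placeholder are assumptions for the example, so check them against your own integration. It also assumes the Cleared step leaves tickets in a state Zendesk still allows to be reopened.

```python
import json


def escalated_payload(comment_body: str, force_open: bool = True) -> str:
    """Build a ticket-update body; optionally force the ticket back to open."""
    ticket = {"comment": {"body": comment_body}}
    if force_open:
        # The workaround: every update explicitly sets the status to open,
        # so an update to a previously closed-out ticket becomes visible again.
        ticket["status"] = "open"
    return json.dumps({"ticket": ticket}, indent=2)


# "##MESSAGE##" stands in for whatever alert text your integration inserts.
print(escalated_payload("##MESSAGE##"))
```

Without `force_open`, the body only adds a comment and leaves the status wherever the Cleared step put it, which is the situation that bit us.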

So, if you have your ticket flow set up like we do, just be aware of this possibility.

  • I can't speak about the Zendesk integration specifically, but other ticketing systems would open a fresh ticket on an active alert and not attempt to update an existing ticket. If you are using Custom HTTP Delivery, is it set up to generate a new ticket each time?

    Imho, I'd suggest having LM generate new tickets rather than reopen old ones. Otherwise you will have LM reopening year-old tickets. Not related to this, but I also don't suggest auto-closing tickets. That can lead to not seeing issues that are intermittent or flapping. That is especially the case for alerts that auto-close themselves, like EventSources. Unless you normally review closed tickets, I guess.

  • Actually, re-reading your message: "After that, it dipped back into the Warning threshold which issues a Clear to the Error ticket and closed it". I'm pretty sure LM shouldn't be sending a Clear message when moving from Error to Warning. In fact, it shouldn't be sending anything. The LM alert list in the portal will look like it "cleared", but it shouldn't send a message. Are you sure the alert didn't go below Warning first?

    Also are you letting LM track which ticket to update via ##EXTERNALTICKETID##? You can't easily use LMD# or LMIDs to track tickets.

    • Kelemvor
      Expert

      All of this is an LM problem, not a Zendesk problem.  Yes, we track the ticket numbers exactly as specified in the documentation.  We use different alert rules for different severities because we notify different groups for a Critical vs. an Error.  E.g., Criticals go to a 24x7 support desk so they can notify the on-call person.  Errors go directly to the team who would work the ticket.

      When an alert goes from Warn to Error to Critical, LM tells ZD to open three different tickets.  This is fine and not a problem.

      When an alert goes from Critical to Error to Warn, LM would tell ZD to close the three tickets.  This is also fine and not a problem.

      The problem is only in the strange circumstance where an alert goes up and down and back up the severity chain without ever fully clearing.  That's where LM tells ZD to close the ticket, but then later tells it to update that ticket instead of creating a new one.

      This is a graph of the server where you can see it kept going over and under the various thresholds which caused our issues.

       

  • I've seen LM integrate with two different ticketing systems (ZD is not one of them), and they would not generate multiple tickets when just the level changes, up or down. An alert should generate one ticket and update it when the level increases or the alert clears. You shouldn't be getting multiple tickets for different levels. The same alert should use the same ticket until the ticket is closed. Your setup sounds like a ##EXTERNALTICKETID## tracking issue, or like it is using the wrong HTTP verbs. Are you using POST for Active and PUT for updates? Do you have "Include an ID provided in HTTP response" enabled and the ##EXTERNALTICKETID## in the URL for updates and closes (not Active)? The screenshot at https://www.logicmonitor.com/support/alerts/integrations/create-update-close-tickets-zendesk-response-alerts seems to show it all using the same ticket.

    But I don't know Zendesk, and I would suggest just confirming the setup is truly correct. I don't know if there are any others in the forums that also use ZD with LM. I believe LM Inc themselves use Zendesk, so it might be worth reviewing your setup with support.

    P.S. Ticket handling in LM IS a pain-in-the-butt since there isn't a single ID for one alert, and you need to have LM itself do the tracking and not your ticketing system. But it shouldn't be that bad.

    • Kelemvor
      Expert

      According to someone I chatted with a long time ago at LM, because we use different alert rules and chains for different severity levels, it's normal that it creates separate tickets for the different severity levels.  Whether I was given correct information or not, I don't know, but it has always done this with our setup and we've adjusted to it.

      We have everything set up exactly as LM has in their documentation and they've confirmed that it's "Working As Designed" even though the Design seems to be pretty bad sometimes.
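For anyone who wants to check their own integration against the questions raised earlier in the thread (POST for Active, PUT for updates and clears, ##EXTERNALTICKETID## only in the update and clear URLs), a small Python sketch of that checklist might look like the following. The `example.zendesk.com` subdomain is a placeholder, and the `STAGES` table and `check` helper are hypothetical; the paths follow Zendesk's public tickets API, but compare them with what your own integration actually sends.

```python
from typing import NamedTuple

BASE = "https://example.zendesk.com/api/v2"  # placeholder subdomain
TOKEN = "##EXTERNALTICKETID##"


class Request(NamedTuple):
    method: str
    url: str


# Assumed layout: one request definition per alert stage.
STAGES = {
    "Active": Request("POST", f"{BASE}/tickets.json"),
    "Escalated": Request("PUT", f"{BASE}/tickets/{TOKEN}.json"),
    "Cleared": Request("PUT", f"{BASE}/tickets/{TOKEN}.json"),
}


def check(stages):
    """Return a list of problems found in the per-stage request setup."""
    problems = []
    for name, req in stages.items():
        has_token = TOKEN in req.url
        if name == "Active":
            if req.method != "POST":
                problems.append(f"{name}: expected POST, got {req.method}")
            if has_token:
                problems.append(f"{name}: URL should not contain {TOKEN}")
        else:
            if req.method != "PUT":
                problems.append(f"{name}: expected PUT, got {req.method}")
            if not has_token:
                problems.append(f"{name}: URL is missing {TOKEN}")
    return problems


print(check(STAGES))  # an empty list means the layout matches the checklist
```

Note that a setup can pass this checklist and still hit the flapping problem described in the original post, since that comes from which action LM sends, not from how the requests are defined.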