We saw the same thing recently. It happened during a period when lots of alerts were being generated, because an issue on a storage platform affected a lot of different things.
We were getting:
HTTP 500 - Delivery failed due to large volumes of failures to this URL.
HTTP 408 - Delivery failed due to timeout.
Of the ones that failed, I saw that “Delivery Retries” was either “None” or, at most, 1.
All of them had “Failed to Parse” as the External Ticket ID. The Incident was actually raised in Service Now, but LM did not know what the ticket ID was. So when the alert cleared, LM tried to update the ticket using “number”: “-2”, and I guess the integration can’t match that to the Incident.
I assume this is really a Service Now performance issue rather than a Logic Monitor issue. I’m not sure there’s much we can do to improve performance on the Service Now side as it’s a SaaS application (apologies, I’m not a Service Now person!)
Is there anything that Logic Monitor could do to improve the situation? Perhaps slow down the rate it makes calls to the Service Now API if it starts to get HTTP 500 and 408 responses? Perhaps also increase the number of retries?
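To make the suggestion concrete, here is a rough Python sketch of the kind of behaviour I mean: retry on HTTP 500 and 408, waiting twice as long before each new attempt. This is not how LM works internally and it is not an LM or Service Now API. The names `deliver_with_backoff`, `send`, `max_retries` and `base_delay` are all made up for illustration, and `send` stands in for whatever actually makes the HTTP call and returns the status code.

```python
import time

# Status codes treated as "try again later" (the two we saw in our failures).
RETRYABLE = {408, 500}

def deliver_with_backoff(send, payload, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send(payload) until it returns a non-retryable status.

    Hypothetical helper: `send` is any callable returning an HTTP status code.
    Waits base_delay, then 2x, 4x, ... between attempts.
    Returns (last_status, number_of_attempts).
    """
    attempts = 0
    while True:
        status = send(payload)
        attempts += 1
        # Stop on success/permanent failure, or once the retry budget is spent.
        if status not in RETRYABLE or attempts > max_retries:
            return status, attempts
        # Exponential backoff: 1s, 2s, 4s, ... with the default base_delay.
        sleep(base_delay * 2 ** (attempts - 1))
```

With the defaults this makes at most six attempts in total (one initial call plus five retries), which is a lot more forgiving than the zero or one retries we saw.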
I’ve had a look at the Rate Limit functionality in LM. I was hoping this would allow LM to queue up alerts and deliver them at a slower rate if an integration can’t handle the flow. As far as I can tell, though, it actually throws away alerts when the rate is too high.
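What I was hoping for looks more like the Python sketch below: alerts are held in a queue and sent one at a time with a pause between them, and if the target returns 500 or 408 the alert stays in the queue rather than being dropped. Again, this is only an illustration of the idea, not LM's Rate Limit feature. `PacedQueue`, `send` and `min_interval` are hypothetical names, and `send` is assumed to return the HTTP status code.

```python
import time
from collections import deque

# Status codes that mean "the target is struggling, keep the alert for later".
RETRYABLE = {408, 500}

class PacedQueue:
    """Hypothetical queue that paces deliveries instead of discarding alerts."""

    def __init__(self, send, min_interval=1.0, sleep=time.sleep):
        self.send = send                  # callable returning an HTTP status code
        self.min_interval = min_interval  # seconds to wait between deliveries
        self.sleep = sleep
        self.pending = deque()

    def put(self, alert):
        # Nothing is thrown away; alerts wait here in arrival order.
        self.pending.append(alert)

    def drain(self):
        """Deliver queued alerts in order; stop early if the target pushes back.

        Returns the number of alerts delivered in this pass.
        """
        delivered = 0
        while self.pending:
            status = self.send(self.pending[0])
            if status in RETRYABLE:
                break  # leave this alert (and the rest) queued for the next pass
            self.pending.popleft()
            delivered += 1
            if self.pending:
                self.sleep(self.min_interval)
        return delivered
```

Calling `drain()` again later picks up where the previous pass stopped, so a burst of alerts is delivered late rather than lost.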
Dave