Alert Tsunami: Why the Huge Delay and Flood of Post-Resolution Power Alerts?

Question

Subject: Alert Tsunami: Why the Huge Delay and Flood of Post-Resolution Power Alerts?

Hello LM Exchange community and LogicMonitor team,

We recently experienced an issue that's causing significant frustration and making our alerting system less reliable. We had a couple of anticipated power cable pull-outs (testing/maintenance), which were quickly resolved. However, we then received a massive backlog of LogicMonitor alerts for this event hours after the issue was fixed and the system logs were clear.

The Problem

Massive Alert Delay: The initial power loss events occurred and were resolved around 7:00 PM and 8:00 PM (based on the Lifecycle Log). However, we started getting a huge flood of critical alerts via email at 9:13 PM, 9:43 PM, 10:13 PM, and 10:43 PM, hours after the issue had been mitigated and redundancy was restored.

Excessive Alert Volume: We received dozens of separate critical alerts (e.g., LME205086576, LME205086578, etc.) for a single, contained event, all arriving en masse hours later.

Past "Fix" is a Concern: The last time this occurred, the only way I could stop the flood of delayed emails was to turn off alerting for the device and then turn it back on. This is not a scalable or sustainable solution for a reliable monitoring platform.

Key Questions for the LogicMonitor Team

What is causing this significant delay in alert processing and delivery? It appears the system is holding a large backlog of alerts and then releasing them all at once hours later.

What is the recommended, official way to clear an alert backlog without having to resort to manually disabling and re-enabling alerting?

Is there a known configuration or polling issue that would cause a single event (like a brief power loss) to generate dozens of unique critical alerts over a short period, and how can we consolidate these into a single, actionable notification?

Data for Review

LogicMonitor Email Log (Image 1): Shows critical alerts arriving long after the issue was resolved (9:13 PM to 10:43 PM).

Device Lifecycle Log (Image 2): Shows the power events (PSU0003, RDU0012) occurring and being resolved between 8:01 PM and 9:22 PM.

Any insight or official guidance on how to prevent this "alert tsunami" would be greatly appreciated. We rely on timely and accurate alerting, and this behavior significantly undermines that trust.

eortiz · Answer

Could you share what type of alert it is, Ping, SNMP or WMI, along with an example alert and the thresholds set at the folder, device, or module level?

b1llw · Answer

I guess it's the SNMP iDRAC log. It does not seem to happen all the time.

orchardl · Answer

Have you verified your org's email filter isn't the culprit? If there is a tsunami of alerts coming in, an email filter might start holding all emails from a sender for additional sandboxing. You can verify the true sent time in the email headers. Whitelisting LM emails might fix this. If you're not allowed to whitelist LogicMonitor in your email filter, you could also look into cluster alerting and disabling individual alert notifications.
https://www.logicmonitor.com/support/cluster-alerts#h-managing-cluster-alerts
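
For checking the true sent time, here is a rough sketch of how the header comparison could be scripted, assuming you can save the raw source of one of the delayed alert emails. It reads the standard `Date` and `Received` headers with Python's built-in `email` module. The function names `hop_times` and `report_delays` are made up for illustration and are not part of any LogicMonitor tooling.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def hop_times(raw_message):
    """Return (Date header time, list of Received timestamps, oldest first)."""
    msg = message_from_string(raw_message)
    sent = parsedate_to_datetime(msg["Date"])
    hops = []
    for received in msg.get_all("Received", []):
        # In a Received header the timestamp follows the last ';'.
        _, _, stamp = received.rpartition(";")
        try:
            hops.append(parsedate_to_datetime(stamp.strip()))
        except (TypeError, ValueError):
            continue  # skip hops whose timestamp cannot be parsed
    hops.reverse()  # each server prepends its header, so the oldest hop is last
    return sent, hops


def report_delays(raw_message):
    """Seconds elapsed between the Date header and each hop, oldest first."""
    sent, hops = hop_times(raw_message)
    return [(hop - sent).total_seconds() for hop in hops]
```

If the first hop is only seconds after the `Date` value but a later hop is an hour or more behind, the message was most likely held somewhere in the mail path (a filter or relay) rather than sent late. If the `Date` value itself is hours after the event, the delay happened before the email was generated.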

admine · Answer

If you're using LM's default SMTP, test with your own mail relay to avoid queue delays. Adding throttling can help avoid queue delays as well.
