Forum Discussion

Eric_Johnson
7 years ago

SNMP Trap Event Consolidation

It would be useful if SNMP traps that trigger within a specific timeframe were treated as the same alert.  We have a few cases where devices start throwing traps every minute, and by the time we react and fix the problem we already have dozens of alerts.  Treating the same trap within a timeframe as a single alert would avoid this alert flood.

9 Replies

  • I completely agree with this.  Different vendors use traps differently and, in my experience, may send the same trap multiple times, sometimes several times a minute or more.  The trap functionality in LogicMonitor is not usable in these cases because the noise it creates is a huge distraction for any NOC.  Multiple duplicate alerts for the same issue take their eyes off other, possibly critical, events.

    Let's take Barracuda for instance.  Their NextGen Firewalls have a TRAP for HA Partner Unreachable.  We received a trap every 5 minutes for about 2 hours while this situation was occurring.  From Barracuda's standpoint, this was a single event with notifications that go out every 5 minutes until the error goes away.  They don't have a pollable "GET" MIB to track this scenario either.

    I would propose the logic this way:  LM receives a trap that matches an EventSource's criteria and triggers the configured alert.  That EventSource is configured with a timeout value (let's say 60 minutes).  If another trap from the same device with the same content comes in before the timeout expires, don't create a new alert, but rather increment a "count" counter on that alert AND RESET THE TIMER.  As long as no new traps come in within the configured timeout (60 minutes in this example), the alert will clear like normal.  If a new trap comes in after the timer expires, a new alert is generated.  You may need to provide an interface to view all the trap data associated with that one event alert, since there will now be multiple traps behind it.

    This issue is plaguing our company right now, and we are a very large MSP.  We are at the mercy of vendors who don't provide pollable MIBs for some critical conditions like this, which is why SNMP traps become more of a necessity.
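
The timeout logic proposed in the reply above can be sketched in a few lines of Python. This is only an illustration of the idea, not LogicMonitor functionality: `TrapConsolidator`, `Alert`, and `receive` are hypothetical names, "same content" is assumed to mean an identical (device, content) pair, and a trap arriving exactly at the timeout is assumed to start a new alert.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """One consolidated alert (hypothetical structure, not an LM object)."""
    device: str
    content: str
    first_seen: float
    last_seen: float
    count: int = 1
    traps: list = field(default_factory=list)  # every trap behind this alert


class TrapConsolidator:
    """Folds repeated traps into one alert until a quiet period elapses."""

    def __init__(self, timeout_seconds: float = 3600.0):
        self.timeout = timeout_seconds
        self.open_alerts = {}  # (device, content) -> Alert
        self.cleared = []      # alerts whose quiet period has elapsed

    def receive(self, device: str, content: str, now: float):
        """Process one trap; return (alert, is_new_alert)."""
        key = (device, content)
        alert = self.open_alerts.get(key)

        # No duplicate arrived within the timeout: the old alert clears.
        if alert is not None and now - alert.last_seen >= self.timeout:
            self.cleared.append(self.open_alerts.pop(key))
            alert = None

        if alert is None:
            alert = Alert(device, content, now, now, traps=[(now, content)])
            self.open_alerts[key] = alert
            return alert, True

        # Duplicate inside the timeout: bump the count and reset the timer.
        alert.count += 1
        alert.last_seen = now
        alert.traps.append((now, content))
        return alert, False
```

Fed the Barracuda pattern described above (one trap every 5 minutes for about 2 hours) with a 60-minute timeout, this would keep a single open alert whose count rises with each trap, and a trap arriving more than 60 minutes after the last one would open a new alert. The `traps` list is one possible way to keep the per-trap data viewable from the single alert.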

  • Bumping.  We are learning of this now as we start to implement LM, and this is horrible.
    There has to be a solution for this?  Why does LM not correlate the same event and just increase the count of that same alert/trap on the Alert Console?  We don't need 1000 different alerts that all pertain to the same event/trap.

  • Late to the party (sorry, new to LM itself) but just wondering if any further consideration/progress was made on this feature request?

    I recently added a Cisco FMC trap in LM, and what would have been a single alarm in our old NMS has generated over 1100 alerts (so far).

  • I like the time out idea and would like to have that applied not just to SNMP trap event sources but to all event sources.  A similar thing can happen with Windows event logs where an event log repeats but is actually the same incident.

  • Syslog is another area where we have this issue.  We have received thousands of duplicate alerts in the span of minutes.  We had to create special escalation chains just to throttle the alerts properly when this happens.

  • Right on, Jeff and Eric.  We like Barracuda and need a solution where LM takes 1 or 100 traps matching the same EventSource criteria and understands that they are one problem, not 100 separate problems.  Has anyone out there been able to solve this another way?  If not, can we get what Jeff suggests?

  • All, I've logged this product improvement/feature request and referenced this forum post for additional information.

  • 40 minutes ago, Mosh said:

    I like the time out idea and would like to have that applied not just to SNMP trap event sources but to all event sources.  A similar thing can happen with Windows event logs where an event log repeats but is actually the same incident.

    That's a very good point.  We do have the same problems with Windows events and likely with any other supported event type.  It comes down to handling asynchronous event technologies in a way that accommodates the nature of those event types, so that a NOC or IT team can deal with them well.

  • Reviving an old thread, but we're currently reevaluating EventSource suppression logic.

    Some of the other EventSource types already use a timeout-like mechanism to avoid duplicates, but we don't do anything like that for SNMP traps.

    The general idea right now is to let the user decide which fields indicate a duplicate event, and to suppress anything within the "effective interval" of the original alert. I think it makes sense for the timer-reset logic to be optional. I also like the idea of providing more visibility into how many events were suppressed.

    We've also had a fair number of requests for a mechanism like the DataSource "trigger interval", where we only trigger an alert if we see the same event N times in the interval.

    Anyways, any additional feedback is appreciated.
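
For anyone weighing in, the two ideas in the reply above (user-chosen duplicate fields, and alerting only after N occurrences within an interval) might look roughly like the Python sketch below. Everything here is an assumption made for illustration: `dedup_key`, `TriggerInterval`, and the event field names are hypothetical and do not describe how LogicMonitor implements or plans to implement this.

```python
from collections import defaultdict, deque


def dedup_key(event: dict, fields: tuple) -> tuple:
    """Build a duplicate-detection key from the fields the user selected."""
    return tuple(event.get(name) for name in fields)


class TriggerInterval:
    """Signals an alert only once `threshold` matching events fall in the window."""

    def __init__(self, threshold: int, interval_seconds: float):
        self.threshold = threshold
        self.interval = interval_seconds
        self.seen = defaultdict(deque)  # key -> timestamps inside the window

    def should_alert(self, key: tuple, now: float) -> bool:
        times = self.seen[key]
        times.append(now)
        # Drop timestamps that have aged out of the sliding window.
        while now - times[0] >= self.interval:
            times.popleft()
        return len(times) >= self.threshold


# Hypothetical usage: two events that differ only in a sequence number
# map to the same key when the user selects only host and OID.
trigger = TriggerInterval(threshold=3, interval_seconds=60)
key = dedup_key({"host": "fw1", "oid": "1.2.3", "seq": 7}, ("host", "oid"))
first = trigger.should_alert(key, now=0)    # below threshold
second = trigger.should_alert(key, now=10)  # below threshold
third = trigger.should_alert(key, now=20)   # third event inside 60 seconds
```

In this sketch `should_alert` keeps returning true for every further event while the threshold is met, so it would still need to be paired with some form of suppression to avoid the flood discussed in this thread.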