Forum Discussion

mnagel
7 years ago

event alerts based on time period

We have been asked to alert if a particular Windows event occurs N times in an H-hour period. This is a perfectly reasonable request, but we cannot do it because LM only reports immediately when an event triggers an alert. We can fake it somewhat by tying alerts to an escalation chain with one or more initial null stages, but this still fails to deliver the requested behavior. What is needed is a basic event correlation method that can both roll up repeated events (see for example /topic/1590-snmp-trap-event-consolidation/) and suppress alerts unless basic conditions are met, such as a minimum count over a time period. Ideally, this would work something like SEC (https://simple-evcorr.github.io/), which we have used very successfully elsewhere; that specific tool is not required, just at least some of its features. Others have suggested linking into a more sophisticated log handler, like ELK or Graylog, and I also think that would be awesome, but having to provide an ELK infrastructure instead of just collector instances would be a non-starter in many environments. This functionality belongs within LM.
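To make the request concrete, here is a minimal Python sketch of the "N events within H hours" rule. It is not LM functionality and not how SEC is implemented; the class name `WindowThreshold` and its parameters are hypothetical, and it assumes events arrive in time order.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowThreshold:
    """Fire once when at least `min_count` events fall inside a sliding window.

    Hypothetical helper for illustration only; assumes timestamps are
    passed to observe() in non-decreasing order.
    """

    def __init__(self, min_count: int, window: timedelta):
        self.min_count = min_count
        self.window = window
        self._times = deque()  # timestamps of events still inside the window
        self._armed = True     # False while the threshold is already exceeded

    def observe(self, ts: datetime) -> bool:
        """Record one event; return True only when an alert should be raised."""
        self._times.append(ts)
        # Drop events that are a full window (or more) older than the newest one.
        while self._times and ts - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) < self.min_count:
            self._armed = True   # back under the threshold, so re-arm
            return False
        if self._armed:
            self._armed = False  # suppress repeats until the count drops again
            return True
        return False


# Usage: alert when 3 events are seen within 1 hour.
rule = WindowThreshold(min_count=3, window=timedelta(hours=1))
start = datetime(2024, 1, 1, 12, 0)
fired = [rule.observe(start + timedelta(minutes=m)) for m in (0, 10, 20, 25, 120)]
print(fired)  # [False, False, True, False, False]
```

The third event crosses the threshold and raises a single alert, the fourth is rolled up into it, and the fifth arrives after the earlier events have aged out, so nothing fires.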

Thanks,
Mark

5 Replies

  • 6 minutes ago, Cole McDonald said:

    I've implemented this in the past as a dataSource that tracks the number of failed connection attempts against a server over a 5-minute period. It is a PowerShell script that grabs the last 5 minutes of event ID 4625 entries from the Windows Security log where the message contains the status for a bad username or bad password. It just returns a count rather than individual events. This let me drive a NOC widget of devices to show brute-force intrusion attempts.

    This could potentially be added like a cluster alert using the existing eventSource though and help to combine individual events into a single actionable alert to reduce noise.

    This is a super old thread, but I just came across it.... so I'll add my $.02

    Yes, for Windows events you can do this -- we do as well.  You lose the event detail, but it can alert only if N events in a window are seen (something customers ask for often).  Even then, since the "collect every" value is not visible to the script, you have to take special care to ensure your event scan window and the collect every value are in sync.  And this does nothing for any other type of event -- we have to use Sumo Logic (or other similar tools, like Graylog, etc.) to solve this problem in general.

  • Just added the same to Eric's post before seeing yours.

    Also agree that in my environment I would not want to deploy any further infrastructure; it should be a feature of LM.

  • I've implemented this in the past as a dataSource that tracks the number of failed connection attempts against a server over a 5-minute period. It is a PowerShell script that grabs the last 5 minutes of event ID 4625 entries from the Windows Security log where the message contains the status for a bad username or bad password. It just returns a count rather than individual events. This let me drive a NOC widget of devices to show brute-force intrusion attempts.

    This could potentially be added like a cluster alert using the existing eventSource though and help to combine individual events into a single actionable alert to reduce noise.

    This is a super old thread, but I just came across it.... so I'll add my $.02

  • Anonymous

    This feature request appears to be gaining traction. Keep piling on. The more customers request it the easier it is to get it in sooner.

  • A vote from me. Time-based correlation is needed where a single issue is not a major cause for concern but multiple occurrences in a defined period are. This recently led to an escalation from a customer.
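The dataSource workaround described in the replies (return a count of matching events per collection cycle instead of alerting per event) depends on the scan window matching the "collect every" interval. The sketch below illustrates why, in Python rather than the PowerShell the replies mention. The names `COLLECT_EVERY` and `count_in_window` are hypothetical, and the timestamps are made-up sample data, not real log output.

```python
from datetime import datetime, timedelta

# Assumption: this constant has to be kept equal to the dataSource's
# "collect every" setting by hand, since the script cannot read that value.
COLLECT_EVERY = timedelta(minutes=5)


def count_in_window(event_times, now, window=COLLECT_EVERY):
    """Count events whose timestamp satisfies now - window <= ts < now."""
    start = now - window
    return sum(1 for ts in event_times if start <= ts < now)


# Made-up sample: three matching events, polled twice, 5 minutes apart.
base = datetime(2024, 1, 1, 12, 0)
events = [base + timedelta(minutes=m) for m in (1, 2, 7)]
polls = [base + timedelta(minutes=m) for m in (5, 10)]

in_sync = [count_in_window(events, p) for p in polls]
too_wide = [count_in_window(events, p, timedelta(minutes=10)) for p in polls]
print(in_sync)   # [2, 1] -> every event counted exactly once
print(too_wide)  # [2, 3] -> the first two events are counted twice
```

When the window equals the polling interval, the per-poll counts add up to the true total of 3. A window wider than the interval double-counts events, and a narrower one would miss events between polls, so a threshold set on the returned count only means what it says when the two values are in sync.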