Cisco EIGRP Peer alarm(s) not being suppressed?

Question

Hello,

We've noticed that the Cisco EIGRP PeerDown alarm(s) aren't being suppressed when the device itself goes down in LM.
	When we lost SNMP connectivity to one of our routers, it started raising PeerDown alarms (SNMP wasn't responding, which caused the 'NoData' condition on the 'upTime' datapoint).
	This becomes an issue because the datapoint that checks the peer status is based on the data retrieved by the 'upTime' datapoint (which, at this point, is 'NoData').

Basically, if 'upTime' doesn't return data (which happens when the device itself goes down), it triggers an alarm for the PeerDown instances (since it'll always return False).
	LogicMonitor only sees the device as 'down' after 5 minutes without data. This DS will alarm first, since PeerDown raises an alarm after 2 consecutive polls, which means 3 minutes.

As per the documentation, all the alarm(s) emanating from the host will be suppressed. My question here (just to make sure): this only applies to alarms that trigger AFTER the host-down condition, correct?
	If that's true, how can we work around this without having to increase the time it takes for 'PeerDown' alarms to appear in the console?

Is there any type of expression that we can use in that ComplexDatapoint (instead of the current one)?
	Currently, this one device being down caused 100 alarm(s) on the console (since it's a central point for our EIGRP routing).

Thank you!

Regards,

mike_moniz · Answer

That is my understanding too. LM has server-side logic that declares a device dead after 6 minutes (although Host Status will alert after 5 minutes), so any alerts that trigger before those 6 minutes will cause notifications.
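To make the timing gap concrete, here is a minimal sketch in Python. The numbers are taken from this thread (PeerDown alerting at about 3 minutes, the device declared dead at 6 minutes) and reflect the posters' understanding rather than documented constants. The function names are hypothetical and only model the reasoning; they are not part of any LogicMonitor API.

```python
# Timings as described in this thread (assumptions, not documented constants).
PEER_DOWN_ALERT_MIN = 3  # PeerDown alerts after 2 consecutive polls (~3 minutes)
HOST_DEAD_MIN = 6        # server-side logic declares the device dead

def is_suppressed(alert_minute, host_dead_minute=HOST_DEAD_MIN):
    """An alert is suppressed only if it triggers once the host is already dead."""
    return alert_minute >= host_dead_minute

def unsuppressed_window(alert_minute, host_dead_minute=HOST_DEAD_MIN):
    """Minutes during which the alert is active before suppression can apply."""
    return max(0, host_dead_minute - alert_minute)

if __name__ == "__main__":
    print(is_suppressed(PEER_DOWN_ALERT_MIN))
    print(unsuppressed_window(PEER_DOWN_ALERT_MIN))
```

Under these assumptions, a PeerDown alert at minute 3 precedes the dead-host declaration by about 3 minutes, which is why the notifications go out.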

PeerDown uses the un() function, so it's specifically checking whether the value is NaN. I don't know how this particular DataSource or Cisco EIGRP works, so I'm not sure whether upTime can tell the difference between a peer being down and the switch being down; there might be a trick to do so. As a more generic solution, and since this is a script-based DataSource, I would likely add a new datapoint, something like snmpDown, with code that reports 1 if SNMP isn't working (i.e., the device will be declared dead soon), and then modify PeerDown to also check whether SNMP is working before alerting.

vitor_santos · Answer

After checking the OIDs, I don't believe upTime can tell that difference.
	I'll try to leverage that 'general' change & see if it works for us. That's a great idea!

Basically, we could just add a new complex datapoint (via Groovy) & try to poll a basic OID. If it doesn't return data, then assume SNMP isn't replying (snmpDown == 1).
	From there, just tweak the actual PeerDown to take that value into account before returning 0.

Am I on the right path? Or did you have something simpler in mind?
	Thank you anyway for the input on this!

mike_moniz · Answer

That's the basic idea. You can't create a complex datapoint via Groovy, so snmpDown would be a normal datapoint, which you can then refer to in PeerDown. Also, I think you can just wrap the snmp.get/walk line or section in a try/catch, and that will tell you whether the SNMP request failed.
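The try/catch idea can be sketched roughly as follows. This is Python rather than the DataSource's actual Groovy script, and `snmp_walk` is a hypothetical callable standing in for whatever SNMP helper the script already uses; it is assumed to raise an exception when the request fails or times out. The OID shown is the standard sysUpTime.0, used only as an example of a "basic" OID.

```python
def collect(snmp_walk, host, oid):
    """Run one poll and report whether SNMP answered.

    snmp_walk is a hypothetical stand-in for the script's real SNMP helper
    and is assumed to raise an exception when the request fails.
    """
    try:
        rows = snmp_walk(host, oid)
        # SNMP answered: snmpDown = 0 and the walked data is passed through.
        return {"snmpDown": 0, "rows": rows}
    except Exception:
        # Request failed or timed out: flag SNMP as down, emit no peer data.
        return {"snmpDown": 1, "rows": {}}

if __name__ == "__main__":
    def fake_ok(host, oid):
        return {oid: "12345"}

    def fake_timeout(host, oid):
        raise TimeoutError("no SNMP response")

    print(collect(fake_ok, "router1", "1.3.6.1.2.1.1.3.0"))
    print(collect(fake_timeout, "router1", "1.3.6.1.2.1.1.3.0"))
```

Catching every exception treats any SNMP failure as "down"; narrowing it to the timeout exception alone is a reasonable alternative if other errors should still surface.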

vitor_santos · Answer

OK, so I've added that try/catch to the actual script.

So it pretty much returns 0 if the SNMP portion goes well & returns 1 if it catches the timeout exception.
	I just moved the actual SNMP walk code into the try{} & added the one below as catch()

So now we're able to tell when SNMP isn't working. I'm kind of lost on what to do with the 'PeerDown' datapoint (in terms of expressions). Can you help?
	I've never used the complex datapoint features before.

vitor_santos · Answer

Basically, I want to do what the PeerDown expression currently does, but only if snmpDown == 0; otherwise, return 2 (or something other than 0).
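The intended logic can be modelled as below. This is a Python sketch of the decision only, not LogicMonitor expression syntax. `original_peer_down` is a hypothetical parameter standing for whatever the existing PeerDown expression evaluates to, which is not reproduced here.

```python
def peer_down_guarded(snmp_down, original_peer_down):
    """Pass through the existing PeerDown value only when SNMP is healthy.

    original_peer_down is a placeholder for the result of the current
    PeerDown expression; 2 is the "SNMP unreachable" value proposed above.
    """
    if snmp_down == 0:
        return original_peer_down
    # SNMP is down (or snmpDown itself has no data): return a distinct value
    # so the condition is not mistaken for a real peer-down result.
    return 2

if __name__ == "__main__":
    print(peer_down_guarded(0, 0))
    print(peer_down_guarded(0, 1))
    print(peer_down_guarded(1, 0))
```

In a complex datapoint this would presumably take a form like if(eq(snmpDown, 0), <existing expression>, 2). That form is an assumption to verify against the complex datapoint documentation, and the alert threshold on PeerDown would need to ignore the value 2.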
