Excellent on the token thing -- looking forward to that!
What I found when I removed the operStatus AD filter was that a bunch more interfaces went into alarm almost immediately. I think my script to deactivate alerts would have eventually caught up, but it was super noisy, so I quickly reverted. I need to look at the new approach, since my method was a necessary evil given the tools available at the time. I did notice later, though, that the failure to update the description for down interfaces made my script less useful than intended.
Nonunicast is a funny thing. Acceptable levels tend to vary across environments (I have seen Nexus 7K cores handle 50000pps without breaking a sweat -- not good, but not deadly to the switch), but there are levels that are absolutely bad in typical environments. I normally do not set thresholds on percentage, as that could trigger for ports on otherwise inactive hosts that see little traffic other than nonunicast. A rule of thumb is that for access ports, under 200pps can be safely ignored (though it is still high). Trunk ports will tend to be higher, as you will see the combined levels for all VLANs on the trunk. When we see "freak out" levels, they are in the 1000pps-or-higher range. Translating to LM-speak, I would start with "> 200 1000 2000" (but again, it is hard to set just one good threshold).
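As a minimal sketch of what that "> 200 1000 2000" expression means in practice (the function name and severity labels are my own, not LogicMonitor's), the pps-to-severity mapping for an access port would look like:

```python
# Hypothetical sketch: classify a nonunicast packets-per-second rate against
# the suggested "> 200 1000 2000" warn/error/critical thresholds above.
def nonunicast_severity(pps: float) -> str:
    """Return an alert severity for a nonunicast pps rate on an access port."""
    if pps > 2000:
        return "critical"
    if pps > 1000:
        return "error"
    if pps > 200:
        return "warn"
    return "ok"       # e.g. under 200pps: high, but safely ignored

print(nonunicast_severity(150))   # quiet access port -> "ok"
print(nonunicast_severity(1500))  # "freak out" territory -> "error"
```

Trunk ports would want higher numbers, since they carry the combined nonunicast load of every VLAN on the trunk.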
On 2/3/2018 at 6:19 PM, Steve Francis said:
Note: as of v100, Instance level properties now work as tokens in alert messages. Development tells me they did prior to v.100 - which I thought I tested, and found they didn't - but in any case they definitely work in v.100.
Thanks! Is the format documented, or is it literally the name within the instance and it is just in scope for the datapoint instance at alert time? How are clashes with device property names avoided, I guess is my real question...
@Steve Francis I just noticed something missing in the new datasource that would be useful -- the interface speed is not available as a datapoint (not the actualspeed or actualspeedupstream, but the interface-reported speed value). This is very important to have in some cases where no other method of problem detection exists. The most common case I have run into is MLPPP or MLFR, where no MIB object exists to tell you that even though all member ports are up, the bundle is not complete. I often do the same for ethernet aggregates for the same reason. The test is simple -- verify the bundle speed is the expected speed. So, I suppose you could add support for an ExpectedSpeed ILP and an automatic alert, too :). Graphing the speed over time would also show when upgrades happened (as is the case for disk charts). The question is, should that value be reported as-is or clamped to the ActualSpeed* values? Or perhaps both RawSpeed and Speed/SpeedUpstream?
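The bundle test described above could be sketched roughly as follows -- a comparison of the interface's reported speed (e.g. SNMP ifHighSpeed, in Mbps) against the proposed ExpectedSpeed ILP. The function name, the Mbps unit choice, and the example values are illustrative assumptions, not an existing LM datapoint:

```python
# Hypothetical sketch of the bundle-speed check: a multilink bundle whose
# reported speed is below the expected aggregate speed is missing members,
# even though every member port shows operationally up.
def bundle_incomplete(reported_mbps: int, expected_mbps: int) -> bool:
    """True when the bundle reports less capacity than its configured members imply."""
    return reported_mbps < expected_mbps

# A 4-member bundle expected at 6 Mbps, currently reporting only 4 Mbps:
print(bundle_incomplete(4, 6))   # member missing -> should alert
print(bundle_incomplete(6, 6))   # bundle complete -> no alert
```

The same comparison works for ethernet aggregates, where a LAG with a failed member still shows all the surviving links up.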