Forum Discussion

mnagel
Professor
4 years ago

alert rule recommendations within modules

One thing I have noticed over time is how often a datapoint somewhere really deserves to be covered by an alert rule, but you don't find this out until after you get bitten. This issue is orthogonal to threshold severity, at least for some of the modules I have seen. An example fresh from today: an environment lost power with no indication that it had happened. After some checking, I found that the Cisco FRU Power DS had an update, and after the update it showed when power loss (and other related issues) happened. Whoever wrote this one decided each class of issues would alert at warning level only, even though the DP classes themselves are warning, error and critical, each grouping different conditions.

What I came away with is that LM itself should have a diagnostic capability to (among other things) recommend which datasources represent important things that ought to have alert rules but do not (or whose alerts route to NoEscalation). I am not sure yet how this ought to be represented in the system, but some indication of "this one is important and should route to an alert!" on each datapoint would be a good start. There may be more metadata that deserves to be included, but nothing else pops into my head right now.
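To make the idea concrete, here is a rough sketch of what such a diagnostic could check. It is not LM's API or data model: the `Datapoint` and `AlertRule` classes, the `important` flag, the glob matching, the "first matching rule wins" ordering, and the datasource and datapoint names in the example are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical severity ranking, lowest to highest.
SEVERITY_ORDER = {"warning": 1, "error": 2, "critical": 3}


@dataclass(frozen=True)
class Datapoint:
    datasource: str
    name: str
    severity: str    # severity the module actually alerts at
    important: bool  # hypothetical "this one should route to an alert" flag


@dataclass(frozen=True)
class AlertRule:
    datasource_glob: str
    datapoint_glob: str
    min_severity: str      # rule applies at this severity or higher
    escalation_chain: str


def first_matching_rule(dp, rules):
    """Return the first rule (in list order) that matches dp, or None."""
    for rule in rules:
        if (fnmatchcase(dp.datasource, rule.datasource_glob)
                and fnmatchcase(dp.name, rule.datapoint_glob)
                and SEVERITY_ORDER[dp.severity] >= SEVERITY_ORDER[rule.min_severity]):
            return rule
    return None


def unrouted_important(datapoints, rules, dead_end="NoEscalation"):
    """List important datapoints that match no rule or land on the dead-end chain."""
    findings = []
    for dp in datapoints:
        if not dp.important:
            continue
        rule = first_matching_rule(dp, rules)
        if rule is None:
            findings.append((dp, "no matching alert rule"))
        elif rule.escalation_chain == dead_end:
            findings.append((dp, f"routes to {dead_end}"))
    return findings


# Made-up example: a power datapoint that only ever alerts at warning level.
example_datapoints = [
    Datapoint("Cisco_FRU_Power", "CriticalClass", "warning", important=True),
    Datapoint("Ping", "PingLossPercent", "critical", important=True),
    Datapoint("Ping", "SentPackets", "warning", important=False),
]
example_rules = [
    AlertRule("*", "*", "error", "OnCall"),
    AlertRule("*", "*", "warning", "NoEscalation"),
]

for dp, reason in unrouted_important(example_datapoints, example_rules):
    print(f"{dp.datasource}/{dp.name}: {reason}")
```

The point of the sketch is that the "important" flag would live with the datapoint, independent of its threshold severity, so a report like this could catch a warning-only datapoint that quietly falls through to NoEscalation.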