Forum Discussion

DanB
3 years ago

Request: When a high CPU alert is triggered, include the top 5 CPU-consuming processes on the box

In our old tool (UIM/Nimsoft), when the cdm probe detected very high overall CPU usage on a box, it would trigger an alert, and the alert message details would include the top 5 processes contributing to the high CPU usage. Is it possible to do this with the tool today? We are using the "NetSNMPCPUwithCores" datasource to alert when CPU usage is high on Windows-based boxes.
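To make the request concrete, here is a minimal Python sketch of the kind of detail block being asked for. It assumes per-process CPU samples have already been collected somehow and arrive as (name, CPU percent) pairs. That shape and the function names `top_cpu_processes` and `format_alert_details` are hypothetical, not part of UIM/Nimsoft or any datasource mentioned here.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample shape: (process name, CPU percent).
ProcessSample = Tuple[str, float]


def top_cpu_processes(samples: Iterable[ProcessSample], n: int = 5) -> List[ProcessSample]:
    """Return the n samples with the highest CPU percent, highest first."""
    return sorted(samples, key=lambda s: s[1], reverse=True)[:n]


def format_alert_details(samples: Iterable[ProcessSample], n: int = 5) -> str:
    """Build a plain-text block suitable for appending to an alert message."""
    top = top_cpu_processes(samples, n)
    lines = [
        f"{rank}. {name}: {cpu:.1f}%"
        for rank, (name, cpu) in enumerate(top, start=1)
    ]
    return f"Top {len(top)} processes by CPU:\n" + "\n".join(lines)
```

How the samples are gathered on a Windows box is deliberately left out, since that depends on what the collector can run.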

3 Replies

  • Anonymous

    It's not easily/cleanly possible today, but it may be in the near future. This is related to a similar request from network guys to run a traceroute whenever ping fails so you can see what point along the path is broken.

  • On 3/11/2022 at 6:39 AM, Stuart Weenig said:

    It's not easily/cleanly possible today, but it may be in the near future. This is related to a similar request from network guys to run a traceroute whenever ping fails so you can see what point along the path is broken.

    We wrote a callback facility for this purpose in our custom notification script for Nagios 6+ years ago, and I am still waiting for anything remotely similar to be possible in LM, so I won't hold my breath. In some cases it could actually be done within individual datapoints, if they are script-based. If the particular datapoint collection can be hijacked to also dump results into an auto property, then the datapoint message could include that property (via unconditional macro substitution), though even then I am not sure the timing would line up properly to get the result displayed in the next alert. It would also be a problem because you don't really want to trigger all that extra load unless an alert is happening, and that is not something you can know within datapoint collection. The right solution would be to add a separate datapoint option to collect additional results for this purpose (triggered by the alert processor, not on each DP collection).

    It is funny, since overall I still significantly prefer LM over Nagios (primarily due to AD), but there are many nuts-and-bolts features like this where our hands are tied and we are at the whim of development priorities we have no insight into or influence over. The API helps us fill in lots of gaps, but can't help with stuff like this.
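The idea in the reply above, collecting extra detail only when the alert processor fires rather than on every datapoint collection, can be sketched roughly as follows. This is an illustration of the pattern only, not how LM or the Nagios script mentioned actually works; `process_sample` and the `enrich` callback are hypothetical names.

```python
from typing import Callable, Optional


def process_sample(
    value: float,
    threshold: float,
    enrich: Callable[[], str],
) -> Optional[str]:
    """Return an alert message if value exceeds threshold, otherwise None.

    The enrich callback (for example, a top-processes lookup) is invoked
    only when the threshold is breached, so routine collection cycles do
    not pay for the extra work.
    """
    if value <= threshold:
        return None
    details = enrich()  # Extra load happens only on the alert path.
    return f"CPU usage {value:.1f}% exceeds threshold {threshold:.1f}%\n{details}"
```

Because the callback runs at alert time, the details describe the moment of the breach, which sidesteps the timing concern raised about stashing results in a property during normal collection.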