Ad-hoc script running
Often when an alert pops up, I find myself running some very common troubleshooting/helpful tools to quickly gather more info. It would be nice to get that info quickly and easily without having to go to other tools when an alert occurs. For example - right now, when we get a high cpu alert the first thing I do is run pslist -s \\computername (PSTools are so awesome) and psloggedon \\computername to see who's logged in at themoment. I know it's possible to create a datasource to discover all active processes, and retrieve CPU/memory/disk metrics specific to a given process, but processes on a given server might change pretty frequently so you'd have to run active discovery frequently. It just doesn't seem like the best way and most of the time I don't care what's running on the server and only need to know "in the moment." A way to run a script via a button for a given datasource would be a really cool feature. Maybe on the datasource you could add a feature to hold a "gather additional data" or meta-datascript, the script could then be invoked manually onan alert or datasource instance. IE when an alert occurs, you can click on a button in the alert called "gather additional data" or something which would run the script and produce a small box or window with the output. The ability to run periodically (every 15 seconds or 5 minutes, etc) would also be useful. This would also give a NOC the ability to troubleshoot a bit more or provide some additional context around an alert without everyone having to know a bunch of tools or have administrative access to a server.4Views1like7CommentsToken to include DataSource raw output in email and alert body
We have script DataSources that output useful diagnostics information that help Operations to understand the number valuewhen an alert is generated. We want to include the raw output from a DataSource in the alert and email body. What we need is a##DSRAWOUTPUT## token which contains the complete raw output sent to standard out from a DataSource script. For example, we monitor for processes running under credentials they are no supposed to be running under, and we want to include that info as textual information in the alert/email body.3Views3likes2CommentsComplex Datapoints between Datasources
It would be great to create alerts from multiple data-points from multiple data-sources. For example if CPU is above 30% and SQL database lock timeouts is above 1000. I can see many uses cases to be able to alert on different datapoints that relate to other datapoints in other data-sources.3Views1like1CommentAlerts on Longer Periods within Datasources
For a datasource, we would like to be able to set the alert threshold over more than a single sample. You can set the number of threshold violations needed for an alert, but this is far different in nature than setting a threshold over a time range. For example, 60% CPU over 2 hours versus 60% CPU over 10 samples. You might see CPU fluctuate within that period, preventing an alert, but the average over a longer period is valuable. Similarly, we would like to get alerts not just on average over atime period, but also on slope over a time period, though perhaps the latter should be a separate request. Thanks, Mark3Views0likes1CommentCustom Ping Intervals
Currently, LM has hard-coded the Ping dataSource to use 250ms ICMP Ping intervals. We need the flexibility to adjust the Ping interval (ms)in the DataSource (Either static or system property value). Background: We've seen at least one company "Mimosa" that has changed it's newest firmware to block ICMP messages if they are sent "too quickly". For Mimosa wireless gear, this is represented in LM as an 80% packetloss (2 pings permitted, 8 are then rejected). Mimosa does not want their hardware resources depleted by multiple, quick, Ping requests. The workaround currently is to alter the thresholds in LM to compensate for an 80% packetloss reading. By having the ability to adjust Ping Interval for these hosts in LM, we can have better visibility into network issues.3Views0likes4Commentsneed method to always update instance description even if filtered
I am not sure exactly how to describe this other than by example. Wecreated an API-based method a while back to control alerting on interfaces based on the interface description. This arose because LM discovered interfaces that would come and go (e.g., laptop ports), and then would alarm about the port being down. With our change, those ports are labeled with a string that we examine to enable or disable alerting. The fly in the ointment is that if an up and monitored port went down due to some change, our clients think they should be able to change the description to influence behavior. Whichtheyshould. Unfortunately, because LM will not update the instance description due to the AD filter, the down conditionis stuck until either the description is manually changed in LM or until the instance is manually removed in LM. Manual either way, very annoying. My proposal is that there should be a way to update the instance description even if the AD filter triggers. Or a second AD filter for updates to existing instances. I am sure there are gotchas here and perhaps a better way exists. I considered using a propertysource, but I don't think that applies here. The only other option is a fake DS using the API to refresh the descriptions, but then you have to replicate the behavior of many different datasources for interfaces.2Views1like1CommentAlert Tuning for DataSource that has "Automatically Delete Instance" enabled?
I have a version of the "Oracle_DB_BlockedSessions" datasource template deployed and set an alert threshold on a complex datapoint that accounts for WAIT_TIME and SECONDS_IN_WAIT. Here is the complex datapoint expression for those curious--- if( eq(if(un(WAIT_TIME),0,WAIT_TIME), 0), if(un(SECONDS_IN_WAIT_RAW),0,SECONDS_IN_WAIT_RAW), 0) If the complex datapoint has a value over 300 seconds, an alert triggers with all the enriched instance-level autoProps from the Active Discovery script. All other aspects of this template mirror the gold-standard version--including enabling the "Automatically Delete Instance" option. Enter Client X, and they are comfortable with a threshold of 900 seconds. How can I set this custom threshold at a resource group for Client X when they don't currently have any blocking sessions? If I do manage to catch and set this Alert Tuning customization when Client X has a blocking session, will this alert tuning get wiped out when the DSIs are removed automatically? I suppose the Active Discovery script could be modified to always output a dummy instance... but that leaves an unpleasant taste in my mouth. Aside from cloning the datasource just for Client X, are there any other alternatives? And no, I do not want to alert off of the "Oracle_DB_BlockedSessionOverview" template because a it doesn't do a good job of discerning between one really long blocking session versus sequential and short-lived sessions that happen to exist at the time of the poll.2Views0likes3CommentsNative, non-REST API, Groovy helper class to add OpsNotes in a Datasource Definition.
The concept of OpsNotes is growing on me. What would really make it shine would be a Groovy helper class/client that can insert an OpsNote from within a Datasource definition for the device context without resorting to using the REST API. Use Case-- we have a datasource for SQL Servers to count the number of blocking sessions on a per database basis. The DBAs want a way to capture some of the session metadata with this metric. OpsNotes seem like a potentialway of capturing this.2Views1like1Comment