StatusPage.IO Monitoring
I have built a generic StatusPage.IO datasource for monitoring the status of the various services we use. Since so many companies use StatusPage.io, I figured it's a good idea to have a heads-up in the event of an outage at one of our many service providers. This has worked well as an early warning system, letting our service desk guys know about issues before they start getting calls from end users. LogicMonitor itself uses StatusPage, but of course there are many, many others. Attached is a screenshot of the Box.com StatusPage data that we've collected from https://status.box.com. This datasource should work with any statuspage.io site; so far it has worked against every site I have tested.
NYJG6J
Custom Ping Intervals
Currently, LM has hard-coded the Ping DataSource to use 250 ms ICMP ping intervals. We need the flexibility to adjust the ping interval (ms) in the DataSource, either as a static value or as a system property value. Background: we've seen at least one company, Mimosa, that has changed its newest firmware to block ICMP messages if they are sent "too quickly". For Mimosa wireless gear, this shows up in LM as 80% packet loss (2 pings are permitted, then 8 are rejected). Mimosa does not want their hardware resources depleted by multiple quick ping requests. The current workaround is to alter the thresholds in LM to compensate for an 80% packet loss reading. With the ability to adjust the ping interval for these hosts in LM, we would have better visibility into network issues.
Token to include DataSource raw output in email and alert body
We have script DataSources that output useful diagnostic information that helps Operations understand the numeric value when an alert is generated. We want to include the raw output from a DataSource in the alert and email body. What we need is a ##DSRAWOUTPUT## token that contains the complete raw output sent to standard out by a DataSource script. For example, we monitor for processes running under credentials they are not supposed to be running under, and we want to include that info as text in the alert/email body.
Ad-hoc script running
Often when an alert pops up, I find myself running some very common troubleshooting tools to quickly gather more info. It would be nice to get that info quickly and easily, without having to go to other tools when an alert occurs. For example, right now when we get a high CPU alert, the first thing I do is run pslist -s \\computername (PSTools are so awesome) and psloggedon \\computername to see who's logged in at the moment. I know it's possible to create a datasource to discover all active processes and retrieve CPU/memory/disk metrics specific to a given process, but the processes on a given server might change pretty frequently, so you'd have to run Active Discovery frequently. It just doesn't seem like the best way, and most of the time I don't care what's running on the server and only need to know "in the moment."
A way to run a script via a button for a given datasource would be a really cool feature. Maybe the datasource could hold a "gather additional data" or meta-data script, which could then be invoked manually on an alert or datasource instance. I.e., when an alert occurs, you could click a button in the alert called "gather additional data" or something similar, which would run the script and show the output in a small box or window. The ability to run it periodically (every 15 seconds, every 5 minutes, etc.) would also be useful. This would also let a NOC troubleshoot a bit more, or provide some additional context around an alert, without everyone having to know a bunch of tools or have administrative access to a server.
Clear an alert with a NaN value
I recently wrote a datasource that polled an API and alerted when the return value was greater than 0. The problem I ran into is that the API never returned a 0; instead the datapoint would come back as NaN. I worked around this by using key=value datapoints and an emptiness check on the returned value. Basically, if there is a value returned, the script outputs "events=[returned value]", the same as most key=value datapoints. If the returned value is empty, the script prints the full string "events=0", which puts a 0 in the datapoint and allows the alert to clear. This is a nice workaround for a LogicMonitor admin's bag of tricks.
    // Print the key=value line that the datapoint reads
    def strv = response_obj['results']['2']
    // Treat a missing (null) or empty result as zero so the alert can clear
    if (strv == null || strv.toString().isEmpty()) {
        println "events=0"
    } else {
        println "events=" + strv
    }
    return 0
Alert Tuning for DataSource that has "Automatically Delete Instance" enabled?
I have a version of the "Oracle_DB_BlockedSessions" datasource template deployed, with an alert threshold set on a complex datapoint that accounts for WAIT_TIME and SECONDS_IN_WAIT. Here is the complex datapoint expression, for those curious:
if( eq(if(un(WAIT_TIME),0,WAIT_TIME), 0), if(un(SECONDS_IN_WAIT_RAW),0,SECONDS_IN_WAIT_RAW), 0)
If the complex datapoint has a value over 300 seconds, an alert triggers with all the enriched instance-level autoProps from the Active Discovery script. All other aspects of this template mirror the gold-standard version, including enabling the "Automatically Delete Instance" option.
Enter Client X, who are comfortable with a threshold of 900 seconds. How can I set this custom threshold at a resource group for Client X when they don't currently have any blocking sessions? If I do manage to catch a moment when Client X has a blocking session and set this Alert Tuning customization then, will it get wiped out when the DSIs are removed automatically? I suppose the Active Discovery script could be modified to always output a dummy instance... but that leaves an unpleasant taste in my mouth. Aside from cloning the datasource just for Client X, are there any other alternatives? And no, I do not want to alert off of the "Oracle_DB_BlockedSessionOverview" template, because it doesn't do a good job of distinguishing one really long blocking session from sequential, short-lived sessions that happen to exist at the time of the poll.
API - Add Instance Count as Datasource Property
I'm trying to clean up datasources in our account that do not have any instances associated with them and likely never will. Currently I have to do this manually by inspecting each datasource in the GUI. It would be really great if the datasource instance count were returned as a property. Even better would be if the instances and associated device IDs were returned as well, but for now I'd be happy with just the device/instance counts.
LM Portal Stats - Visibility into Child-Accounts
LM Exchange code: ZJ49L9
1) Add the child account (or any LM portal) as a device: childaccount.logicmonitor.com
2) Set device properties with the API creds of that child account / portal
3) Monitor stats of that portal: dead and alive devices, alert metrics, service checks, cloud instances, users, etc.
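For anyone wondering how API creds stored as device properties might be used in step 2, here is a minimal sketch of building an LMv1 token header for a REST call against a portal. It is not taken from the ZJ49L9 datasource. The function name, the example ID and key, and the /device/devices resource path are placeholders I chose for illustration, so treat it as an assumption about one way to do it.

```python
import base64
import hashlib
import hmac
import time


def lmv1_auth_header(access_id, access_key, http_verb, resource_path,
                     data="", epoch_ms=None):
    """Build an LMv1 Authorization header value for a single REST request.

    All argument values are supplied by the caller; nothing here is read
    from a real portal or device.
    """
    if epoch_ms is None:
        # LMv1 uses the current time in milliseconds since the epoch.
        epoch_ms = int(time.time() * 1000)
    # String to sign: HTTP verb + epoch + request body + resource path.
    request_vars = f"{http_verb}{epoch_ms}{data}{resource_path}"
    # HMAC-SHA256 keyed with the access key, taken as a hex digest...
    digest_hex = hmac.new(access_key.encode(), request_vars.encode(),
                          hashlib.sha256).hexdigest()
    # ...and the hex string itself is then base64-encoded.
    signature = base64.b64encode(digest_hex.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch_ms}"


# Hypothetical credentials and path, with a fixed timestamp for repeatability.
header = lmv1_auth_header("exampleId", "exampleKey", "GET",
                          "/device/devices", epoch_ms=1700000000000)
print(header.split(":")[0])
```

The header value would go in the Authorization header of a request to the portal's /santaba/rest endpoint. Passing the timestamp in explicitly just makes the function easy to test.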