Windows Drive Space Alerts
By default, LogicMonitor alerts on the percentage used on any drive. In general this is fine, but sometimes it is not. Imagine you have a 2.2 TB drive. You might have your critical threshold set at 90%, which sounds fine until you realise that you are going to get a critical alert when you still have 220 GB free. In my case that would be a cause for some celebration, not really an urgent need to get up at 3 A.M. and delete files so the world doesn't end.

Now imagine your 2.2 TB drive is divided up as:

C: 10 GB (OS)
D: 500 GB (Mission critical applications)
E: 1 TB (Backups)
F: 510 GB (Other applications)

A 90% alert will give you a critical at 1 GB, 50 GB, 100 GB and 51 GB free respectively. Now the C: drive may be a cause for concern, but the others not so much. For the two application drives you might only be concerned if they have less than 4 GB free, and for the backup drive less than 10 GB.

So, we decide to alert on the following:

C: freespace is <1 GB
D: freespace is <4 GB
E: freespace is <10 GB
F: freespace is <4 GB

You could clone the datasource so you have four copies, one for each drive, but this is harder to maintain in the future and does not scale well. It would be better if you could somehow get the drive letter and assign a threshold based on that. LogicMonitor's scripted complex datapoint using Groovy to the rescue.

The disks datasource queries the WMI class Win32_Volume. We need the raw drive letter output from that class, so we would write a Groovy script like:

    Drive=output["DRIVELETTER"]; return(Drive);

This returns C:, D:, E: and F:. That is not much use, as LogicMonitor doesn't deal with text, only metrics. Let's beef up the script.
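Before moving on, the 90% figures quoted earlier in this post can be checked with a short Groovy sketch. It is only an illustration: the map `driveSizesGb` and the helper `freeGbAtPercentUsed` are names made up here and are not part of any LogicMonitor datasource, and 1 TB is treated as 1000 GB, as the post's own numbers imply.

```groovy
// Illustrative only: drive sizes in GB, copied from the example in the post.
def driveSizesGb = ['C:': 10, 'D:': 500, 'E:': 1000, 'F:': 510]

// Hypothetical helper: free space (GB) left when a drive reaches
// the given percentage used.
def freeGbAtPercentUsed(sizeGb, percentUsed) {
    return sizeGb * (100 - percentUsed) / 100
}

// Print how much space is still free when a 90% threshold fires.
driveSizesGb.each { drive, sizeGb ->
    println "${drive} 90% used leaves ${freeGbAtPercentUsed(sizeGb, 90)} GB free"
}
```

The script that follows goes the other way: rather than deriving free space from a percentage, it assigns each drive letter its own lower limit in GB.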
    drive = output['DRIVELETTER'];
    freeSpaceLowerLimitGigabyte = '0';
    if (drive == 'C:') { freeSpaceLowerLimitGigabyte = '1'; }
    if (drive == 'D:' || drive == 'F:') { freeSpaceLowerLimitGigabyte = '4'; }
    if (drive == 'E:') { freeSpaceLowerLimitGigabyte = '10'; }
    return freeSpaceLowerLimitGigabyte;

This returns 1, 4, 10 and 4 for drives C:, D:, E: and F:, so we now have a complex datapoint that returns the lower limit in GB for each drive, dependent on the drive letter. Again, we can't alert on this directly, so we need another datapoint that checks whether FreeSpace is less than FreeSpaceLowerLimitGigabyte. To do that, create a CapacityAlert datapoint using this expression:

    if ( lt (FreeSpace, FreeSpaceLowerLimitGigabyte * 1024000000) , 1, 0)

This breaks down as: if the free space is less than the assigned limit for that drive letter, return 1 (which you alert on); otherwise return 0. (The multiplier 1024000000 is an approximate GB-to-bytes conversion; 1 GiB is exactly 1073741824 bytes.) With the alert threshold set at = 1 1 1, we get critical alerts if:

C: freespace is <1 GB
D: freespace is <4 GB
E: freespace is <10 GB
F: freespace is <4 GB

28 Views, 0 likes, 11 Comments

Live Training - Tuning Datapoints and Alerts - 15th JUNE 2022 - APAC
Hi all,

Thanks for attending our Live Training - Tuning Datapoints and Alerts - 15th JUNE 2022 - APAC region.

Please view the video recording:

Please complete the feedback form here: https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform

14 Views, 0 likes, 0 Comments

Alert Count in Big Number widget
Hi,

I'm pretty new to LM and am struggling with the Big Number widget. I need to show alert counts for a specific subscription: new (unacknowledged/cleared) alerts, and then some history, i.e. unacknowledged/cleared over the last 7 days, the current month, etc. Any guidance appreciated.

5 Views, 0 likes, 2 Comments

Issues With Creating A Datasource
I took a working Groovy script datasource and am now trying to adjust it to some needs we have. This data will end up giving us alert totals for each month so we can build reports. Any ideas? Here is what I have so far.

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.commons.codec.binary.Hex;
    import groovy.json.JsonSlurper;

    //define credentials and url
    def accessId = hostProps.get('lmaccess.id');
    def accessKey = hostProps.get('lmaccess.key');
    def account = hostProps.get('lmaccount');
    def alertgroup = hostProps.get('lmaccess.group');

    def collectionFailures = 0
    def failures = [:]

    def client = new LogicMonitorRestClient(accessId, accessKey, account, this.&println)

    try {
        def alerts = client.get("/device/groups/" + alertgroup + "/alerts", fields: "severity", filter: "startEpoch>:1538370000,endEpoch<:1541048399,cleared:*")
        //warnings = alerts.findAll {it.severity == 2}.size()
        println "WarningCount: ${alerts.findAll {it.severity == 2}.size()}"
        println "ErrorCount: ${alerts.findAll { it.severity == 3 }.size()}"
        println "CriticalCount: ${alerts.findAll { it.severity == 4 }.size()}"
        println "TotalAlerts: ${alerts.size()}"
    } catch (Throwable e) {
        failures["alerts"] = e.toString()
        collectionFailures += 1
    }

    // Do error reporting
    println "CollectionFailures:${collectionFailures}"
    failures.each{ query, exception ->
        println "Exception while querying $query:"
        println exception
    }
    return 0

    //////////////////////
    // HELPER FUNCTIONS //
    //////////////////////
    class LogicMonitorRestClient {
        String userKey
        String userId
        String account
        int maxPages = 20
        int itemsPerPage = 1000
        def println

        LogicMonitorRestClient(userId, userKey, account, printFunction) {
            this.userId = userId
            this.userKey = userKey
            this.account = account
            this.println = printFunction
        }

        def generateHeaders(verb, path) {
            def headers = [:]
            def epoch = System.currentTimeMillis()
            def requestVars = verb + epoch + path
            // Calculate signature
            def hmac = Mac.getInstance('HmacSHA256')
            def secret = new SecretKeySpec(userKey.getBytes(), 'HmacSHA256')
            hmac.init(secret)
            // Sign the request
            def hmac_signed = Hex.encodeHexString(hmac.doFinal(requestVars.getBytes()))
            def signature = hmac_signed.bytes.encodeBase64()
            headers["Authorization"] = "LMv1 " + userId + ":" + signature + ":" + epoch
            headers["Content-Type"] = "application/json"
            return headers
        }

        def packParams(params) {
            def pairs = []
            params.each{ k, v -> pairs << ("${k}=${v}")}
            return pairs.join("&")
        }

        // Non paginating, raw version of the get function
        def _rawGet(path, params) {
            def baseUrl = 'https://' + account + '.logicmonitor.com' + '/santaba/rest' + path
            def packedParams = ""
            if(params) {
                packedParams = "?"+packParams(params)
            }
            def query = baseUrl+packedParams
            def url = query.toURL()
            def response = url.getText(useCaches: true, allowUserInteraction: false, requestProperties: generateHeaders("GET", path))
            return response
        }

        // Public interface for getting stuff.
        def get(Map args=[:], path) {
            def itemsReceived = []
            def pageReads = 0
            // Impose our own paging parameters.
            args.size = itemsPerPage
            args.offset = 0
            while(true) {
                // Do da nastieh
                def response = new JsonSlurper().parseText(_rawGet(path, args))
                if (response.errmsg == "OK") {
                    // Catch individual items
                    if (response.data.items == null) {
                        return response.data
                    }
                    itemsReceived += response.data.items
                    // Check if there are more items
                    // if (response.data.total > itemsReceived.size())
                    // {
                    args.offset = args.size + args.offset
                    // }
                    // else
                    // {
                    // break // we are done
                    // }
                } else {
                    // Throw an exception with whatever error message we got.
                    throw new Exception(response.errmsg)
                }
                pageReads += 1
                // Check that we don't exceed max pages.
                if (pageReads >= maxPages) {
                    break
                }
                if (response.data.total > 0) {
                    break
                }
            }
            return itemsReceived
        }
    }

If I run the URL with the API creds in my test PowerShell script, it works perfectly. When I test it in LM as a datasource, I get the attached error.
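One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: packParams in the script above joins keys and values without URL-encoding them, so the `>` and `<` in the filter go onto the URL exactly as typed. The minimal sketch below shows an encoding variant. The name `packParamsEncoded` is hypothetical and not part of the original script, and whether the server requires encoded values is not established in this post. Since the script signs only verb + epoch + path, changing the query string should leave the signature untouched.

```groovy
import java.net.URLEncoder

// Hypothetical variant of packParams: percent-encodes each key and value
// before joining them. Note that URLEncoder uses form encoding, so a space
// would become '+'.
def packParamsEncoded(Map params) {
    return params.collect { k, v ->
        URLEncoder.encode(k.toString(), 'UTF-8') + '=' + URLEncoder.encode(v.toString(), 'UTF-8')
    }.join('&')
}

// Same parameters as in the post; Groovy map literals keep insertion order.
def encodedQuery = packParamsEncoded([
    fields: 'severity',
    filter: 'startEpoch>:1538370000,endEpoch<:1541048399,cleared:*'
])
println encodedQuery
```

If this turns out to help, the only change needed in the class would be inside packParams; the rest of the request flow stays as posted.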
Quote:
Exception while querying alerts: java.io.IOException: Server returned HTTP response code: 400 for URL https://XXX.logicmonitor.com/santaba/rest/device/groups/224/alerts?fields=severity&filter=startEpoch>:1538370000,endEpoch<:1541048399,cleared:*

Solved
5 Views, 0 likes, 5 Comments

Add alert timeframe to include days of the week
A feature enhancement that enables alerts to be limited to certain days of the week, as well as hours/mins, would be very beneficial, as there are often occasions when an alert is needed in the working week but not at the weekend. An example is NetApp snapmirror lag time. These are set to replicate Mon-Sat but not on a Sunday. We look for a 24 hour lag most of the time to see an issue, but on a Monday this would be 48 hours (as there would have been no snapmirror since the Saturday). I appreciate I can create ways to manage alerts using time-based escalations; however, there is no way to affect the alerts view on the dashboard with this approach. Hopefully this is something that others might also want and that can be added in the future?

5 Views, 1 like, 3 Comments

Clearing Alerts Manually
Hey guys! I wanted to bring up the idea of clearing alerts manually. I searched the feature request threads and haven't really found an answer or a thread that matched what I was looking for, so I thought I would take a shot at writing one of these. Apologies in advance if this has been discussed already, or if I don't make much sense. I'm fairly new to the platform, so I might not be fully up to speed with all the lingo.

Let me explain a bit of what brought me to this request. I have set up monitoring on our virtual machines to monitor CPU usage by percentage (x\100). I then have an alert set up to indicate a stuck process, which sends an alert if a data point hasn't changed (+/-3%) over the next 3 intervals (the interval is set to 3 minutes). The alert clears if the value changes within the next 4 intervals.

The process above has been working great so far, but I quickly realized that we didn't really care about anything stuck between 0-50%; we only wanted to focus on values that were stuck at 50% or above. I then changed the valid value range to 5000-10000 (50-100%), which produced much more useful results.

I did notice that when a CPU that was stuck within the 50-100% range then moves to a value outside the valid value range (X<50), this produces NO DATA, leaving the initial alert in limbo forever. You can manually clear such alerts by going to the device and toggling alerting off and on again, but doing that for a large number of alerts takes a lot of time.

I'm okay with the way I have it set up (though I do believe the above may be a bug). I just wish we could manually clear alerts from the alerting window without having to take extra steps. Maybe something next to the acknowledge button?

I might have jumbled this up, so please ask if I need to clarify any of the information above. I can provide screenshots if needed as well. Thanks for taking the time to read this!
TL;DR = Let us manually clear alerts from the alert window without having to go into specific devices and toggle alerting.

4 Views, 1 like, 1 Comment

Disable Alerts on Active Discovery for specific instances
Hello,

I recently had a case where we were trying to find out whether there was a way to disable alerting on specific instances during active discovery. Currently we have instances discovered via SNMP for which we do not want alerts enabled, since some of the networks we are monitoring are test networks. To turn these alerts off, we have to manually find the instances and turn off alerts from there. For LogicMonitor Support, the case number was 54822.

4 Views, 0 likes, 2 Comments

Create Alert Tokens for Acknowledgement By, and Acknowledgement Comment
Extending alert information from LogicMonitor to other 3rd party systems is pretty common for us; however, we feel the tokens available today to describe an alert are missing a few bits of data. It would be incredibly helpful to have an alert token that contains the LM user responsible for acknowledging the alert, and a separate token for the Ack comment. Having these tokens would allow us to better map alerting details to upstream and downstream integrations.

3 Views, 0 likes, 0 Comments

Custom recommendation link page for each alert threshold definition
Per discussion with Jeff Woeber, I want to submit this as a feature request to LogicMonitor: each alert threshold within each datasource (e.g. Tomcat ThreadPool-) could have its own wiki troubleshooting page. It would be a great feature if LogicMonitor enabled users to specify their own troubleshooting page as an optional field for each datasource. Users could then point to a specific wiki page as a recommendation whenever an alert is sent to PagerDuty.

3 Views, 0 likes, 5 Comments