Windows Drive Space Alerts
Windows Drive Space Alerts By default, LogicMonitor alerts on the percentage used on any drive. This in general is fine, but sometimes not. Let’s imagine you have a 2.2 terabytes drive. You might have your critical threshold set at 90%, which sounds fine, until you realise that you are going to get a critical alert when you still have 220 GB free. In my case that would be a cause for some celebration, not really an urgent need to get up at 3 A.M. and delete files so the world doesn’t end. Now Imagine your 2.2TB drive is divided up as: C: 10 GB (OS) D: 500 GB (Mission critical applications) E: 1 TB (Backups) F: 510 GB (Other Applications) A 90% alert will give you a critical at 1GB,50GB,100GB and 51GB respectively. Now the C: drive may be a cause for concern, but the others not so much. The two application drives you might only be concerned if they have less than 4GB free and the backup less than 10GB. So, we decide to alert on the following C: freespace is <1 GB D: freespace is <4 GB E: freespace is <10 GB F: freespace is <4 GB You could clone the datasource so you have four copies one for each drive but this is harder to maintain in the future and does not scale well. It would be better if you could somehow get the drive letter and assign a threshold based on that. Logicmonitor’s scripted complex datapoint using groovy to the rescue. The disks datasource queries the class Win32_Volume. We need to use the raw drive letter output from the WMI class so would write a groovy script like: Drive=output["DRIVELETTER"]; return(Drive); This returns C:,D:,E: and F: Not much use as Logicmonitor doesn’t deal with text, only metrics. Let’s beef up the script. drive = output['DRIVELETTER']; freeSpaceLowerLimitGigabyte = '0'; if (drive == 'C:') {freeSpaceLowerLimitGigabyte = '1';} if (drive == 'D:' || drive == 'F:') {freeSpaceLowerLimitGigabyte = '4';} if (drive == 'E:') {freeSpaceLowerLimitGigabyte = '10';} return freeSpaceLowerLimitGigabyte; This returns 1,4,10 and 4 for each drive, now we have a complex datapoint that returns the lowerlimit in GB for each drive dependant on the drive letter. Again, we can’t alert on this so we need another datapoint So we can use this to check if freespace is less than the freeSpaceLowerLimitGigabyte. To do that create a CapacityAlert datapoint using this expression if ( lt (FreeSpace, FreeSpaceLowerLimitGigabyte * 1024000000) , 1, 0) Which breaks down as if freespace is less than the assigned limit for that drive letter then return 1 (which you alert on.) Otherwise return 0. Alert threshold set at = 1 1 1, and we get critical alerts if: C: freespace is <1 GB D: freespace is <4 GB E: freespace is <10 GB F: freespace is <4 GB100Views0likes11CommentsAlert Count in Big Number widget
Hi I'm pretty new to LM and am struggling with the big number widget. I have a need to show alert counts for a specificsubscription, showing new (unacknowledged/cleared) alerts and then show some history i.e.unacknowledged/cleared over last 7 days, current month etc. Any guidance appreciated37Views0likes2Commentsmasive statusflap alerts
Hi to all: I have problems with "StatusFlap" type alerts, the truth is that the platform send me many messages to my email, so, does anyone know if they are false positives? My network deviceshave no problems on the interfaces, how can i stop this? let me know please, kind regards Iván Martínez34Views0likes4CommentsHow to validate your Alert Rule & Escalation Chain
You must have set up your Alert Rules & Escalation Chains hoping that it is setup correctly. What if it was not set up accurately and it does not Alert the right group or even worse it does not alert at all? The worst thing is for you not to receive an alert when a device is down or let's say you have a disk which is filling up due to logs which have been set to a verbose mode which one of your teammates did not change the level back after troubleshooting. In this article, you will be guided how to setup an effective Alert Rule & Escalation chain. In addition, we will show you how to deliver a live alert without creating any impact to the system in question. Before diving into the troubleshooting steps, below are the difference between Alert Rules and Escalation Chains. Alert Rules are used to tag the respective Escalation Chains when a certain device reaches the defined severity level. You could define this Alert Rule to use an Escalation chain only when a certain data point is reached. Escalation Chains are used to set the delivery method for Alerts. This could be set to deliver your alerts via email, sms, ticketing systems, custom HTTP integrations, etc.You may also set your Escalation Chain to be routed to different groups of people during different times/days. This is useful for different sets of standby engineers for a 24x7 operation. Alert Rules & Escalation Chains are very powerful if used correctly. To begin, we will first create an Escalation Chain. For this example, i will create it for Windows devices. We recommend enabling rate limit as you will not want to receive a flood of alerts. By doing so, it limits the maximum number of Alerts delivered in the defined time. If you are wondering, i created 3 stages for different delivery methods (email, Hipchat & voice). The duration that it takes to move from one chain to the other is defined within the Escalation Interval of the Alert Rule. This is an optional section where we have the ability to route alerts to different people depending on the time and day. It is quite simple, just select the days & timing for the respective stages. This section below for the creation of Alert Rules requires good planning.Alerts are triggered based on on the priority level. It will start from the lowest to the highest number. It should start with the most granular to the most number of wildcards. A common use case is: Create an Alert rule to send Interface related Alerts to the network team Create an Alert rule to send hardware or performance Alerts to sysadmin team Create an Alert rule to send Exchange Alerts to the messaging team Create an Alert rule to send all other alerts to the sysadmin team Another essential portion which we need to focus on is the Group which it is applied to. We get this question asked countless times. It’s an easy fix but it is knowing what to fix. If you set it to * it will apply to all groups - which is great. However, we know that we can’t apply the Alert rule to all devices. We might need to apply different alert rules to a different type of devices (e.g: Server, Switches, Routers, WAN Links, etc). Let's say you have a router “wan01” which resides in the group “Infrastructure -> Critical -> Networking -> Routers -> WAN”. If you apply the Alert Rule to “Infrastructure/Critical/”, your device will not pick up this Alert Rule as it resides in subtree. The fix is simple, just apply the Alert Rule to “Infrastructure/Critical/*”. This will Apply to all subgroups under Critical. Now, once you have set that up, I'm sure you would like to verify if that if the Alert Rule is picked up by the datasource or instance in question. To do so, navigate to the datasource or instance in question. Click on the COG button and it will show you the Alert Rule, Escalation Chain and delivery method for each stage. This is how you can determine if your Alert Rule or Escalation chain is picked up. The next thing is to validate the delivery of an Alert. Yes, we could click on the “Send Test Alert”. I’m sure we prefer to have an actual alert to see how it works. My favourite datasource to use is the Ping datasource with the PingLossPercent datapoint. To trigger an alert, we could change this value to “>=0”. What this will do is to send an Alert when the Ping Loss is more than or equal to 0. To do so, it’s quite easy too.Click on the pencil icon within the line of PingLossPercent. Click on the + sign as this will create an instance level threshold. What you want to do is to set the value to 0 for critical. You should receive the Alerts quite soon after. Once you have received the alerts and verified its all working, remember to remove it as you dont want to get flooded with alerts. I hope this article has provided you with sufficient information on how to setup an alert, test and trigger the Alerts.33Views0likes0CommentsDisable Alerts on Active Discovery for specific instances
Hello, I recently had a case where we were trying to find out if there was a way to disable alerting on specific instances during active discovery. Currently we have instances that are discovered via snmp that we do not want alerts to be enabled since some networks we are monitoring are test networks. To turn these alerts off, we have to manually find the instances and turn off alerts from there. For LogicMonitor Support, the case number was54822.24Views0likes2CommentsLive Training - Tuning Datapoints and Alerts - 15th JUNE 2022 - APAC
Hi all , Thanks for attending ourLive Training - Tuning Datapoints and Alerts - 15th JUNE 2022 - APAC region . Please view the video recording : Please do complete the feedback form here ;https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform21Views0likes0CommentsClearing Alerts Manually
Hey guys! So I wanted to bring up the idea of clearing alerts manually. I searched the feature requests threads and haven't really found an answer or a thread that matched what I was looking for so I thought I would take a shot at doing one of these. Apologies in advance if this has been discussed already.. Or if I don't make much sense. I'm fairly new to using the platform so I might not be fully up to speed with all the lingo. So let me explain a bit of what brought me to this request.. I have set up monitoring on our virtual machines to monitor CPU usage by percentage (x\100). I then have an alert setup to indicate a stuck process which would shoot out an alert if a data point hasn't changed (+/-3%) on the next 3 intervals (which is set to 3 minutes). The alert clears if it changes after the next 4 intervals. The process above has been working great so far but I quickly realized that we didnt really care about anything stuck between 0-50%.. we only wanted to focus on values that were stuck at 50% or above. I then changed the valid value range to be between 5000-10000 (50-100%) which produced a lot more productive results. I did notice that CPU's which did end up being stuck within the 50-100% range, then clear to a value outside of the valid value range (X<50) then this would produce NO DATA thus having the initial alert stay in limbo forever. You could manually clear them by going to the device and toggle alerting on the device off and on again.. but doing that for a large amount of alerts takes a lot of time. I'm okay with the way I have it set up (but I do believe that the above may be a bug..) I just kind wished we could manually clear alerts from the alerting window without having to take extra steps. Maybe something next to the acknowledge button? I might have jumbled this up so please ask if I need to clarify any of the information above. I can provide screenshots if needed as well. Thanks for taking the time to read this! TL;DR = Let us manually clear alerts from the alert window without having to go into specific devices and toggle alerting.20Views1like1CommentIssues With Creating A Datasource
I took a working groovy script datasource and am now trying to adjust it to some needs we have. This data will end up giving us alert totals for each month so we can build reports. Any ideas? Here is what I have so far. import javax.crypto.Mac; import javax.crypto.spec.SecretKeySpec; import org.apache.commons.codec.binary.Hex; import groovy.json.JsonSlurper; //define credentials and url def accessId = hostProps.get('lmaccess.id'); def accessKey = hostProps.get('lmaccess.key'); def account = hostProps.get('lmaccount'); def alertgroup = hostProps.get('lmaccess.group'); def collectionFailures = 0 def failures = [:] def client = new LogicMonitorRestClient(accessId, accessKey, account, this.&println) try { def alerts = client.get("/device/groups/" + alertgroup + "/alerts", fields: "severity", filter: "startEpoch>:1538370000,endEpoch<:1541048399,cleared:*") //warnings = alerts.findAll {it.severity == 2}.size() println "WarningCount: ${alerts.findAll {it.severity == 2}.size()}" println "ErrorCount: ${alerts.findAll { it.severity == 3 }.size()}" println "CriticalCount: ${alerts.findAll { it.severity == 4 }.size()}" println "TotalAlerts: ${alerts.size()}" } catch (Throwable e) { failures["alerts"] = e.toString() collectionFailures += 1 } // Do error reporting println "CollectionFailures:${collectionFailures}" failures.each{ query, exception -> println "Exception while querying $query:" println exception } return 0 ////////////////////// // HELPER FUNCTIONS // ////////////////////// class LogicMonitorRestClient { String userKey String userId String account int maxPages = 20 int itemsPerPage = 1000 def println LogicMonitorRestClient(userId, userKey, account, printFunction) { this.userId = userId this.userKey = userKey this.account = account this.println = printFunction } def generateHeaders(verb, path) { def headers = [:] def epoch = System.currentTimeMillis() def requestVars = verb + epoch + path // Calculate signature def hmac = Mac.getInstance('HmacSHA256') def secret = new SecretKeySpec(userKey.getBytes(), 'HmacSHA256') hmac.init(secret) // Sign the request def hmac_signed = Hex.encodeHexString(hmac.doFinal(requestVars.getBytes())) def signature = hmac_signed.bytes.encodeBase64() headers["Authorization"] = "LMv1 " + userId + ":" + signature + ":" + epoch headers["Content-Type"] = "application/json" return headers } def packParams(params) { def pairs = [] params.each{ k, v -> pairs << ("${k}=${v}")} return pairs.join("&") } // Non paginating, raw version of the get function def _rawGet(path, params) { def baseUrl = 'https://' + account + '.logicmonitor.com' + '/santaba/rest' + path def packedParams = "" if(params) { packedParams = "?"+packParams(params) } def query = baseUrl+packedParams def url = query.toURL() def response = url.getText(useCaches: true, allowUserInteraction: false, requestProperties: generateHeaders("GET", path)) return response } // Public interface for getting stuff. def get(Map args=[:], path) { def itemsReceived = [] def pageReads = 0 // Impose our own paging parameters. args.size = itemsPerPage args.offset = 0 while(true) { // Do da nastieh def response = new JsonSlurper().parseText(_rawGet(path, args)) if (response.errmsg == "OK") { // Catch individual items if (response.data.items == null) { return response.data } itemsReceived += response.data.items // Check if there are more items // if (response.data.total > itemsReceived.size()) // { args.offset = args.size + args.offset // } // else // { // break // we are done // } } else { // Throw an exception with whatever error message we got. throw new Exception(response.errmsg) } pageReads += 1 // Check that we don't exceed max pages. if (pageReads >= maxPages) { break } if (response.data.total > 0) { break } } return itemsReceived } } If I run the URL with the API creds in my test powershell script, it works perfectly. When I test it in LM as a datasource, I get the attached error. Quote Exception while querying alerts: java.io.IOException: Server returned HTTP response code: 400 for URL https://XXX.logicmonitor.com/santaba/rest/device/groups/224/alerts?fields=severity&filter=startEpoch>:1538370000,endEpoch<:1541048399,cleared:*Solved13Views0likes5CommentsLogicMonitor Integration with IFTTT (If This, Then That)
IFTTT is a free SaaS platform that helps you "do more with all your apps and devices" - by providing an integration point betweencommonly used services and platforms. In the following example, we're using the IFTTT Applet webhooks "trigger" to activate a Philips Hue wireless lighting "action" - blinking the lights of the connected Hue platform as a result ofa LogicMonitor alert! Other things you might be able to do with LogicMonitor alerts, through IFTTT (lots of untested possibilities!) : Change lighting colors based on alert status (red for new, green for cleared, etc.) Receive alert notificationsto connected systems like Skype,Twitter, Evernote, or Google. Play music on a connected Sonos system after triggering an alert. Turn on a connected Smart Plug like the Wemo from Belkin. The Finished Result The following tutorial assumes that you have an IFTTT account created and permissions to add an integration to your LogicMonitor account. Step 1: Log into your IFTTT account and create a new 'Applet' Step 2: Search for and choose the 'Webhooks' service. Step 3: Choose the 'Receive a Web Request' trigger. Step 4: Configure (and remember) the event name that will be recognized by the incoming webhook to trigger the event. Step 5: Configure the 'Action' that will be taken when this event is triggered in IFTTT - lots of intriguing possibilities! Step 6: Once you've added and configured the 'Action,' review the applet settings and click 'Finish' to save the Applet. Step 7: Select 'Services' from the account dropdown - we will be looking up the incoming webhook URL for our account so we know where to send our alerts. Step 8: Search for the 'Webhooks' service and select it to proceed. Step 9: Select the 'Documentation' link from the 'Webhooks' services page. Step 10: Copy the incoming Event trigger URL along with the key for your account. You will replace {event} in the URL with the one you configured above. Step 11: Moving to your LogicMonitor account, navigate to 'Settings -> Integrations' and add a new 'Custom HTTP Delivery' integration using the event name from Step 4 and theURL (with key) from Step 10 : Step 12: IFTTT allows you to include an (optional!) payload - which will show in the 'Activity Log' of the IFTTT Applet. Step 13: Test Alert Delivery and you should see output similar to below in the IFTTT Activity Log. Step 14:Save your integration, assign it to an Escalation Chain, and assign the Escalation Chain to an Alert Rule - and now we've configured a simple integration between LogicMonitor and IFTTT that could form the basis of a handful of interesting alert actions!13Views0likes1CommentCustom Alert-Groups for SDT
When we reboot a Server or a Application Set our NOC does not know all the Devices, Instances and/or services impacted so we get flooded with alerts for a known event. Example: I need to reboot device WebServer-xyz - TheServer, the Switch ports, Storage Sessions and HTTP/S Service are all monitoredin LM Like to be able to SDT just these items with one SDT, and not entire switches or devices. So be able to create an "Alert-Group"ie "WebServer-xyz" where you can then add Instances from multiple devices, entire device, Service, Instance Groups, Device Group aka any defined in LM. Then just Add one SDT to the Alert-Group aka one-stop-shopping.10Views0likes1Comment