Live Training - Tuning Datapoints and Alerts - 15th JUNE 2022 - APAC
Hi all , Thanks for attending ourLive Training - Tuning Datapoints and Alerts - 15th JUNE 2022 - APAC region . Please view the video recording : Please do complete the feedback form here ;https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform17Views0likes0CommentsModify alert notes en mass.
The ability to modify alert notes en mass the same way you can acknowledge multiple alerts at once would be a nice thing to have. When multiple (20+) alerts have a incorrect note put into them it is time consuming to go back and manually fix them one by one. I see the in new UI you can mass tag them but when you go to modify the note it tells you their already acknowledged. Thanks!1View0likes0CommentsWeb Page notifications
I noticed that the communities.logicmonitor.com requests rights to "show notifications". It would be awesome if our LogicMonitor instances could do the same and that we could configure in settings, so that we could push style notifications to our workstations on alerts.0Views0likes0CommentsAdd alert timeframe to include days of the week
A feature enhancement that enables alerts to be limited to certain days of the week as well as hours/mins would be very beneficial as there are often occasions when an alert is needed in the working week but not at the weekend. An example is NetApp snapmirror lagtime. Mon-Sat these are set to replicate but not on a Sunday. We look for 24 hour lag most of the time to see an issue but on a Monday this would be 48 hours (as there would have been no snapmirror since the Sat). I appreciate I can create ways to manage alerts using time based escalations however there is no way to affect the alerts view on the dashboard with this approach. Hopefully something that other might also want which can be added in the future?6Views1like3CommentsInclude JDBC exception messages in Query Status alerts
For JDBC datasources, please create a token that would enable us to include the JDBC driver exception message in the alert for Query Status data point alerts, the ones that are based on: Query status - 1=ok, 2=credential invalid, 3=connection string invalid, 4=connection rejected, 5=driver not supported, 6=connection failure, 7=query failure This would greatly help us to achieve faster time to resolution of incidents when the exception is code of type 6 and 7.0Views0likes0CommentsWindows Drive Space Alerts
Windows Drive Space Alerts By default, LogicMonitor alerts on the percentage used on any drive. This in general is fine, but sometimes not. Let’s imagine you have a 2.2 terabytes drive. You might have your critical threshold set at 90%, which sounds fine, until you realise that you are going to get a critical alert when you still have 220 GB free. In my case that would be a cause for some celebration, not really an urgent need to get up at 3 A.M. and delete files so the world doesn’t end. Now Imagine your 2.2TB drive is divided up as: C: 10 GB (OS) D: 500 GB (Mission critical applications) E: 1 TB (Backups) F: 510 GB (Other Applications) A 90% alert will give you a critical at 1GB,50GB,100GB and 51GB respectively. Now the C: drive may be a cause for concern, but the others not so much. The two application drives you might only be concerned if they have less than 4GB free and the backup less than 10GB. So, we decide to alert on the following C: freespace is <1 GB D: freespace is <4 GB E: freespace is <10 GB F: freespace is <4 GB You could clone the datasource so you have four copies one for each drive but this is harder to maintain in the future and does not scale well. It would be better if you could somehow get the drive letter and assign a threshold based on that. Logicmonitor’s scripted complex datapoint using groovy to the rescue. The disks datasource queries the class Win32_Volume. We need to use the raw drive letter output from the WMI class so would write a groovy script like: Drive=output["DRIVELETTER"]; return(Drive); This returns C:,D:,E: and F: Not much use as Logicmonitor doesn’t deal with text, only metrics. Let’s beef up the script. drive = output['DRIVELETTER']; freeSpaceLowerLimitGigabyte = '0'; if (drive == 'C:') {freeSpaceLowerLimitGigabyte = '1';} if (drive == 'D:' || drive == 'F:') {freeSpaceLowerLimitGigabyte = '4';} if (drive == 'E:') {freeSpaceLowerLimitGigabyte = '10';} return freeSpaceLowerLimitGigabyte; This returns 1,4,10 and 4 for each drive, now we have a complex datapoint that returns the lowerlimit in GB for each drive dependant on the drive letter. Again, we can’t alert on this so we need another datapoint So we can use this to check if freespace is less than the freeSpaceLowerLimitGigabyte. To do that create a CapacityAlert datapoint using this expression if ( lt (FreeSpace, FreeSpaceLowerLimitGigabyte * 1024000000) , 1, 0) Which breaks down as if freespace is less than the assigned limit for that drive letter then return 1 (which you alert on.) Otherwise return 0. Alert threshold set at = 1 1 1, and we get critical alerts if: C: freespace is <1 GB D: freespace is <4 GB E: freespace is <10 GB F: freespace is <4 GB76Views0likes11CommentsAlert Count in Big Number widget
Hi I'm pretty new to LM and am struggling with the big number widget. I have a need to show alert counts for a specificsubscription, showing new (unacknowledged/cleared) alerts and then show some history i.e.unacknowledged/cleared over last 7 days, current month etc. Any guidance appreciated22Views0likes2CommentsAcknowledged date for a repeating alert condition still shows the original acknowledged date
We've been seeing an issue where we get a critical alert, we are notified through our escalation chains, and we acknowledge the alert. However, the action we take to resolve the alert is only enoughdrop the severity on the alert to error or warning, not clear it entirely. If that alert crosses a critical threshold again it will show up as acknowledged from the first time it went critical, which will prevent all notification. For example we have threshold for percent used on a volume at >=90 95 98. The volume hits 98%, we are notified and ack the alert, but are only able to clear space to drop the volume down to 92%. If that volume hits 98% again it will show up as already acknowledged and prevents all notifications (see below): This is the expected behavioraccording to LM, but I don't see a benefit in this behavior and it seems risky if you expect to get alerted any time a threshold is crossed. We'd like to be able to receive anotification any time an alert crosses an threshold, regardless if it has been acknowledged at a higher severity for that alert "instance."2Views0likes7CommentsAlert clustering based on matching datasource instances across grouped devices
I have a device group that has the same datasource applied. This datasource auto-discovers and will spin up matching instances across all devices in the group. I would like to have clustered alerts based on the matched instances across all devices in the group. For example, (pardon the ASCII-like visualization) ClusterGroup |__ Device1 | |__ DatasourceA | |_ Instance_ABC | |_ Datapoint_I | |_ Datapoint_II | |_ Instance_DEF | |_ Datapoint_I | |_ Datapoint_II | |_ Instance_GHI | |_ Datapoint_I | |_ Datapoint_II |__ Device2 | |__ DatasourceA | |_ Instance_ABC | |_ Datapoint_I | |_ Datapoint_II | |_ Instance_DEF | |_ Datapoint_I | |_ Datapoint_II | |_ Instance_GHI | |_ Datapoint_I | |_ Datapoint_II |__ Device3 |__ DatasourceA |_ Instance_ABC |_ Datapoint_I |_ Datapoint_II |_ Instance_DEF |_ Datapoint_I |_ Datapoint_II |_ Instance_GHI |_ Datapoint_I |_ Datapoint_II If Instance_ABC's Datapoint_I is alerting at the specified cluster threshold in my hypothetical group, I want to generate a cluster alert. If some time afterwards, the situation in my environment gets worse and Instance_GHI's Datapoint_II is alerting at the specified cluster threshold, I want another cluster alert for that instance-datapoint as well.4Views0likes1CommentIssues With Creating A Datasource
I took a working groovy script datasource and am now trying to adjust it to some needs we have. This data will end up giving us alert totals for each month so we can build reports. Any ideas? Here is what I have so far. import javax.crypto.Mac; import javax.crypto.spec.SecretKeySpec; import org.apache.commons.codec.binary.Hex; import groovy.json.JsonSlurper; //define credentials and url def accessId = hostProps.get('lmaccess.id'); def accessKey = hostProps.get('lmaccess.key'); def account = hostProps.get('lmaccount'); def alertgroup = hostProps.get('lmaccess.group'); def collectionFailures = 0 def failures = [:] def client = new LogicMonitorRestClient(accessId, accessKey, account, this.&println) try { def alerts = client.get("/device/groups/" + alertgroup + "/alerts", fields: "severity", filter: "startEpoch>:1538370000,endEpoch<:1541048399,cleared:*") //warnings = alerts.findAll {it.severity == 2}.size() println "WarningCount: ${alerts.findAll {it.severity == 2}.size()}" println "ErrorCount: ${alerts.findAll { it.severity == 3 }.size()}" println "CriticalCount: ${alerts.findAll { it.severity == 4 }.size()}" println "TotalAlerts: ${alerts.size()}" } catch (Throwable e) { failures["alerts"] = e.toString() collectionFailures += 1 } // Do error reporting println "CollectionFailures:${collectionFailures}" failures.each{ query, exception -> println "Exception while querying $query:" println exception } return 0 ////////////////////// // HELPER FUNCTIONS // ////////////////////// class LogicMonitorRestClient { String userKey String userId String account int maxPages = 20 int itemsPerPage = 1000 def println LogicMonitorRestClient(userId, userKey, account, printFunction) { this.userId = userId this.userKey = userKey this.account = account this.println = printFunction } def generateHeaders(verb, path) { def headers = [:] def epoch = System.currentTimeMillis() def requestVars = verb + epoch + path // Calculate signature def hmac = Mac.getInstance('HmacSHA256') def secret = new SecretKeySpec(userKey.getBytes(), 'HmacSHA256') hmac.init(secret) // Sign the request def hmac_signed = Hex.encodeHexString(hmac.doFinal(requestVars.getBytes())) def signature = hmac_signed.bytes.encodeBase64() headers["Authorization"] = "LMv1 " + userId + ":" + signature + ":" + epoch headers["Content-Type"] = "application/json" return headers } def packParams(params) { def pairs = [] params.each{ k, v -> pairs << ("${k}=${v}")} return pairs.join("&") } // Non paginating, raw version of the get function def _rawGet(path, params) { def baseUrl = 'https://' + account + '.logicmonitor.com' + '/santaba/rest' + path def packedParams = "" if(params) { packedParams = "?"+packParams(params) } def query = baseUrl+packedParams def url = query.toURL() def response = url.getText(useCaches: true, allowUserInteraction: false, requestProperties: generateHeaders("GET", path)) return response } // Public interface for getting stuff. def get(Map args=[:], path) { def itemsReceived = [] def pageReads = 0 // Impose our own paging parameters. args.size = itemsPerPage args.offset = 0 while(true) { // Do da nastieh def response = new JsonSlurper().parseText(_rawGet(path, args)) if (response.errmsg == "OK") { // Catch individual items if (response.data.items == null) { return response.data } itemsReceived += response.data.items // Check if there are more items // if (response.data.total > itemsReceived.size()) // { args.offset = args.size + args.offset // } // else // { // break // we are done // } } else { // Throw an exception with whatever error message we got. throw new Exception(response.errmsg) } pageReads += 1 // Check that we don't exceed max pages. if (pageReads >= maxPages) { break } if (response.data.total > 0) { break } } return itemsReceived } } If I run the URL with the API creds in my test powershell script, it works perfectly. When I test it in LM as a datasource, I get the attached error. Quote Exception while querying alerts: java.io.IOException: Server returned HTTP response code: 400 for URL https://XXX.logicmonitor.com/santaba/rest/device/groups/224/alerts?fields=severity&filter=startEpoch>:1538370000,endEpoch<:1541048399,cleared:*Solved9Views0likes5Comments