Alert Troubleshooting 101
One of the most common support questions we see every day is "Why am I receiving this alert?" This article explains the steps to determine why you are receiving an alert:

1) Understanding the alert received
2) Checking validity via raw data and threshold
3) Checking delivery

1) Understanding the alert received
The first step when you receive an alert, whether by email, text, or a ticketing system, is to understand it. That means looking at which device the alert is for, which datapoint is involved, and the value reported in the alert. For example, in an email alert message it would appear as below.

LogicMonitor Alert:
Host: ##HOST##
Host Group: ##GROUP##
Datasource: ##DATASOURCE##
Datapoint: ##DATAPOINT##
Description: ##DSIDESCRIPTION##
Value: ##VALUE##
Level: ##LEVEL##
Start: ##START##
Duration: ##DURATION##
Reason: ##DATAPOINT## ##THRESHOLD## ##ALERTID##

2) Checking validity via raw data and threshold
Once you have determined the alert source, you need to understand why the alert was triggered. Start by looking at the threshold set for that particular datapoint. Then go to the Raw Data tab of the datapoint to check whether the collected values meet that threshold.

For example, in this case a critical alert was received. The threshold is 80 90 95, and an alert is only triggered after 20 consecutive polls that fall within this range. The next step is to check the Raw Data tab to determine whether this condition was met.

Judging from the raw data, all 20 polls met the threshold of 80 90 95. The level of the alert is determined by the last poll: since the last poll was 96.67, which falls in the critical range, a critical alert was sent.

3) Checking delivery
The last step is to check the alert rule and escalation chain to see whether the alert was applied to the correct rule and escalation chain.
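The validity check from step 2 can be sketched in a few lines of Python. This is only an illustration, not LogicMonitor's implementation: it assumes the threshold 80 90 95 means warning, error, and critical limits compared with a greater-than operator, and it follows the article's description that every one of the required consecutive polls must breach the threshold and that the last poll sets the level. The function names and all sample values other than 96.67 are hypothetical.

```python
from typing import Optional, Sequence

# Assumed level names, in the same order as the three threshold values.
LEVELS = ("warning", "error", "critical")


def level_for(value: float, thresholds: Sequence[float]) -> Optional[str]:
    """Return the highest level whose limit the value exceeds, or None."""
    level = None
    for name, limit in zip(LEVELS, thresholds):
        if value > limit:  # assumed ">" operator, limits in ascending order
            level = name
    return level


def evaluate_alert(
    polls: Sequence[float],
    thresholds: Sequence[float],
    required_polls: int,
) -> Optional[str]:
    """Return the alert level, or None if no alert should be triggered."""
    if len(polls) < required_polls:
        return None  # not enough raw data yet
    recent = polls[-required_polls:]
    # Every one of the most recent polls must breach at least the lowest limit.
    if any(level_for(value, thresholds) is None for value in recent):
        return None
    # As the article describes, the last poll decides the level.
    return level_for(recent[-1], thresholds)


if __name__ == "__main__":
    # Hypothetical raw data: 19 polls at 85.0, then the 96.67 from the example.
    raw_data = [85.0] * 19 + [96.67]
    print(evaluate_alert(raw_data, (80, 90, 95), required_polls=20))
```

With this sample data the last poll of 96.67 exceeds 95, so the sketch reports a critical alert. A single poll at or below 80 among the last 20 would suppress the alert.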
To do so, go to the Alert Tuning tab and check the alert routing for that particular instance and datapoint. In this example, the Alert Rule applied is Critical - Default and the Escalation Chain is Critical - Default. Under the escalation chain is the list of email addresses that will receive a notification when the threshold is met.

Token to include DataSource raw output in email and alert body
We have script DataSources that output useful diagnostic information, which helps Operations understand the numeric value when an alert is generated. We want to include the raw output from a DataSource in the alert and email body. What we need is a ##DSRAWOUTPUT## token that contains the complete raw output a DataSource script sends to standard out. For example, we monitor for processes running under credentials they are not supposed to be running under, and we want to include that information as text in the alert/email body.

Ad-hoc script running
Often when an alert pops up, I find myself running some very common troubleshooting tools to quickly gather more info. It would be nice to get that info quickly and easily without having to go to other tools when an alert occurs. For example, right now when we get a high CPU alert, the first thing I do is run pslist -s \\computername (PSTools are so awesome) and psloggedon \\computername to see who's logged in at the moment.

I know it's possible to create a datasource to discover all active processes and retrieve CPU/memory/disk metrics specific to a given process, but processes on a given server might change pretty frequently, so you'd have to run active discovery frequently. It just doesn't seem like the best way, and most of the time I don't care what's running on the server and only need to know "in the moment."

A way to run a script via a button for a given datasource would be a really cool feature. Maybe the datasource could hold a "gather additional data" or metadata script, which could then be invoked manually on an alert or datasource instance. For example, when an alert occurs, you could click a button in the alert called "gather additional data" or similar, which would run the script and show the output in a small box or window. The ability to run it periodically (every 15 seconds, every 5 minutes, etc.) would also be useful. This would also give a NOC the ability to troubleshoot a bit more or provide additional context around an alert without everyone having to know a bunch of tools or have administrative access to a server.

Custom alert messages per Cluster
I'm coming around to love clustered alerts as more of my company moves to dynamic environments. But I really need to be able to customize the email alert messaging for clustered alerts. So I would like to see two things:
1. The ability to set a custom alert message per clustered alert
2. The ability to assign properties to clustered alerts so that they can be referenced in the alert message via ##TOKENS##

Alert Test Report
I started a chat under ticket 119191 and discussed this with Seth. I would like you to consider this for your next roadmap. I want to be able to see what alerts "would fire" without enabling the alerts. Scenario: onboarding 10 new devices to a new group with alerting disabled. I want to QUICKLY see how many alerts would fire if I enabled them. No hunting, no slowly turning each one up one by one to prevent the new alert deluge. Maybe a report with applied thresholds and current values, with clear indicators of which alert level each value falls within at report runtime.

Alerts on Longer Periods within Datasources
For a datasource, we would like to be able to set the alert threshold over more than a single sample. You can set the number of threshold violations needed for an alert, but this is very different in nature from setting a threshold over a time range. For example, 60% CPU over 2 hours versus 60% CPU over 10 samples. You might see CPU fluctuate within that period, preventing an alert, but the average over a longer period is valuable. Similarly, we would like to get alerts not just on the average over a time period, but also on the slope over a time period, though perhaps the latter should be a separate request. Thanks, Mark

Alerts rule continuance
I would like to have the option within an alert rule to "continue" processing to the next rule. For example, we would like to handle integrations differently than email alerts. We could create one rule at the top with the highest priority to take an action with our integration, and then customize everything else in separate rules. The only other way to handle this is to add our integration to every escalation chain we create, which is tedious and will lead to manual errors.

Alert Groups
Hi, We get quite a few alerts in our LM, and about 90% of them are made up of a few problems, like VMware storage LUNs. It would be useful to be able to group these to get rid of the 'noise' calls, so we can focus on the more urgent ones. Even better would be the ability to assign permissions to these groups. For instance, first line can see certain groups and third line can see other groups. Kris