Can someone explain the CPU 5 minute load average thing to me?
So, we have a server that is currently alerting because its 5MinLoadPerCore value is above 1, and I'm trying to understand why. I found this page, which says that if that number is above 1, work is queued up waiting for the CPU and there's a backlog: https://www.logicmonitor.com/blog/what-the-heck-is-cpu-load-on-a-linux-machine-and-why-do-i-care However, the CPUs on this server are currently running at around 60%. I would think that if it were backed up, it would be pegged at 99% trying to catch up, so I'd expect the 5-minute load alert and a CPU usage percentage alert to arrive as a pair, but they don't. I'm trying to figure out whether there's anything we can do when we get the 5-minute load alerts, or whether they're just informational and can be ignored. They only come in as Warnings anyway, so if they're just informational, they're noise and maybe we'll turn them off. Just looking for other opinions. ;) Thanks.

How do you handle Disk Space alerting?
Hi, the LM standard is to alert on disk space based on the percentage of space used, e.g. 90% warning, 95% error, 98% critical (or whatever). This works well for machines with average-size drives, but it is useless for machines with giant drives. Some of our machines have drives that are multiple terabytes in size, and getting an alert when a 2 TB drive is 90% full doesn't help because it still has 200 GB free. For all our Windows servers, we've added alerting on a hard-coded threshold of 10 GB free on the C drive, because Windows Updates generally have issues with less than that. This has helped a lot for small drives, where having only 10 GB free isn't enough to trip the percentage-based alert; on a 60 GB C drive, for example, 10 GB free is only about 83% used. I'm wondering what everyone else does for these types of alerts. Do you use the percentage-based alerts for most machines and then create new ones, or change the percentages for large or small servers? Do you switch everything to a hard size limit? Some other combination? We're looking for ideas to reduce unnecessary alerts on servers with huge drives while still getting alerts for ones with tiny drives. Thanks.

Feature Request - Alerting Protection Prompt
I would like to submit a feature request. I believe it would benefit a lot of users. I work for a large managed service provider with many customers in LM. While checking my customers today, I noticed that alerting was switched off on the top-level resource group for one of them. I asked an LM admin to check who had changed this and when, and after further investigation we found that an engineer had clicked it by mistake 5 days earlier. Given how easy it is to change this setting with a single click, and how much impact it can have, I recommend showing a confirmation prompt when it is changed, and possibly capturing a note with the reason.

Feature Request - New property to indicate monitoring is disabled
I have a problem where my users disable alerting on a resource that has been removed from the environment instead of deleting it. Or they disable alerting and forget to re-enable it. With alerting disabled, my no-data datapoint alerts never trigger. Removed devices then linger forever and never get cleaned up, because they don't show up on my data collection failures dashboard. Devices on which the user meant to re-enable alerting eventually meet the same fate, or they go down and we don't know about it. Either outcome is bad news. It would be nice if there were a property indicating that alerting has been disabled for a resource; as far as I am aware, there is currently no way to do this. I could then create a dynamic group of devices with alerting disabled and investigate them as they populate the group. As it stands now, I have to hunt them down manually.

New UI Enhancements/ Feature Request
It would help to be able to select several alerts at once and have disable/enable alerting available from the Actions menu, preferably as a "cycle alerting" option that disables and then re-enables alerting automatically, and that we can allow anyone to perform. We often set SDT and need to clear the alerts in CSM; to do this, we currently disable and then re-enable alerting on each alert individually, which is a pain when there are over 100 alerts.

Netflow Alerting Rules
LogicMonitor doesn't restrict you to just visualising NetFlow data on the LM platform. One of the most recent improvements to LogicMonitor NetFlow is Traffic Alerting Rules: you can set up traffic alert rules for NetFlow resources to get alerts when a resource's traffic hits a specific threshold, drops off for a specified length of time, and so on. Traffic alert rules can be created at the group level and at the resource level. Don't miss out on the advantages of this feature; see the link below for more details. https://www.logicmonitor.com/support/traffic-alert-rule

Threshold Duration
Hello, I was wondering whether it's possible to impose a duration constraint on a threshold in LogicMonitor. I see where you can enable dynamic alerts, but I wasn't sure whether they would look back over the duration of the condition rather than just at a single floating data point that they try to normalize. Thanks in advance.

Disabled Alerts Notes
When there is a legitimate reason for disabling alerts on a device, it would be very useful to be able to leave a note saying why (and by whom). This would prevent confusion within teams, where the question "why would this be disabled?" comes up frequently. For example, a known bug with a certain version combination of ESXi and HPE servers triggers a false-positive hardware alert internally, so we disable alerts for that instance on servers that meet the criteria as we encounter them. Or, some QNAPs give false-positive alerts that their disk is full when in fact it is only "full" because of a RAID configured as a LUN (we rely instead on the server alerting when the iSCSI volume is actually full). However, another technician may log in and switch alerting for these instances back on, assuming it was a mistake, and then we get flooded with these false-positive alerts, prompting technicians to look into them; this causes a loop of wasted time. Simply attaching a note to the "Alerting Off / On" switch, tagged with the user who changed it, would easily solve issues like this. Something like what is shown for Acknowledgements would be adequate. Perhaps there could even be an admin option to require a note or not.

Cluster Alert Routing
It would be immensely helpful if I could view and test alert routing from the Cluster Alerts page at the device group level, similar to the existing Alert Routing button on the Alert Tuning tab. As we begin to use this functionality more heavily, it's critical that we can verify alerts are routed correctly wherever we set it up.

Complex Datapoints between Datasources
It would be great to be able to create alerts from multiple datapoints across multiple datasources, for example when CPU is above 30% and SQL database lock timeouts are above 1000. I can see many use cases for alerting on datapoints that relate to datapoints in other datasources.
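As a rough illustration of the requested behavior, here is a minimal Python sketch of a composite condition that spans two datasources. It is not LogicMonitor's API or implementation: the datasource and datapoint names (`CPU`, `CPUBusyPercent`, `SQLServer`, `LockTimeouts`) and the `Condition` / `composite_alert` helpers are hypothetical, and it assumes you already have the latest value of each datapoint from the same polling cycle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One threshold on one datapoint of one datasource (all names are hypothetical)."""
    datasource: str
    datapoint: str
    threshold: float

    def breached(self, samples: dict[tuple[str, str], float]) -> bool:
        # A missing sample counts as "not breached", so absent data never fires the rule.
        value = samples.get((self.datasource, self.datapoint))
        return value is not None and value > self.threshold


def composite_alert(conditions: list[Condition],
                    samples: dict[tuple[str, str], float]) -> bool:
    """Fire only when every condition breaches in the same cycle (logical AND).

    An empty rule never fires, rather than trivially firing via all([]) == True.
    """
    return bool(conditions) and all(c.breached(samples) for c in conditions)


# Hypothetical rule from the post: CPU above 30% AND SQL lock timeouts above 1000.
rule = [
    Condition("CPU", "CPUBusyPercent", 30.0),
    Condition("SQLServer", "LockTimeouts", 1000.0),
]

samples = {("CPU", "CPUBusyPercent"): 42.0, ("SQLServer", "LockTimeouts"): 1500.0}
print(composite_alert(rule, samples))  # True

samples[("SQLServer", "LockTimeouts")] = 200.0
print(composite_alert(rule, samples))  # False
```

Treating missing data as "not breached" keeps the composite rule quiet if either datasource stops reporting; whether that is the right default depends on whether no-data alerting is handled separately.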