Re: Tips or Tweaks for controlling when a daily ConfigSource runs?
The old-fashioned way to do this would be to run it using cron (wrapped in a job monitor wrapper). You wouldn't have the diff (unless you saved the diff-able output to a file with its own ConfigSource), but it would happen when you want it to. https://www.logicmonitor.com/support/logicmodules/batchjobs/setting-up-batchjob-monitoring/batchjobs

Re: Letting non-admin users make and manage their own dynamic groups
Whatever the organizing principle of your grouping structure, it could probably be used to assign devices into static groups using the API, and the API tokens a user can have are bound to respect that user's roles.

Re: SNMP collector performance: SHA/AES vs MD5/DES
I don't actually know which would be more demanding offhand. The simplest way to test it would be to pick a resource or small set of resources and make the change, then measure what we call the ival and see if it's significantly different. Full explanation below:

Collector capacity for SNMP is primarily constrained by the availability of collector SNMP worker threads. A collector's SNMP polling workload is the number of SNMP-collection instances attached to the resources assigned to that collector, multiplied by the polling rate (the inverse of the polling interval). Each of these instances creates a recurring task on the collector that is scheduled at the proper interval. Waiting to serve these tasks is a pool of dedicated worker threads. When a task's scheduled time arrives (as dictated by the interval), it requests a thread from the pool. If a thread is available, the task runs on that thread, making the request from the collector to the SNMP agent on the target resource; it waits for the response, and when that response comes, it processes the data and releases the thread back into the pool.

If a collector has enough tasks running that all the threads are busy when a scheduled task makes its request, the task will wait a little while for a thread to become available. If some amount of time passes and no thread becomes available, the scheduler gives up on the task (and by then the task's next scheduled run is likely imminent in any case). At that point the collector is working beyond its capacity: task executions are being requested, but there are no idle threads in the pool to execute them, so data is not being collected. Because the collector's scheduler does the best it can, it's normal at this point to see gaps in graphs with some successful executions sprinkled in.

There are a few collector datasources that can help show when this is happening. LogicMonitor_Collector_DataCollectingTasks has a datapoint, UnavailableScheduleTaskRate, which by default will trigger a warning when tasks aren't getting scheduled properly. It's a good one to keep an eye on, but once it fires, you're already losing data because of capacity constraints.

From a workload perspective, the longer the responses take, the more time each one occupies a thread, so switching the encryption, if it slows things down, could certainly have an impact. (Although a non-responsive resource is going to use much, much more thread time, as its threads will each wait for an entire timeout interval; a typical SNMP response time is in single-digit milliseconds, while a timeout is going to take many thousands of milliseconds.) As you can imagine, there are a lot of variables that go into this.
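To put rough numbers on that thread-pool math, here's an illustrative back-of-envelope sketch. Every input below is a made-up assumption (instance counts, interval, timeout, and pool size all vary by collector size and configuration), so treat it as a way to reason about the relationship, not as real capacity figures:

```python
# Back-of-envelope model of SNMP worker-thread demand on a collector.
# Every number here is an illustrative assumption, not a measurement.

instances = 20000        # SNMP instances assigned to the collector (assumed)
interval_s = 60          # polling interval in seconds (assumed)
response_s = 0.005       # healthy SNMP response time, ~5 ms (assumed)
timeout_s = 10           # a non-responsive poll holds a thread for the full timeout (assumed)
pool_size = 100          # assumed SNMP worker-thread pool size

tasks_per_second = instances / interval_s                 # ~333 task executions per second
busy_healthy = tasks_per_second * response_s              # ~1.7 threads busy on average
busy_with_timeouts = tasks_per_second * (0.95 * response_s + 0.05 * timeout_s)  # ~168 threads

print(f"healthy: ~{busy_healthy:.1f} of {pool_size} threads busy on average")
print(f"5% timing out: ~{busy_with_timeouts:.1f} of {pool_size} threads busy on average")
```

The takeaway from the arithmetic is that a handful of non-responsive resources timing out can exhaust the pool far faster than a few extra milliseconds of crypto overhead on each healthy response.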
For your testing though, the easiest way to see how the change goes is to use the collector debug facility (only available to administrators with the proper role) to see how long the executions tend to take on a resource or a few resources, make your changes, wait for a new set of polls, and then check again. Here's a screenshot of a lab collector's response to a !tlist command with filter arguments for SNMP collection and a particular resource; the execution time and status are the two rightmost columns. (Note that SNMP is fast, ~2 ms, here.) There are also constraints when it comes to memory and CPU, but you're less likely to see those directly. All of these constraints can be managed easily by changing the collector size.

Re: Domain joined collector polling non-domain joined device
What's the service account that the collector and watchdog are running under? What do you get when you run the !account command in the collector debug? Is that the same one you're using when you use wbemtest?

Re: Is it possible to set up a password rotation policy
It should certainly be possible to do so using the REST API through the SDK. You would programmatically make an authenticated GET request against the /setting/admins/ endpoint (all encapsulated within the SDK) to get the list of admins, using the SDK's get_admin_list method. Iterate through the list, using whatever logic you wish to determine which accounts should be set to force a password change (I don't think we expose the age of the password; I suppose you could just do them all every 90 days). Then programmatically create a PATCH request through the SDK against the /setting/admins/{id} endpoint, using the SDK's patch_admin_by_id method, to update the force_password_change field for the appropriate accounts. If you'd rather not use the SDK, the REST API Swagger docs are here, but the process will be the same.
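To make that flow concrete, here's a minimal sketch using the logicmonitor_sdk Python package. The get_admin_list and patch_admin_by_id calls are the ones named above; the auth setup, paging, and exact model field names are assumptions you should verify against the SDK docs before relying on this:

```python
# Sketch only: flag admin accounts to force a password change at next login.
# Assumes the logicmonitor_sdk Python package and LMv1 API token credentials (placeholders below).
import logicmonitor_sdk

conf = logicmonitor_sdk.Configuration()
conf.company = "yourportal"          # yourportal.logicmonitor.com
conf.access_id = "API_ACCESS_ID"     # placeholder
conf.access_key = "API_ACCESS_KEY"   # placeholder

api = logicmonitor_sdk.LMApi(logicmonitor_sdk.ApiClient(conf))

admins = api.get_admin_list(size=1000)        # GET /setting/admins
for admin in admins.items:
    # Put your own selection logic here (e.g. skip API-only or SSO-managed accounts).
    admin.force_password_change = True
    api.patch_admin_by_id(admin.id, admin)    # PATCH /setting/admins/{id}
```

Scheduled every 90 days, something along these lines effectively gives you the rotation policy.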
Re: Get historical values for alerting
As a possible alternative, have you considered setting dynamic thresholds on the datapoint? They work by having the platform learn the normal range for a given datapoint (per instance) and alerting if the measured values depart a certain amount from that normal range. They rely on historic data to learn what's normal, but they do not let you define in absolute terms how far from normal triggers an alert. They are good for things like database file size, where there's no obvious number or fraction to look for. https://www.logicmonitor.com/support/alerts/aiops-features-for-alerting/enabling-dynamic-thresholds-for-datapoints

Re: Resource Property Filters on the New UI Preview
The simplest way to understand resource property filters is to think of them as working much like role-based access control (RBAC) does for devices. When you set up a dashboard, all the widgets have their own filters which determine what data is going to be shown. (Those filters can be either hard-coded in the widget settings or, usually preferably, driven by dashboard tokens.) What the end user actually sees, however, is limited by the groups of resources (and websites) that they have view access to through the "users and roles" assignments associated with their portal user account. As a result, you and I might see data from different subsets of resources on the same dashboard. For example, our top 10 CPU % graph on a custom graph widget might have different members because you can see some resources I can't see, and vice versa.

The resource property filters use the same kind of logic to change the data on a dashboard by limiting the view to a subset of the resources that you can see; resources must both pass the RBAC check and match the expression in the resource property filters. If you can see it, and it matches the property filters, the data is then put through the same filters as the widgets on the dashboard. The idea is that this gives a quick way to use resource properties to explore data without having to set up every possible resource group. For example, you might have a lot of AWS tags, too many to bother setting up dynamic groups for all of them, but sometimes want to see a view of just one of them on a particular dashboard. If you want them to be more permanently attached to the dashboard, you can set up specific resource groups or include them directly in the widget filters. (Although I agree, it would be nice to have a way to get some commonly used ones quickly.) It's not designed to filter based on the time-series data contained in the datapoints (the top % options in the custom graph widgets can be useful for that kind of viewing). That said, I have seen users use a PropertySource to periodically evaluate a resource's datapoint data and put the result in a device property, usually to allow for data-based dynamic grouping; it could also work for this, so it might be worth exploring, depending on what you're trying to do.

Re: Does anyone have any experience with monitoring Windows Processes?
There are a couple of issues you've brought up, so I'll answer the most straightforward ones first. Usually, if you want to do something like this, it's a good idea to plan to clone and modify the default datasource; generally you'll want to use active discovery instead of the "add other monitoring" workflow that the default datasource uses, which requires the end user to manually manage the instance lists. There are a few ways to use patterns in active discovery to build instance lists across different resources. A common pattern, which will allow you to use one datasource for a number of different processes, is to store matching expressions in a property, then assign that property to either resources or resource groups (which allows for inheritance). On this page, there's an example where the WMI attribute "name" has a RegexMatch filter "store|mad|.*exchange.*", which requires a process's name to match that regex pattern. The approach I'm talking about would have you create a resource property like importantWinProcessMatch and then put a custom token "##importantWinProcessMatch##" into the value field. Then you would assign appropriate patterns to different resources or groups based on what their target process lists should be. For example, if you are running a custom application with a critical process on a group of machines, you could create a pattern for all the processes that should be monitored for those machines and assign it to that group. (A similar construct in a different context is how the default "Windows Security Event Log" event source uses a "##FILTEREDEVENTS##" token to allow different devices to have different filters while using the same logicmodule.) It's also possible to automate property assignment directly on individual resources if there's some other source of truth available, as in the sketch below.
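On that last point, here's a rough sketch of what automated property assignment could look like through the REST API. The portal name, bearer token, device id, and pattern are placeholders, and you'd want to check the API docs for how PATCHing customProperties interacts with properties already on the resource (e.g. the opType behavior) before running anything like this:

```python
# Sketch: push a process-match pattern onto a resource as a custom property so the
# tokenized discovery filter (##importantWinProcessMatch##) picks it up.
# Assumes an API v3 bearer token; all ids and values below are placeholders.
import requests

PORTAL = "yourportal"                  # yourportal.logicmonitor.com
TOKEN = "LM_BEARER_TOKEN"              # placeholder bearer token
DEVICE_ID = 1234                       # placeholder resource id
PATTERN = "store|mad|.*exchange.*"     # example regex from the default datasource

resp = requests.patch(
    f"https://{PORTAL}.logicmonitor.com/santaba/rest/device/devices/{DEVICE_ID}",
    headers={"Authorization": f"Bearer {TOKEN}", "X-Version": "3"},
    json={"customProperties": [
        {"name": "importantWinProcessMatch", "value": PATTERN},
    ]},
)
resp.raise_for_status()
```

Driven from whatever holds your source of truth (a CMDB export, a deployment pipeline, etc.), this keeps the per-resource patterns in sync without hand-editing properties.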
This tokenized filter pattern works with any kind of discovery, although if you want to bring the pattern matching into a script, you will have to handle the match in the script using the constructs of the script's language. Of course, if you do it this way, the complexity of the script is limited mainly by your own imagination.

Another decision you will want to make is how many datasources you would like to end up with. Using tokenized and/or scripted discovery lets you potentially reuse the same datasource again and again across different populations of machines for different sets of processes. Alternatively, you could decide to make clones specifically to meet certain needs. I believe LM has produced examples of both, but here are some pros and cons of the two approaches:

Reuse
- Fewer datasources to maintain (without scripts, the base datasource is unlikely to need maintenance, though)
- All instances appear as one kind of thing in any graphs, reports, and dashboards
- Can cover more targets just by assigning more properties (no need to keep changing the datasources)
- (con) The properties and their patterns can be complex; you'll need to craft a property for each combination of processes to be monitored for each datasource and make sure that property is assigned to or inherited by all the devices

Targeted
- More datasources
- Instances are more specific to the target workload
- Possibly easier to deal with if there are a lot of different intersections of sets of processes (no need to worry about AND-ing the match expressions)
- Can more readily customize alert messages to the specific process patterns the datasources cover
- Each function can have its own set of filters or scripts handling discovery, making them individually less complex and easier to understand and maintain

Your intent to monitor processes only once they've been started is similar to the way we only monitor SNMP interfaces once they've been plugged in. The pattern we used: set up the discovery so that it runs regularly and frequently, and so that the filters you've chosen won't discover processes before they are active, but set the discovery not to "automatically delete instances". They will then be discovered once they start, and will continue to exist and be polled by the collector even after they've been shut down or disappeared.

At the end of it all, though, it appears that you have discovered that the classes these datasources are built on have some structural problems when it comes to monitoring. Specifically, the quality of the data they report is not everything one might wish for, and the naming and other metadata are not great either. Neither the names nor the IDs make good, unique, durable instance IDs which would support a reliable instance name. This is a limitation of the WMI class and probably one of the reasons you don't see more extensive use of process monitoring in the provided datasources. To use your example: I can't tell you how to correlate the third tab of Chrome (or most things running on Windows) directly to a process name in a reliable fashion. (And it's tough to generalize from the specific instances where it does work well.)
As with all things monitoring, there's a trade-off between very general approaches, like monitoring a list of processes, and getting more specific, like monitoring specific processes or the specific programs that create them. The right approach will depend on how much information you want, how important the systems and software are, and the amount of time and effort you're inclined to put into it. Some approaches you might also consider:
- Monitoring the target applications directly: use datasources that connect to them and directly evaluate their function. A much more direct approach than process monitoring.
- LM Logs: set up LM Logs, look for undesirable process-related events, and set up pipeline alerts for those patterns.
- Event Sources: same approach as for LM Logs, but provides less context.

Re: Any way to pull available upgrade collector version via API, SDK or any other means.
@Stuart Weenig I believe "mandatory" doesn't mean supported; instead, I believe it refers to the policy where an update will eventually be forced by the platform. If a "mandatory" version exists and the collector is running an older version, it will eventually self-schedule an update. The mandatory updates run a month after the release, which is probably a reason the releaseEpoch is there. Looking at the data, I think @Naveenkumar is correct about the labels (but I might not build any application logic on it without confirming).

Re: Azure Stack HCI resources don't have storage, memory, disk or cluster metrics
I don't have one of these to experiment with, but it might be worth looking at this article: https://www.logicmonitor.com/support/monitoring/applications-databases/windows-server-failover-cluster-on-sql-server-monitoring. If the Azure HCI cluster uses the same kind of cluster mechanics as Windows Server Failover Clusters, there are some system properties there which might allow for the application of WSFC logicmodules. I don't know if they will work, but it's worth checking out. (If it doesn't work, just remove the properties.)