Forum Discussion

eleaman
4 months ago
Solved

Does anyone have any experience with monitoring Windows Processes?

I’ve checked the community for datasources and I don’t see anything close to what I’m specifically looking for. Our organization currently uses the Microsoft_Windows_Services datasource (modified a little for our specific needs) to monitor services. I’m looking for something similar to monitor Windows processes.

Similar to the Microsoft_Windows_Services datasource, what I am hoping to accomplish is to provide a list of keywords that will either match or be contained in the names of the processes I want to monitor, provide a list of machines that I want to monitor those processes on, and then get alerted if those processes stop running. Some issues I am running into so far are:

  1. Win32_Process always returns a value of NULL for status and state. So I cannot monitor for those two class level properties.
  2. PowerShell’s Get-Process does not return status or state; it only lists processes that are actively running, so I would need to get creative in how LogicMonitor creates the instance and what value to monitor in the instance.
  3. Some of the processes I want to monitor create multiple processes with the same name, and LogicMonitor then groups them all together into one instance, which makes monitoring difficult.
  4. Some of the processes I want to monitor only run if an application is manually launched, which means that again I will need to get creative in how I set up monitoring, because I don’t want to get alerts when a process that I know shouldn’t be running is not running.

Because the processes I am trying to monitor are not going to be common for everyone everywhere, something that other people could do to try to replicate my scenario would be:

Open Chrome. When Chrome is launched, you will get a process called “Chrome”. Now, open several other tabs in Chrome and you will just get more processes named “Chrome”. Now, keeping in mind the points I made earlier, set up monitoring to let you know when the 3rd tab in Chrome has been closed, even though the rest of the Chrome tabs are still open. How would you break that down? My first thought would be to monitor the PIDs; however, when you reboot your machine, your PIDs will likely change. Also, I don’t want to have the datasource wildvalue search by PID, because that would get confusing really fast once you have 2 or 3 different PIDs that you want to monitor.

All suggestions are welcome, and any help is greatly appreciated.  Bonus points if you can get this to work with the discovery method as Script and you use an embedded Groovy or Powershell script. 

  • There are a couple of issues you’ve brought up, so I’ll answer the most straightforward ones first:

    Usually if you want to do something like this, it’s a good idea to plan to clone and modify the default datasource; generally you’ll want to use active discovery instead of the “add other monitoring” workflow that the default datasource uses, which requires the end user to manually manage the instance lists. 

    There are a few ways to use patterns in active discovery to build instance lists across different resources. A common pattern, which lets you use one datasource for a number of different processes, is to store matching expressions in a property and then assign that property to either resources or resource groups (which allows for inheritance). On this page, there’s an example where the WMI attribute “name” has a regexmatch filter “store|mad|.*exchange.*”, which requires a process’s name to match that regex pattern. The approach I’m talking about would have you create a resource property like importantWinProcessMatch and then put a custom token “##importantWinProcessMatch##” into the value field. Then you would assign appropriate patterns to different resources or groups based on what their target process lists should be. For example, if you are running a custom application with a critical process on a group of machines, you could create a pattern for all the processes that should be monitored on those machines and assign it to that group. (A similar construct in a different context is how the default “Windows Security Event Log” event source uses a “##FILTEREDEVENTS##” token to allow different devices to have different filters while using the same LogicModule.) It’s also possible to automate property assignment directly on individual resources if there’s some other source of truth available. This tokenized filter pattern works with any kind of discovery, although if you want to bring the pattern matching into a script, you will have to handle the pattern match in the script using the constructs of the script’s language. Of course, if you do it this way, the complexity of the script is limited mainly by your own imagination.
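    As a rough illustration of handling the pattern match inside a script, here is a minimal sketch in Python (a real LogicMonitor script would be Groovy or PowerShell). The process list is made-up sample data, the property value is hard-coded rather than read from the resource, and whole-name matching is an assumption about how the filter should behave.

```python
import re


def filter_processes(process_names, pattern):
    """Return the names that match the regex pattern, ignoring case.

    An empty or missing pattern matches nothing, so a resource without
    the property would end up with no instances.
    """
    if not pattern:
        return []
    regex = re.compile(pattern, re.IGNORECASE)
    # fullmatch: the pattern has to cover the whole process name (assumption).
    return [name for name in process_names if regex.fullmatch(name)]


# Hypothetical value of the importantWinProcessMatch property.
important_win_process_match = "store|mad|.*exchange.*"

# Made-up sample of running process names.
running = ["store", "mad", "MSExchangeTransport", "chrome", "notepad"]

matched = filter_processes(running, important_win_process_match)
print(matched)
```

    Different groups of machines would simply carry different values for the property, while the script itself stays the same.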

    Another decision you will want to make is how many datasources you would like to end up with. Using tokenized and/or scripted discovery lets you potentially reuse the same datasource again and again across different populations of machines for different sets of processes. Alternatively, you could decide to make clones specifically to meet certain needs. I believe LM has produced examples of both, but here are some pros and cons for the approaches:

    • Reuse 
      • Fewer datasources to maintain (without scripts, the base datasource is unlikely to need maintenance, though)
      • All instances appear as one kind of thing in any graphs, reports, dashboards
      • Can cover more targets just by assigning more properties (no need to keep changing the datasources)
      • (con) The properties and their patterns can be complex; you’ll need to craft a property for each combination of processes to be monitored for each datasource and make sure that property is assigned to or inherited by all the devices
    • Targeted
      • More datasources
      • Instances are more specific to the target workload
      • Possibly easier to deal with if there are a lot of different intersections of sets of processes (no need to worry about AND-ing the match expressions)
      • Can more readily customize alert messages to the specific process patterns the datasources cover
      • Each function can have its own set of filters or scripts handling discovery, making them individually less complex and easier to understand and maintain

    Your intent to monitor processes only once they’ve been started is similar to the way that we only monitor SNMP interfaces once they’ve been plugged in. The pattern we used: set up the discovery so that it runs regularly and frequently, with filters that won’t discover the instances before they are active, but set the discovery not to “automatically delete instances”. The instances will then be discovered once they appear, and will continue to exist and be polled by the collector even after they’ve been shut down or have disappeared.
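    The effect of that pattern can be modeled in a few lines. This is only a toy simulation in Python, not LogicMonitor code: the process name “myapp.exe” is hypothetical, and the 1/0 “running” value is an assumed datapoint you could alert on.

```python
def merge_discovery(known, discovered):
    """Add newly discovered instances; never drop ones that have disappeared."""
    return known | discovered


def poll(known, running):
    """Report 1 for a known instance that is currently running, 0 otherwise."""
    return {name: int(name in running) for name in sorted(known)}


known = set()
# Discovery cycle 1: the application has not been launched, nothing is found.
known = merge_discovery(known, set())
# Discovery cycle 2: someone launched it, so the process is discovered.
known = merge_discovery(known, {"myapp.exe"})
# Discovery cycle 3: the process has exited, but the instance is kept.
known = merge_discovery(known, set())

# The instance still exists, so polling can report that it is no longer running.
print(poll(known, running=set()))
```

    Nothing is polled (and nothing can alert) until the process has been seen at least once; after that, its absence becomes visible.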

    At the end of it all though, it appears that you have discovered that the classes these datasources are built on have some structural problems when it comes to monitoring. Specifically: the quality of the data they report is not everything one might wish for, and the naming and other metadata is not great either. Neither the names nor the IDs make good, unique, durable instance IDs that would support a reliable instance name. This is a limitation of the WMI class and probably one of the reasons you don’t see more extensive use of process monitoring in the provided datasources. To use your example: I can’t tell you how to correlate the third tab of Chrome (or most things running on Windows) directly to a process name in a reliable fashion. (And it’s tough to generalize from the specific instances where it does work well.) As with all things monitoring, there’s a trade-off between very general approaches, like monitoring a list of processes, and getting more specific, like monitoring specific processes or the specific programs that create them. The right approach will depend on how much information you want, how important the systems and software are, and the amount of time and effort you’re inclined to put into it.

    Some approaches you might also consider:

    • Monitoring the target applications directly: use datasources that connect to them and directly evaluate their function. A much more direct approach than process monitoring
    • LM Logs: set up LM Logs, look for undesirable process-related events, and set up pipeline alerts for those patterns
    • Event Sources: same approach as for LM Logs, but provides less context

19 Replies

  • @Mike Aracic Thank you for the very detailed response! I had been working on a PowerShell script that gets all running processes that match a given name/keyword and then adds an object property to those processes that combines the process name and the PID of the process in the format “Name-PID”. So, for example, I would run Get-Process for Chrome and then make an array of those results. Then each element of that array would get an “.InstanceName” property. So chrome[0].InstanceName would be Chrome-1234, chrome[1].InstanceName would be Chrome-7890, or whatever the PID is for that specific process. I may combine that script with the section here on WMI Instance Level Properties.
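    The “Name-PID” idea can be sketched as follows. This is Python standing in for the PowerShell described above; the ProcessInfo class and the sample process list are invented for illustration and take the place of real Get-Process output.

```python
from dataclasses import dataclass


@dataclass
class ProcessInfo:
    """Stand-in for one process record (name and PID only)."""
    name: str
    pid: int

    @property
    def instance_name(self) -> str:
        # Combine the name and PID in the "Name-PID" format.
        return f"{self.name}-{self.pid}"


def select(processes, keyword):
    """Keep processes whose name contains the keyword, ignoring case."""
    lowered = keyword.lower()
    return [p for p in processes if lowered in p.name.lower()]


# Made-up sample data in place of a live process listing.
sample = [
    ProcessInfo("Chrome", 1234),
    ProcessInfo("Chrome", 7890),
    ProcessInfo("notepad", 42),
]

chrome = select(sample, "chrome")
names = [p.instance_name for p in chrome]
print(names)
```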

    If that does not work, I also like the suggestion about LM Logs. We currently have LM Logs partially set up, so maybe finishing that setup and using it would be beneficial.

    Overall I am definitely more interested in using a single datasource rather than having a new datasource for each process I’d like to monitor. For service monitoring I monitor 30+ services across a group of about 285 servers, all from one datasource. That’s generally the idea that I’m going for here too. I just wish that process monitoring was as easy as service monitoring!

  • @Stuart Weenig Correct, the PID would be unchanged for the lifetime of the process. But once the process ends (i.e., the machine is rebooted or the process is killed), if the process starts again it would get an entirely new PID. When I run active discovery from the datasource editing page I can see that I have 7 instances of Chrome… When I go to the resource page and look, there is just 1 instance. If I closed 2 or 3 tabs of Chrome, LogicMonitor would still think that the process is running, when the specific PID that I want to monitor is in fact not running. Getting LogicMonitor to recognize that Chrome1 and Chrome2 are two different things is the part I am having trouble with. Aside from somehow incorporating the PID, I do not see an alternative option.

    I also don’t want to discover the process by name but then have the instance be just the PID, because when the alerts come out in an email I don’t want to get an email that says “Process ID #### has stopped”. I would like to know what that means in a user-friendly way. If it says “Process ID ‘Chrome-####’ has stopped”, at least I would know that one of the Chrome processes that I was monitoring stopped. My only reason for that is that if, down the road, I am monitoring multiple processes on any given machine, I would know which one actually died.

    Do you have any PowerShell experience or suggestions that would help me find a solution for this?

  • @Stuart Weenig Also, because this process only shows up when the application I am trying to monitor is manually run, there unfortunately isn’t a service I can use to monitor this. 

  • Ah, display name and identifier are two different things.

    You’d use the PID as the identifier (wildvalue) and the process name as the display name (wildalias).
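    A minimal sketch of what that could look like in scripted discovery output, written in Python for illustration. It assumes the script prints one instance per line as “wildvalue##wildalias”, and it uses the “Name-PID” form from earlier in the thread as the display name so that several processes with the same name stay distinguishable; the PIDs are sample values.

```python
def discovery_lines(processes):
    """Format (pid, name) pairs as 'wildvalue##wildalias' lines.

    The PID is the identifier (wildvalue); 'Name-PID' is the display
    name (wildalias).
    """
    return [f"{pid}##{name}-{pid}" for pid, name in processes]


# Made-up sample: two Chrome processes with different PIDs.
lines = discovery_lines([(1234, "Chrome"), (7890, "Chrome")])
print("\n".join(lines))
```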

    As for instances disappearing, that depends on your auto-delete settings. You can set the instances to stick around for 30 days after they no longer appear in the discovery output. However, the instance would be hidden with the historical data being retained in case it ever came back.

  • The way I’ve done this with services in our systems is to use Win32_Service as a class to match the names to the PIDs and gather the status, state, and startmode, then use Win32_Process with that PID to get resource consumption data. That allows output of all of the pieces and parts as a single DS for the application components we’re monitoring. I use multi-instance batch collection for application service sets that have more than one component, to reduce task load on the collectors and network.
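    The service-to-process match described here amounts to a join on the PID. The sketch below shows that join in Python over made-up records; the field names follow the Win32_Service and Win32_Process properties (Name, ProcessId, State, StartMode, WorkingSetSize), but the service names and values are invented, and this is not the poster’s actual script.

```python
def join_services_to_processes(services, processes):
    """Attach process resource data to each service by matching ProcessId."""
    by_pid = {p["ProcessId"]: p for p in processes}
    rows = []
    for svc in services:
        pid = svc["ProcessId"]
        # A stopped service reports ProcessId 0, so there is nothing to join.
        proc = by_pid.get(pid) if pid else None
        rows.append({
            "Name": svc["Name"],
            "State": svc["State"],
            "StartMode": svc["StartMode"],
            "WorkingSetSize": proc["WorkingSetSize"] if proc else 0,
        })
    return rows


# Made-up sample records standing in for WMI query results.
services = [
    {"Name": "AppSvcA", "ProcessId": 4321, "State": "Running", "StartMode": "Auto"},
    {"Name": "AppSvcB", "ProcessId": 0, "State": "Stopped", "StartMode": "Manual"},
]
processes = [{"ProcessId": 4321, "WorkingSetSize": 52428800}]

rows = join_services_to_processes(services, processes)
for row in rows:
    print(row)
```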

  • Why not just discover the instances based on a property containing the variables? One DS to rule them all.

  • The way I’ve done this with services in our systems is to use Win32_Service as a class to match the names to the PIDs and gather the status, state, and startmode, then use Win32_Process with that PID to get resource consumption data. That allows output of all of the pieces and parts as a single DS for the application components we’re monitoring. I use multi-instance batch collection for application service sets that have more than one component, to reduce task load on the collectors and network.

    Do you have this published in the Exchange? Or would you mind sharing your PowerShell script so I can use it in my own environment?

  • I convert the status, state, and startmode into integers, and the names I’m searching for are set in variables at the top of the PowerShell script. To make a new DS for a different app, clone it, then change the contents of the include and exclude variable arrays up top. So deploying the first DS takes an hour or two of coding and tweaking, but the second deployment takes minutes.
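    A sketch of that layout in Python, for illustration only. The integer codes, the include/exclude keywords, and the service names are all hypothetical; the poster’s actual mapping and PowerShell script are not shown in the thread.

```python
# Edit these two lists when cloning for a different application (hypothetical values).
INCLUDE = ["appsvc"]
EXCLUDE = ["appsvc-debug"]

# Hypothetical integer codes, so the values can be stored as numeric datapoints.
STATE_CODES = {"Stopped": 0, "Running": 1, "Paused": 2}
START_MODE_CODES = {"Auto": 1, "Manual": 2, "Disabled": 3}
UNKNOWN = -1


def wanted(name):
    """True if the name contains an include keyword and no exclude keyword."""
    lowered = name.lower()
    included = any(keyword in lowered for keyword in INCLUDE)
    excluded = any(keyword in lowered for keyword in EXCLUDE)
    return included and not excluded


def encode(service):
    """Convert the textual state and start mode into integer codes."""
    return {
        "state": STATE_CODES.get(service["State"], UNKNOWN),
        "startmode": START_MODE_CODES.get(service["StartMode"], UNKNOWN),
    }


sample = {"Name": "AppSvc-Web", "State": "Running", "StartMode": "Auto"}
if wanted(sample["Name"]):
    print(sample["Name"], encode(sample))
```

    Keeping the lists at the top is what makes the second deployment quick: the rest of the script does not change between clones.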

  • Why not just discover the instances based on a property containing the variables? One DS to rule them all.

    Ooh… fancy!