Forum Discussion

Coleman
4 months ago

Top Windows processes by bandwidth usage

I’m looking for a datasource that’ll monitor the “top 10 Windows processes by bandwidth”, similar to how NetFlow data does it. Can anyone recommend an existing datasource?

Thanks!

9 Replies

  • Can you clarify what your goal is? How will you be using that data? Do you want to trend it or just see it whenever there’s a problem? (@LM please give us actions) Are you looking to track specific processes or just the top consumers?

    Process monitoring is incredibly difficult because processes are ephemeral. Services are better candidates for monitoring. If you just want snapshots of what the top processes are, you could write a configsource (not recommended).

    See this conversation:

    https://community.logicmonitor.com/lm-exchange-29/does-anyone-have-any-experience-with-monitoring-windows-processes-4020
  • I’m looking to monitor the top 10 consumers of bandwidth at any given time over the previous 24 hours. Overnight, we’re seeing large spikes in bandwidth consumption that last for 10-30 minutes, but we’re unable to identify which process is the culprit.

    We’ve been monitoring suspected services individually but were hoping for other options.

    Thank you

  • This is the perfect use case for incident responses (sometimes dreamed to be called LM Actions) which is not something LM can do yet (why not LM?!). You really don’t need it all the time, just whenever there are spikes. So, you really only need it for when the CPU triggers an alert; you’d want to know what the top processes are at that moment. If LM had the ability to kick off an action in response to an alert, this would be the perfect case.

    But it can’t so you’re stuck with possible workarounds: 

    1. Create a configsource that grabs the top processes as text and stores that text. The problem with this is that configsources can run, at most, once every hour. Being able to hit the broad side of a barn does not a marksman make.
    2. Create a cronjob/scheduled task to dump the data into LM Logs. This could even be built into a datasource. With this, you’d be able to gather data up to every minute with a datasource, or as often as you want with a cronjob/scheduled task. The downside is that you’re consuming LM Logs GBs. Alternatively, you could log it locally to the filesystem and not use LM at all. You could google/gpt the best PowerShell commands to run to get this output.
    3. You could submit a feature request for LM Actions and wait for LM to build it (bet).
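
For workaround 2, the ranking step might look roughly like the sketch below. It is only an illustration in Python, not an existing LM datasource: it assumes you already have two snapshots of cumulative per-process byte counters taken `interval_s` seconds apart, and how those counters are collected on Windows is deliberately left out. The function name `top_talkers`, the snapshot layout, and the process names in the example are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed snapshot layout: (pid, process name) -> cumulative bytes transferred.
# How the counters are gathered is outside this sketch.
ProcKey = Tuple[int, str]


def top_talkers(
    before: Dict[ProcKey, int],
    after: Dict[ProcKey, int],
    interval_s: float,
    n: int = 10,
) -> List[Tuple[int, str, float]]:
    """Return the n busiest processes as (pid, name, bits per second)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    rates = []
    for (pid, name), end_bytes in after.items():
        # A process missing from the first snapshot started mid-interval,
        # so its whole counter value counts as the delta.
        start_bytes = before.get((pid, name), 0)
        delta = end_bytes - start_bytes
        if delta < 0:
            # Counter went backwards (for example, the PID was reused):
            # fall back to the new counter value.
            delta = end_bytes
        rates.append((pid, name, delta * 8 / interval_s))

    # Highest rate first, then keep the top n.
    rates.sort(key=lambda r: r[2], reverse=True)
    return rates[:n]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    before = {(1200, "sqlservr.exe"): 1_000_000, (800, "svchost.exe"): 500_000}
    after = {
        (1200, "sqlservr.exe"): 751_000_000,
        (800, "svchost.exe"): 560_000,
        (4321, "backup.exe"): 30_000_000,
    }
    for pid, name, bps in top_talkers(before, after, interval_s=60):
        print(f"{name} (pid {pid}): {bps / 1_000_000:.2f} Mbps")
```

Note that a process that exits between the two snapshots drops out of the result entirely, which is the "processes are ephemeral" problem mentioned earlier in the thread; shorter intervals reduce that gap but do not remove it.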
  • Oh wait, by bandwidth? I thought by CPU%. So NetFlow tells you the bandwidth, but you’re not able to tie the flows back to the process originating the flow? Or are you not yet using NetFlow?

  • We have good netflow data on our Cisco devices but we aren’t capturing NetFlow from our Windows Servers. Can this be done without 3rd party software? If not, 3rd party tools are an option, I’ll just have to go through onboarding. 

  • Are your Windows servers not going through Cisco devices where you can capture NetFlow?

  • They are but this leads to a different issue that our Network team is addressing with LM Support.

    Our network device (Cisco ASA hosting VPN tunnels) shows the spike in “IPSec tunnel throughput” but that data does not appear in NetFlow.

    For example, if we monitor live, we can see the local server generating 100Mbps of SQL replication traffic (port 1433) and the IPSec tunnel throughput will also show 100Mbps, but NetFlow at that time only shows 1-3Mbps of port 1433 traffic.

    It’s a long story but up until a few months ago, NetFlow and throughput on this device were close to 1:1. 

    So while they work on this, I’m looking for the same data but directly from the server. 

  • Ah, so NetFlow is capturing the encrypted/tunnelled traffic instead of the raw traffic. Thus the culprit’s identity is lumped in with all other IPSec traffic, making it impossible to know the actual source.

    No chance to turn on NetFlow at some point in the path before or after the IPSec tunnel?