Azure Stack HCI resources don't have storage, memory, disk or cluster metrics
We have a customer with an Azure Stack HCI cluster deployed a few months ago. For those not familiar, this is basically a customised Windows Server Core environment that runs Hyper-V VMs and some Azure-specific workloads on-premises. The virtualised workloads are all added as resources using a locally deployed collector (on the Windows jumphost, if it matters), and they all show CPU, Disks, Interfaces, Processes... everything you’d expect for Windows hosts.

We’ve added the two nodes as Resources, but we don’t see any detailed metrics, only Host Status (DNS), HTTP and Ping. I also added the FQDN for the cluster management point / VNN, and it has the same minimal detail as the individual cluster nodes.

There are quite a few valid/correct properties recorded for the systems. Some examples:
- system.domain (customer AD domain)
- system.ips (all IP addresses for all interfaces)
- system.model (correctly identifies vendor and server model, presumably from WMI)
- system.sysinfo (“Microsoft Azure Stack HCI”)
- system.sysname (hostname)
- system.systemtype (“x64-based PC”)

Is there something else I need to do to have this system monitored? I’m rushing because we nearly had a CSV run out of space. We thought it was monitored, and we were wrong.
Solved | 324 Views | 16 likes | 9 Comments

Windows Services Monitoring with quite a bit more Automation applied
So today we use LM's Microsoft Windows Services DataSource to monitor Windows Services. This DS uses a Groovy script and WMI calls under the hood to fetch service metrics like state, start mode, status, etc. Everything works fine, but one of the prerequisites is to manually populate the list of Windows services, which the DS then parses out as a WILDVALUE variable in the script. You know: go to the device, click on the Down Arrow (Manage Resource Options) --> Add Additional Monitoring --> and CHOOSE from the list of Windows Services. Rinse and repeat and Save. Then the DS goes to work.

Well, what if you have a list of over 100 Windows Services that you need to add to, let's say, 20 Windows devices? It would take forever to populate that list manually. That's problem number 1. Scratch that. This is not really a problem, since one can run a PowerShell script (or Groovy script) to perform this task using undocumented, but very well working, LM API calls. That problem is solved.

Next: this list of over 100 Services needs to be *refreshed* every, let's say, 24 hours to remove nonexistent services and add new ones based on the regex filter. That's problem number 2. And again, one can do it programmatically with API calls, but this is where I am trying to figure out how to run it. Run my script as a custom PropertySource? I am not really writing Resource Properties; I am updating the instance list (Windows Services) within Additional Monitoring on a bunch of Resources. Plus, PropertySources are applied when Active Discovery is run, which is what, every 24 hours? Or should I write a custom DataSource that would accomplish this refresh and specify a 1-day collection period? Thanks.
Solved | 745 Views | 4 likes | 2 Comments

Process Monitoring Batch Script
Is there a way we can measure the performance of a DataSource or collectors? Repository: ProcessMonitoring

@Stuart Weenig I don't think I understood why monitoring lots of processes/services on Windows systems with _Select Data Sources might not be the best approach. Aren’t both making a WMI call? Aren’t both going to bring back all the processes in one go? Can we see the query count from WMI vs Batch Groovy?
Solved | 135 Views | 0 likes | 7 Comments

Process Monitoring
Hi @Stuart Weenig, thank you for your awesome work! I was able to use the Win_Process_Stats_Groovy.xml file to create a data source for processes. https://github.com/sweenig/lm/tree/main/ProcessMonitoring

I am able to see data in Discovery and on the Collector, but under Raw Data in Devices > Data source I do not see any data. When I poll, I do see data. Am I missing something?

My AppliesTo wizard has the following query (I removed Win_Process_Stats.excludeRegEx & Win_Process_Stats.includeRegEx from “AppliesTo”):
isWindows() && system.displayname == "server001" or system.displayname == "server001"
Solved | 229 Views | 8 likes | 10 Comments

Windows System Event Log "message" details not accurate
We are using the default Windows System Event Log event source and having those errors route through a Teams integration. When tested from the Windows System Event Log event source, the Event Logging displays the entire “message” detailing the event ID, reason, etc. When looking in the Alerts section of the GUI, it also shows the entire “Message” section with details. However, when the alert shows up in Teams, it's dumbed down and useless. We get the following:
Message: error - HOSTNAME Windows System Event Log

The Teams integration is set up identically to the Event Source alert message, as seen below. Does anyone know why ##Message## is getting overwritten with useless info instead of the actual message details from the event?
Host: ##HOST##
Eventsource: ##EVENTSOURCE##
Windows Event ID: ##EVENTCODE##
Message: ##MESSAGE##
Detected on: ##START##
72 Views | 12 likes | 7 Comments

system.info missing in the Info on a device
Hello, I have system.info =~ “Integrated Lights-Out 4 255 Aug 1 6 2017” on 3 machines, but it is missing on 2 other machines. What am I missing in the process to get system.info populated and appearing in the Info for a device? I checked the SNMP Service on each machine, and the Security tab is identical: the “Accepted community names” match, as does the list of IPs under “Accept SNMP packets from these hosts”. What did I miss? Thanks, Dom
Solved | 271 Views | 2 likes | 2 Comments

Datasource to monitor Windows Services/Processes automatically?
Hello,

We recently cloned 2 LogicMonitor out-of-the-box datasources (names: WinService- & WinProcessStats-) in order to enable the 'Active Discovery' feature on them. We did this because we need to discover services/processes automatically, since we don't have an 'exact list' of which services/processes we should monitor (due to the number of clients [+100] & the different services/solutions across them).

After enabling this, it works fine & does what we expect (discovers all the services/processes running on each box). We further added some filters in the Active Discovery for the services in order to exclude common 'noisy' services & grab only the ones set to start automatically with the system.

Our problem arises when these 2 specific datasources start to impact collector performance. Due to the huge number of WMI queries, CPU consumption on the collector sits at almost 100% all the time, which degrades collector performance & data collection (resulting in request timeouts & full WMI queues).

We also thought about creating 2 datasources (services/processes) for each client (with filters to grab the critical/wanted processes/services for the client in question), but that's a nightmare (especially when you have clients installing applications without any notice & expecting us to automatically grab & monitor those).

Example of 1 of our scenarios (1 of our clients):
- Collector is a Windows VM (VMware) & has 8 GB of RAM with 4 allocated virtual processors (host processor is an Intel Xeon E5-2698 v3 @ 2.30 GHz)
- Currently, it monitors 78 Windows servers (not including the collector) & those 2 datasources are creating 12,700 instances (4513 - services | 8187 - processes) - examples below

This results in approx. 15 requests per second
This results in approx. 45 requests per second

According to the collector capacity document (ref. Medium Collector) we are below the limits (for WMI); however, those 2 datasources are contributing A LOT to filling the queues. We're finding errors on a regular basis - example below

To sum this up, we are looking for another 'way' of doing the same thing without consuming so many resources on the collector end (due to the number of simultaneous WMI queries). Not sure if that's possible though. Has anyone had this need in the past & been able to come up with a different solution (not so resource-intensive)? We're struggling here mainly because we come from a non-agentless (agent-based) solution, which didn't face this problem because the load was distributed across the individual agents (per device).

Appreciate the help in advance!
Thanks,
1.3K Views | 13 likes | 37 Comments
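The request rates quoted in the post above can be sanity-checked with simple arithmetic. The following is a minimal sketch, assuming each instance produces one WMI request per collection interval; the post does not state the collection intervals, so the 300 s and 180 s values (and all names in the code) are hypothetical and chosen only for illustration.

```python
def requests_per_second(instances: int, interval_seconds: float) -> float:
    """Average request rate if every instance is polled once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return instances / interval_seconds


# Instance counts quoted in the post.
SERVICE_INSTANCES = 4513
PROCESS_INSTANCES = 8187

# Hypothetical collection intervals; the post does not state them.
ASSUMED_SERVICE_INTERVAL_S = 300
ASSUMED_PROCESS_INTERVAL_S = 180

service_rate = requests_per_second(SERVICE_INSTANCES, ASSUMED_SERVICE_INTERVAL_S)
process_rate = requests_per_second(PROCESS_INSTANCES, ASSUMED_PROCESS_INTERVAL_S)

print(f"services:  {service_rate:.1f} req/s")
print(f"processes: {process_rate:.1f} req/s")
```

Under those assumed intervals the averages come out near the approximate figures in the post. The same function can be used to see how halving the instance count through stricter Active Discovery filters, or doubling the interval, would halve the average request rate. It says nothing about bursts or queue depth, which depend on how the collector schedules the queries.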