collector

63 Topics

LogicMonitor Collector Ports to be used while monitoring end-user devices
Review a full list of protocols and ports required for monitoring User Activity. This post covers the ports, protocols, use cases, and (where applicable) configuration settings generally used with the LM platform. The "<port>/<protocol>" format is a common, standardized way to indicate a network port together with its protocol, and it gives a clear and concise representation of the ports discussed below.

Inbound communication:

Port | Protocol | Use Case | Configuration Setting
162 | UDP | SNMP traps received from target devices | eventcollector.snmptrap.address
514 | UDP | Syslog messages received from target devices | eventcollector.syslog.port
2055 | UDP | NetFlow data received from target devices | netflow.ports
6343 | UDP | sFlow data received from target devices | netflow.sflow.ports
7214 | HTTP/Proprietary | Communication from custom JobMonitors to Collector service | httpd.port
2056 | UDP | JFlow data received from target devices | (not listed)

Outbound communication:

Port | Protocol | Use Case | Configuration Setting
443 | HTTP/TLS | Communication between the Collector and the LogicMonitor data center. Port 443 must be permitted to access LogicMonitor’s public IP addresses. If your environment does not allow the Collector to connect directly to the LogicMonitor data centers, you can configure the Collector to communicate through a proxy. | N/A
Other non-privileged | SNMP, WMI, HTTP, SSH, JMX, etc. | Communication between Collector and target resources assigned for monitoring | N/A

Internal communication:

Port | Protocol | Use Case | Configuration Setting
7211 | Proprietary | Communication between Watchdog and Collector services to OS Proxy service (sbwinproxy/sblinuxproxy) | sbproxy.port
7212 | Proprietary | Communication from Watchdog service to Collector service | agent.status.port
7213 | Proprietary | Communication from Collector service to Watchdog service | watchdog.status.port
15003 | Proprietary | Communication between Collector service and its service wrapper | N/A
15004 | Proprietary | Communication between Collector service and its service wrapper | N/A

Destination ports:

Port | Protocol | Use Case
135 | TCP | DCOM initial communication and RPC (Remote Procedure Call) endpoint mapping. DCOM often uses dynamically allocated ports in the range 49152 to 65535.
22 | TCP | SSH connections
80 | TCP | HTTP
443 | TCP | HTTPS
25 | TCP | SMTP
161 | UDP | SNMP
1433 | TCP/UDP | TCP for Microsoft SQL
1434 | TCP/UDP | The protocol used on port 1434 depends on the application using the port. For example, SQL Server uses TCP for communication with clients, while the SQL Server Browser service uses UDP.
1521 | TCP/UDP | Listening for database connections from Oracle clients
3306 | TCP/UDP | MySQL
5432 | TCP | PostgreSQL
123 | UDP | NTP: connection to an external NTP server
445 | TCP | Server Message Block (SMB) protocol over TCP/IP

The LM Collector supports a number of other monitoring protocols, so this list can be extended as needed.

Hopefully the details above clarify which ports and protocols are used in the LM platform. Thanks!
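As a small illustration of the "<port>/<protocol>" notation used in this post, the sketch below parses such strings into a port number and protocol name. The names `PortSpec`, `parse_port_spec`, and `INBOUND` are made up for this example and are not part of any LogicMonitor tooling; the listed ports are the UDP inbound listeners from the post.

```python
from typing import NamedTuple


class PortSpec(NamedTuple):
    """A parsed '<port>/<protocol>' pair (hypothetical helper type)."""
    port: int
    protocol: str


def parse_port_spec(spec: str) -> PortSpec:
    """Parse a '<port>/<protocol>' string such as '162/udp'."""
    port_text, sep, protocol = spec.strip().partition("/")
    if not sep or not protocol:
        raise ValueError(f"expected '<port>/<protocol>', got {spec!r}")
    port = int(port_text)  # raises ValueError for non-numeric ports
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return PortSpec(port, protocol.lower())


# UDP inbound listeners taken from the inbound list in the post.
INBOUND = ["162/udp", "514/udp", "2055/udp", "6343/udp", "2056/udp"]

if __name__ == "__main__":
    for spec in INBOUND:
        parsed = parse_port_spec(spec)
        print(f"allow inbound {parsed.protocol} port {parsed.port}")
```

A parsed list like this could feed whatever firewall tooling you already use; the sketch deliberately stops at parsing and does not open or test any ports.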
Persie
2 years ago Place Tech Talk
8.2KViews
39likes
1Comment
Best Practices for Practitioners: Collector Management and Troubleshooting
Overview
The LogicMonitor Collector is a critical Software as a Service (SaaS) component designed to collect performance metrics across diverse IT infrastructures. It provides a centralized, intelligent monitoring solution that gathers data from hundreds of devices without requiring individual agent installations. By encrypting data and transmitting it securely via SSL, the Collector offers a flexible approach to infrastructure monitoring that adapts to complex and diverse network environments.

Key Principles
- Implement a strategic, unified approach to infrastructure monitoring that provides comprehensive visibility across diverse environments
- Ensure collectors are lightweight, efficient, and have minimal performance impact on monitored resources
- Maintain robust security through encrypted data transmission and carefully managed credential handling
- Design a monitoring infrastructure that can dynamically adjust to changing network and resource landscapes
- Regularly review, tune, and update collector configurations to maintain optimal monitoring performance

Comprehensive Collector Management

Collector Placement Strategies
Strategic Location
- Install collectors within the same network segments as monitored resources
- Choose servers functioning as syslog or DNS servers for optimal placement
- Avoid monitoring across vast internet connections, firewalls, or NAT gateways
Sizing Considerations
- Select appropriate collector size based on expected monitoring load
- Consider available memory and system resources
- Understand collector type limitations (e.g., Windows collectors can monitor both Windows and other devices, while Linux collectors cannot monitor Windows devices via WMI)

Network and Security Configuration
- Allow monitoring protocols (SNMP, WMI, JDBC) to pass unrestricted
- Implement NTP synchronization for accurate time reporting
- Use proxy servers if direct internet connectivity is restricted
- Configure firewall rules to allow necessary collector communications

Collector Groups
- Organize collectors logically: by physical location, by customer (for MSPs), or by environment (development, production, etc.)
- Utilize Auto-Balanced Collector Groups (ABCG) for dynamic device load sharing

Version Management
- Schedule regular updates
- Choose appropriate release types (MGD, GD, EA)
- Maintain update history for tracking changes
- Use the downgrade option if experiencing version-specific issues

Logging and Troubleshooting

Log Management
- Adjust log levels strategically:
  - Trace: Most verbose (use sparingly)
  - Debug: Detailed information for troubleshooting
  - Info: Default logging level
  - Warn/Error: Issue-specific logging
- Configure log file retention in wrapper.conf
- Send logs to LogicMonitor support when collaborating on complex issues

Troubleshooting Specific Environments
Linux Collectors
- Check Name Service Caching Daemon (NSCD) configuration
- Verify SELinux settings
- Use getenforce or sestatus to check SELinux status
- Temporarily set SELinux to Permissive mode for debugging
Windows Collectors
- Ensure the service account has "Log on as a service" rights
- Check local security policy settings
- Resolve Error 1069 (logon failure) by updating user rights

Advanced Techniques

Credential Management
- Integrate with Credential Vault solutions: CyberArk Vault, Delinea Vault
- Use dual account configurations for credential rotation

Collector Debug Facility
- Utilize the command-line interface for remote debugging
- Run debug commands to troubleshoot data collection issues

Performance and Optimization
- Regularly monitor collector performance metrics
- Tune collector configuration based on monitoring load
- Disable Standalone Script Engine (SSE) if memory is constrained
- Implement proper log and temporary file management

Maintenance Checklist
✅ Regularly update collectors
✅ Monitor performance metrics
✅ Review collector logs
✅ Validate data collection accuracy
✅ Test failover and redundancy configurations
✅ Manage Scheduled Down Time (SDT) during maintenance windows

Conclusion
Successful LogicMonitor Collector management is a dynamic process that requires strategic planning, continuous optimization, and a deep understanding of your specific infrastructure needs. The key to effective monitoring lies in placing collectors strategically, configuring them appropriately, and regularly reviewing their performance and configuration. By following these best practices, organizations can create a robust, adaptable monitoring strategy that provides comprehensive visibility into their IT ecosystem.

Additional Resources
Management and Maintenance:
- Viewing Collector Events
- Managing Collector Logs
- Adding SDT to Collector
- Adding Collector Group
- Collector Version Management
- Integrating with Credential Vault
- Integrating with CyberArk Vault for Single Account
- Integrating with CyberArk Vault for Dual Account
Troubleshooting:
- Troubleshooting Linux Collectors
- Troubleshooting Windows Collectors
- Collector Debug Facility
- Restarting Collector
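For the Linux troubleshooting item that suggests checking SELinux status with getenforce, here is a minimal sketch of how that check might be scripted. It assumes the standard `getenforce` utility, which prints Enforcing, Permissive, or Disabled; the function names are hypothetical and this is not a LogicMonitor-provided tool.

```python
import shutil
import subprocess
from typing import Optional

VALID_MODES = {"enforcing", "permissive", "disabled"}


def parse_getenforce(output: str) -> str:
    """Normalise getenforce output to a lowercase mode name."""
    mode = output.strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unexpected getenforce output: {output!r}")
    return mode


def current_selinux_mode() -> Optional[str]:
    """Return the SELinux mode, or None when getenforce is not installed."""
    binary = shutil.which("getenforce")
    if binary is None:
        return None
    result = subprocess.run([binary], capture_output=True, text=True, check=True)
    return parse_getenforce(result.stdout)


if __name__ == "__main__":
    mode = current_selinux_mode()
    if mode is None:
        print("getenforce not found; SELinux tools are probably not installed")
    elif mode == "enforcing":
        print("SELinux is enforcing; consider Permissive mode while debugging")
    else:
        print(f"SELinux mode: {mode}")
```

The script only reads the current mode. Switching to Permissive mode is left as a manual, deliberate step, as the checklist above treats it as temporary.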
skydonnell
8 months ago Place Tech Talk
1.7KViews
7likes
3Comments
Making the Most of LM Collectors: March Product Power Hour Recap
Overview March’s Product Power Hour zoomed in on one of the most critical components of the LM platform: the Collector. This session was all about helping you harness the full potential of LM Collectors—from deployment tips to tuning best practices. Our product experts walked through how to scale monitoring, improve data collection, and boost reliability across complex environments. With hands-on guidance, live demos, and direct answers to your questions, this session was a must-watch for any practitioner looking to level up their collector game. Key Highlights ⭐ Collector Best Practices: Practical guidance on sizing, scaling, and redundancy planning to optimize performance and ensure data continuity. ⭐ Auto-Balancing & Failover: How to configure Collectors for high availability using Auto-Balancing and Collector Groups. ⭐ Troubleshooting Tools: Demo of new collector diagnostics and log features to help quickly isolate and resolve issues. ⭐ Collector Configuration Management: Centralized tools for easier updates and streamlined deployments at scale. Q&A Q: How many devices can a Collector handle? A: Depends on specs and polling intervals—monitor CPU, memory, and poll duration for optimization. Q: Best practice for Collector failover? A: Use Collector Groups and enable Auto-Balancing for seamless failover. Q: Are there plans to enhance Collector management? A: Yes! Improvements are coming to simplify deployment and updates at scale. Q: Can we get alerts on Collector health? A: You bet. Alert on collector status, queue depth, and failures to stay ahead of issues. Customer Call-outs ⭐ “This makes managing distributed environments so much easier.” ⭐ “Collector Group auto-balancing was a game changer for us.” ⭐ “Loving the detailed diagnostics—helps our ops team move fast.” What’s Next 🚀 1H Launch Webinar April 3 - Level Up Your IT Universe Get a front-row seat to LogicMonitor’s biggest innovations of the first half of the year. 
This session spotlights AI-powered observability, enhanced workflows, and powerful platform upgrades. ⚡Product Power Hour April 24 – 1H Launch + Logs Leveled Up! Dive into the latest updates across Logs, AI, and LM’s observability suite in addition to fast-paced demos and real-world use cases in this log-focused edition of Product Power Hour. 🌍 Elevate Community Conferences Our flagship in-person event series is back! Connect with peers, attend expert-led sessions, and get hands-on product experience. Elevate 2025 will showcase the latest innovations in AI-powered observability, empowering enterprises to optimize their modern data centers. Dallas – April 30 Sydney – May 29 London – June 25 Pre-Elevate LM Training | LIVE Already attending Elevate? Join us in Dallas and London a day early to earn your first Logs Badge through immersive, instructor-led training and hands-on labs. Perfect for new users or anyone looking to sharpen their logging skills. Register today! Dallas – April 29 London – June 24 🍽️ User Group Dinners Connect in person with other LM users in your city over dinner and real talk. Share wins, swap stories, and grow your network - RSVP here: London – April 1 Dallas – April 29 Chicago – May 6 Nashville – May 13 💻 Virtual User Groups AMER East – June 3 AMER West – June 5 EMEA – June 10 APAC – June 12 Additional Resources If you missed any part of the session or want to revisit the content, we’ve got you covered: Review the slide deck here Want to see the full session? Watch the recording below ⬇️
skydonnell
4 months ago Place Recaps and Recording
1.1KViews
2likes
0Comments
Best Practices for Practitioners: Collector Installation and Configuration
Overview
The LogicMonitor Collector is the component of LogicMonitor's Software as a Service (SaaS) offering that collects the data required for IT infrastructure monitoring in the LM Envision platform. Installed on Linux and/or Windows servers, it gathers performance metrics from selected devices across an organization's IT stack, whether on-prem, off-prem, or in the cloud, using standard monitoring protocols. Unlike traditional monitoring approaches, a single Collector can monitor hundreds of devices without requiring individual agent installations on each resource. The Collector's core strength lies in its proprietary built-in intelligence, which automatically recognizes device types and applies pre-configured Modules that define precise monitoring parameters specific to that device or platform. By encrypting collected data and transmitting it securely to LogicMonitor's servers via SSL, the Collector provides a flexible and centralized approach to infrastructure monitoring. This design allows organizations to place Collectors strategically within their network, enabling comprehensive performance visibility while minimizing monitoring overhead and complexity. A Collector's monitoring capacity depends on the complexity of the devices or services being monitored and on the specific metrics being collected.
Key Principles
LogicMonitor Collector deployment is guided by principles of efficiency, scalability, and intelligent monitoring:
- Centralized SaaS monitoring through strategic collector placement
- Simplified device discovery and metric collection
- Minimal performance impact on monitored resources
- Secure, encrypted data transmission

Using the LogicMonitor Collector
Recommended for:
- Complex IT infrastructures with multiple network segments
- Organizations requiring comprehensive, centralized monitoring
- Environments with diverse device types and monitoring requirements
Not recommended for:
- Extremely small environments with few devices
- Networks with strict segmentation preventing central data collection
- Environments with severe network connectivity limitations

Recommended Installation Best Practices

Collector Placement and Sizing
- Install collectors close to or within the same network segments as monitored resources
- Choose servers that function as syslog or DNS servers for optimal placement
- Select the appropriate collector size based on the expected monitoring load
- Consider memory and system resources when sizing collectors
- Avoid monitoring resources across vast internet connections, firewalls, or through NAT gateways
- Keep in mind that Windows collectors can monitor both Windows and Linux devices, while Linux collectors can only monitor Linux devices

Recommended Disk Space
- New installation: ~500 MiB
- Logs: Up to 800 MiB
- Temporary files: <1500 MiB
- Report cache: <500 MiB
- NetFlow (if enabled): Up to 30 GiB
- Total recommended: <3.5 GiB (without NetFlow)

Network and Security Configuration
- Ensure outgoing HTTPS (port 443) connectivity to LogicMonitor servers
- Allow monitoring protocols (e.g., SNMP, WMI, JDBC) to pass unrestricted
- Use proxy servers if direct internet connectivity is restricted
- Implement NTP synchronization for accurate time reporting
- Configure firewall rules to allow necessary collector communications

Windows Collector Installation
Recommended installation methods:
- Interactive InstallShield wizard
- PowerShell silent installation
- The installer can be downloaded directly or bootstrapped via CDN
Service account considerations:
- For monitoring Windows systems in the same domain: use a domain account with local admin permissions
- For monitoring systems in different domains: use a local administrator account
- Ensure "Log on as a service" permissions are granted

Linux Collector Installation
Prerequisites:
- Bourne shell
- sudo package installed (for non-root installations)
- vim-common package (for the xxd binary in newer versions)
Recommended installation user:
- Default logicmonitor user
- Make the installer binary executable and install by running it

Container Deployment
Supported Kubernetes services:
- Microsoft Azure Kubernetes Service (AKS)
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
Limitations:
- Full package installation only
- Linux-based collectors cannot monitor Windows via WMI

Performance and Optimization
- Monitor collector performance metrics regularly
- Tune collector size and configuration based on monitoring load
- Disable Standalone Script Engine (SSE) if memory is constrained
- Implement proper log and temporary file management
- Use container deployments for Kubernetes environments

Best Practices Checklist
✅ Select strategically located servers for collector installation
✅ Choose the appropriate collector size based on expected monitoring load
✅ Configure reliable network connectivity and firewall rules
✅ Use non-root users for Linux collector installations
✅ Implement NTP time synchronization
✅ Monitor collector performance metrics
✅ Regularly update collectors to the latest stable versions
✅ Set collector “Down” notification chains for proper collector down alerting

Monitoring and Validation
- Verify the collector connection in the LogicMonitor portal after installation
- Monitor collector CPU utilization, disk usage, and performance metrics
- Periodically review collector logs for potential issues
- Validate data collection accuracy and completeness
- Utilize and test collector failover and redundancy configurations

Conclusion
LogicMonitor Collectors provide a powerful, flexible approach to infrastructure monitoring, enabling organizations to gain comprehensive visibility with minimal operational overhead. By following best practices in placement, configuration, and ongoing management, IT teams can create a robust monitoring strategy that adapts to evolving infrastructure needs. Successful collector deployment requires careful planning, ongoing optimization, and a thorough understanding of your specific infrastructure requirements. Regularly reviewing and adjusting your monitoring approach will ensure continued effectiveness and performance.

Additional Resources
- Collector Capacity
- Collector Versions
- Adding Collector
- Installing Collectors in Silent Mode
- Installing the Collector in a Container
- Configuring WinRM for Windows Collector
- agent.conf Collector Settings
- Collector Script Caching
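The disk space figures listed under Recommended Disk Space can be sanity-checked with a few lines of arithmetic. The sketch below sums the quoted upper bounds and converts them to GiB; the dictionary keys and function name are illustrative only, and the values are simply the numbers from the post.

```python
MIB_PER_GIB = 1024

# Upper-bound figures quoted in the post, in MiB.
DISK_BUDGET_MIB = {
    "installation": 500,
    "logs": 800,
    "temporary files": 1500,
    "report cache": 500,
}
NETFLOW_MIB = 30 * MIB_PER_GIB  # "Up to 30 GiB" when NetFlow is enabled


def total_mib(include_netflow: bool = False) -> int:
    """Sum the per-category budgets, optionally adding the NetFlow allowance."""
    total = sum(DISK_BUDGET_MIB.values())
    return total + NETFLOW_MIB if include_netflow else total


if __name__ == "__main__":
    base = total_mib()
    print(f"without NetFlow: {base} MiB = {base / MIB_PER_GIB:.2f} GiB")
    full = total_mib(include_netflow=True)
    print(f"with NetFlow:    {full} MiB = {full / MIB_PER_GIB:.2f} GiB")
```

The four categories add up to 3300 MiB, roughly 3.2 GiB, which is consistent with the stated total of under 3.5 GiB without NetFlow.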
skydonnell
8 months ago Place Tech Talk
799Views
2likes
1Comment
High CPU after Collector Upgrade to 33.002
Hi, we have seen CPU usage on a couple of Collectors increase by 20-30% after this upgrade. Has anyone else seen this behaviour?
Solved
Barb
3 years ago Place Product Discussions
699Views
4likes
10Comments
Monitor DFS Share(windows server) using LM Collector!!
Greetings to all members of the LM community. Hope you are all doing great! This community blog post discusses how to monitor a DFS share in LM, along with general recommendations for letting the LM Collector monitor the share path.

Configuring DFS share on Windows server: The DFS share service depends on two parameters to establish communication with the target server: the domain name and the IP address. These two parameters are used to configure communication with DFS for LM data collection. In my test environment, I created a stand-alone namespace with permissions set on the local path. In addition to defining the local path permissions for a DFS share, you can also edit the permissions for the local path of the shared folder when creating the share path.

Prerequisites/Permissions required: Beyond permissions, the LM Collector may need the following before it can access remote DFS shares:

Network Discovery: Enabling network discovery helps the monitoring tool discover and enumerate devices, including network shares, on the network. This can be useful when setting up data collection for resources in remote domains.

Firewall and Network Configuration: Ensure that the necessary ports and protocols are open in the firewall between your monitoring tool and the remote domain. Network discovery and access to DFS shares often require specific ports and protocols to be allowed through firewalls.

Namespace Path: When specifying the DFS share path in your monitoring tool, use the DFS namespace path (e.g., \\(domain/IP).com\dfs) rather than the direct server path. This ensures that the tool can access the share through the DFS namespace.

Trust Relationships and Permissions: Ensure that trust relationships between domains are correctly established to allow access.
Additionally, configure permissions on the DFS shares and namespace to grant access to the monitoring tool's credentials. Note that the exact steps and configurations may vary depending on your specific network setup, DFS version, and domain structure. Working with your organization's IT administrators and domain administrators is essential to ensure proper setup and access to DFS resources in remote domains.

Monitoring DFS share on LM portal: In testing on a Windows server with the DFS service installed as a role or feature, LM discovers the information for DFSR monitoring when an IP address or domain name (FQDN) is defined in the shared path. Edit the necessary configurations for each UNC path you are adding as a monitored instance. These configurations are detailed in the following sections. Under Resource → Add Other Monitoring, you can configure the DFS path in the “UNC Paths” section.

Updating DFS share path in LM: This monitors the accessibility of a UNC path from a Collector agent. The path defined in the LM portal may be a directory or a file path.

Discovery of DFS path in LM: Once you have completed the above steps on the target DFS server, you can monitor a UNC share, whether a domain DFS share or otherwise, using the UNC Monitor DataSource. This DataSource performs a directory listing on the given UNC share and reports success or failure, so it monitors the accessibility of the UNC path from the Collector monitoring this device. Once you have added the DFS share to be monitored, LogicMonitor begins monitoring it and generates alerts if there are any problems.
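To make the "directory listing, then report success or failure" idea concrete, here is a rough sketch of such an accessibility check. It is not the UNC Monitor DataSource itself; the function name is hypothetical, and it simply uses the operating system's own file APIs under the credentials of whoever runs it.

```python
import os
import sys


def check_unc_path(path: str) -> bool:
    """Return True if the path can be listed (directory) or exists as a file."""
    try:
        os.listdir(path)
    except OSError:
        # Listing failed: accept the path only if it is an accessible file.
        return os.path.isfile(path)
    return True


if __name__ == "__main__":
    # Pass one or more paths on the command line; none are assumed here.
    for path in sys.argv[1:]:
        status = "accessible" if check_unc_path(path) else "NOT accessible"
        print(f"{path}: {status}")
```

Running a check like this from the Collector host, under the Collector's service account, can help separate permission or trust problems from configuration problems in the portal.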
Links for more references:
https://www.logicmonitor.com/support/devices/device-datasources-instances/monitoring-web-pages-processes-services-and-unc-paths#:~:text=to%20get%20output.-,UNC%20Paths,-To%20monitor%20a
https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/dfsn-access-failures

Keep Learning & Keep Exploring with LM!

Interested in learning more about the features of your LogicMonitor portal? Check out some of our webinars in our community! https://www.logicmonitor.com/live-training-webinars
Sign up for self-guided training by clicking the "Training" link at the top right of your portal.
Check out our Academy resources! https://www.logicmonitor.com/academy/
Persie
2 years ago Place Tech Talk
600Views
15likes
0Comments
LM Linux collector deployment failed to start Logicmonitor watchdog service
Success to set net capabilities on file `/usr/local/logicmonitor/agent/jre/bin/java`
Detecting proxy, please wait ...
Registering collector to bp.logicmonitor.com, please wait ...
Init program is systemd ...
Redirecting to /bin/systemctl restart logicmonitor-watchdog.service
Job for logicmonitor-watchdog.service failed because the control process exited with error code.
See "systemctl status logicmonitor-watchdog.service" and "journalctl -xe" for details.
Congratulations! LogicMonitor Collector has been installed successfully!

Extracting bundled JRE files ...
Success to set net capabilities on file `/usr/local/logicmonitor/agent/lib/sblinuxproxy`
Success to set net capabilities on file `/usr/local/logicmonitor/agent/jre/bin/java`
Detecting proxy, please wait ...
Registering collector to bp.logicmonitor.com, please wait ...
Init program is systemd ...
Redirecting to /bin/systemctl restart logicmonitor-watchdog.service
Job for logicmonitor-watchdog.service failed because the control process exited with error code.
See "systemctl status logicmonitor-watchdog.service" and "journalctl -xe" for details.
Congratulations! LogicMonitor Collector has been installed successfully!

[root@WS01UJEU1000009 ~]# systemctl status logicmonitor-watchdog.service
● logicmonitor-watchdog.service - LogicMonitor Watchdog
   Loaded: loaded (/etc/systemd/user/logicmonitor-watchdog.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2023-03-05 13:56:20 UTC; 1min 21s ago
  Process: 344458 ExecStopPost=/usr/local/logicmonitor/agent/bin/logicmonitor-watchdog stop true (code=exited, status=203/EXEC)
  Process: 344456 ExecStart=/usr/local/logicmonitor/agent/bin/logicmonitor-watchdog start true (code=exited, status=203/EXEC)

Mar 05 13:56:20 WS01UJEU1000009 systemd[1]: Starting LogicMonitor Watchdog...
Mar 05 13:56:20 WS01UJEU1000009 systemd[1]: logicmonitor-watchdog.service: Control process exited, code=exited status=203
Mar 05 13:56:20 WS01UJEU1000009 systemd[1]: logicmonitor-watchdog.service: Control process exited, code=exited status=203
Mar 05 13:56:20 WS01UJEU1000009 systemd[1]: logicmonitor-watchdog.service: Failed with result 'exit-code'.
Mar 05 13:56:20 WS01UJEU1000009 systemd[1]: Failed to start LogicMonitor Watchdog.
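The status=203/EXEC entries in the output above are systemd's way of saying that executing the command itself failed, before the watchdog script ran. As a hedged sketch, the snippet below pulls such status codes out of pasted `systemctl status` text and attaches a hint. The hint text lists general possibilities to check, not a diagnosis of this particular host, and the names `EXIT_STATUS`, `HINTS`, and `explain_failures` are hypothetical.

```python
import re
from typing import List

# Matches fragments such as "status=203/EXEC" in systemctl status output.
EXIT_STATUS = re.compile(r"status=(\d+)/([A-Z_]+)")

# General hints only; the real cause has to be confirmed on the host.
HINTS = {
    "EXEC": (
        "systemd could not execute the command; check that the file exists, "
        "is executable, and is not blocked by SELinux or a noexec mount"
    ),
}


def explain_failures(status_output: str) -> List[str]:
    """Return one de-duplicated hint line per distinct exit status found."""
    findings: List[str] = []
    for code, name in EXIT_STATUS.findall(status_output):
        hint = HINTS.get(name, "no hint recorded for this status")
        line = f"{code}/{name}: {hint}"
        if line not in findings:
            findings.append(line)
    return findings


if __name__ == "__main__":
    sample = (
        "Process: 344458 ExecStopPost=... (code=exited, status=203/EXEC)\n"
        "Process: 344456 ExecStart=... (code=exited, status=203/EXEC)\n"
    )
    for finding in explain_failures(sample):
        print(finding)
```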
Solved
thangaduraibp
3 years ago Place Product Discussions
437Views
12likes
2Comments
Fixing misconfigured Auto-Balanced Collector assignments
I’ve seen this issue pop up a lot in support, so I figured this post might help some folks out. I came across a ticket for it just the other day, so it’s fresh on my mind!

For Auto-Balanced Collector Groups (ABCG) to work properly, i.e. to balance and fail over, the Collector Group must be set to the ABCG and (this is the important part) the Preferred Collector must be set to “Auto Balance”. If it is set to an actual Collector ID, the device won’t get the benefits of the ABCG.

OK, so that’s cool, but the real question is how to fix it. There’s not really a good way to surface in the portal all devices where this is misconfigured. It’s not a system property, so a report or AppliesTo query won’t help here. Fortunately, not all hope is lost: you can use the ✨API✨

When you GET a Resource/device, you get back some JSON, and what you want is for the autoBalancedCollectorGroupId field to equal the preferredCollectorGroupId field. If “Preferred Collector” is set to an ID rather than “Auto Balance”, autoBalancedCollectorGroupId will be 0.

Breaking it down step by step:

1. First, get a list of all ABCG IDs:
   https://www.logicmonitor.com/swagger-ui-master/api-v3/dist/#/Collector%20Groups/getCollectorGroupList
   /setting/collector/groups?filter=autoBalance:true
2. Then, with any given ABCG ID, filter the device list for all devices where there’s this mismatch:
   https://www.logicmonitor.com/swagger-ui-master/api-v3/dist/#/Devices/getDeviceList
   /device/devices?filter=autoBalancedCollectorGroupId:0,preferredCollectorGroupId:11 (where 11 is the ID of an ABCG)
3. Finally, for each device returned, make a PATCH so that autoBalancedCollectorGroupId is set to preferredCollectorGroupId:
   https://www.logicmonitor.com/swagger-ui-master/api-v3/dist/#/Devices/patchDevice

Here’s a link to the full script, written in Python, for you to check out. I’ll also add it below in a comment since this is already getting long.
Do you have a better, easier, or more efficient way of doing this? I’d love to hear about it!
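The three steps in this post can be sketched without any network code, which keeps the logic easy to test. This is not the author's linked script: it only builds the request paths quoted in the post and the PATCH payloads, leaving authentication and the HTTP calls to whatever client you already use. The function names are hypothetical, and the `/device/devices/{id}` PATCH path is an assumption based on the patchDevice endpoint referenced above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def abcg_list_path() -> str:
    """Step 1: path that lists Auto-Balanced Collector Groups."""
    return "/setting/collector/groups?filter=autoBalance:true"


def mismatch_filter_path(abcg_id: int) -> str:
    """Step 2: path that lists devices in an ABCG without Auto Balance set."""
    return (
        "/device/devices?filter="
        f"autoBalancedCollectorGroupId:0,preferredCollectorGroupId:{abcg_id}"
    )


def build_patches(devices: Iterable[Dict[str, Any]]) -> List[Tuple[str, Dict[str, int]]]:
    """Step 3: (path, payload) pairs for devices returned by the step 2 filter."""
    patches = []
    for device in devices:
        preferred = device["preferredCollectorGroupId"]
        if device.get("autoBalancedCollectorGroupId", 0) != preferred:
            # Assumed PATCH target; confirm against the API documentation.
            path = f"/device/devices/{device['id']}"
            patches.append((path, {"autoBalancedCollectorGroupId": preferred}))
    return patches


if __name__ == "__main__":
    # Example records shaped like the fields named in the post.
    devices = [
        {"id": 101, "preferredCollectorGroupId": 11, "autoBalancedCollectorGroupId": 0},
        {"id": 102, "preferredCollectorGroupId": 11, "autoBalancedCollectorGroupId": 11},
    ]
    print(mismatch_filter_path(11))
    for path, payload in build_patches(devices):
        print("PATCH", path, payload)
```

Keeping the payload construction separate from the HTTP layer makes it straightforward to do a dry run and review the list of devices before changing anything.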
mray
2 years ago Place Tech Talk
429Views
12likes
9Comments
Common issues : High CPU usage on the Collector
This article provides information on high CPU usage on the Collector.

(1) General Best Practices

(a) First and foremost, we advise customers to run the latest General Release Collector (unless advised otherwise). Further information on Collector versions is available at the link below:
https://www.logicmonitor.com/support/settings/collectors/collector-versions/
The release notes for each new Collector version also indicate whether any known issues have been fixed:
https://www.logicmonitor.com/releasenotes/

(b) Please also review our Collector Capacity guide for a full overview of how to optimise Collector performance:
https://www.logicmonitor.com/support/settings/collectors/collector-capacity/

(c) When reporting high CPU usage, it is useful to state whether the usage is high all the time or only during a certain timeframe, and whether any environmental changes on the physical machine may have triggered the issue. Please also state whether the issue started after adding new devices to the Collector or after applying a particular Collector version.

(2) Common Issues

This section goes through some of the common issues that have been fixed or worked on by our development teams.

(A) Check whether the CPU is being used by the Collector (Java process), SBProxy, or other processes.

(i) To monitor the Collector Java process: use the Collector JVM status DataSource to check the Collector (Java process) CPU usage.

(ii) To monitor SBProxy usage: use the WinProcessStats.xml DataSource for Windows Collectors. A DataSource for Linux is still being developed.
(B) If the high CPU usage is caused by the Collector Java processes, these are some of the common causes:

(i) Collector Java process using high CPU

How to confirm whether this is the same issue: the Collector wrapper.log contains many entries like the following:

DataQueueConsumers$DataQueueConsumer.run:338] Un-expected exception - Must be BUG, fix this, CONTEXT=, EXCEPTION=The third long is not valid version - 0
java.lang.IllegalArgumentException: The third long is not valid version - 0
    at com.santaba.agent.reporter2.queue.QueueItem$Header.deserialize(QueueItem.java:66)
    at com.santaba.agent.reporter2.queue.impl.QueueItemSerializer.head(QueueItemSerializer.java:35)

This issue has been fixed in Collector version EA 23.200.

(ii) CPU load spikes on Linux Collectors

The CPU usage of the Collector Java process shows a periodic spike (on an hourly basis). This issue has been fixed in Collector version EA 23.026.

(iii) Excessive CPU usage even though the Collector has no devices running on it

In the Collector wrapper.log, you can see entries similar to the following:

[04-11 10:32:20.653 EDT] [MSG] [WARN] [pool-20-thread-1::sse.scheduler:sse.scheduler] [SSEChunkConnector.getStreamData:87] Failed to get SSEStreamData, CONTEXT=current=1491921140649(ms), timeout=10000, timeUnit=MILLISECONDS, EXCEPTION=null
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.logicmonitor.common.sse.connector.sseconnector.SSEChunkConnector.getStreamData(SSEChunkConnector.java:84)
    at com.logicmonitor.common.sse.processor.ProcessWrapper.doHandshaking(ProcessWrapper.java:326)
    at com.logicmonitor.common.sse.processor.ProcessorDb._addProcessWrapper(ProcessorDb.java:177)
    at com.logicmonitor.common.sse.processor.ProcessorDb.nextReadyProcessor(ProcessorDb.java:110)
    at com.logicmonitor.common.sse.scheduler.TaskScheduler$ScheduleTask.run(TaskScheduler.java:181)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This issue has been fixed in Collector version EA 24.085.

(iv) SSE process stdout and stderr stream not consumed in Windows

Please note that this issue occurs only on Windows Collectors, and the CPU usage of the Windows operating system has a stair-step shape. This has been fixed in Collector EA 23.076.

(v) Collector goes down intermittently on a daily basis

In the Collector wrapper.log, you can see log lines similar to the following:

[12-21 13:10:48.661 PST] [MSG] [INFO] [pool-60-thread-1::heartbeat:check:4741] [Heartbeater._printStackTrace:265] Dumping HeartBeatTask stack, CONTEXT=startedAt=1482354646203, stack=
Thread-40 BLOCKED
    java.io.PrintStream.println (PrintStream.java.805)
    com.santaba.common.logger.Logger2$1.print (Logger2.java.65)
    com.santaba.common.logger.Logger2._log (Logger2.java.380)
    com.santaba.common.logger.Logger2._mesg (Logger2.java.284)
    com.santaba.common.logger.LogMsg.info (LogMsg.java.15)
    com.santaba.agent.util.Heartbeater$HeartBeatTask._run (Heartbeater.java.333)
    com.santaba.agent.util.Heartbeater$HeartBeatTask.run (Heartbeater.java.311)
    java.util.concurrent.Executors$RunnableAdapter.call (Executors.java.511)
    java.util.concurrent.FutureTask.run (FutureTask.java.266)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java.1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java.617)
    java.lang.Thread.run (Thread.java.745)
[12-21 13:11:16.597 PST] [MSG] [INFO] [pool-60-thread-1::heartbeat:check:4742] [Heartbeater._printStackTrace:265] Dumping HeartBeatTask stack, CONTEXT=startedAt=1482354647068, stack=
Thread-46 RUNNABLE
    java.io.PrintStream.println (PrintStream.java.805)
    com.santaba.common.logger.Logger2$1.print (Logger2.java.65)
    com.santaba.common.logger.Logger2._log (Logger2.java.380)
    com.santaba.common.logger.Logger2._mesg (Logger2.java.284)
    com.santaba.common.logger.LogMsg.info (LogMsg.java.15)
    com.santaba.agent.util.Heartbeater$HeartBeatTask._run (Heartbeater.java.320)
    com.santaba.agent.util.Heartbeater$HeartBeatTask.run (Heartbeater.java.311)
    java.util.concurrent.Executors$RunnableAdapter.call (Executors.java.511)
    java.util.concurrent.FutureTask.run (FutureTask.java.266)
gobler terminated ERROR 5296
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java.1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java.617)
    java.lang.Thread.run (Thread.java.745)

This issue has now been fixed in Collector EA 22.228.

(C) High CPU usage caused by SBProxy

(i) Collector CPU spikes up to 99%

In some cases, poor performance of WMI or PDH data collection causes too many retries, which consumes a lot of CPU. In the Collector sbproxy.log, search for the log string shown below; you will see that the retry count is nearly 100 per request.

,retry:

This is being investigated by our development team at this time and will be fixed in the near future.
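The log checks described in (B) and (C) can be sketched as a small script. This is a minimal illustration, not a LogicMonitor tool: it only counts lines containing the substrings quoted in this article, and the labels, function names, and the idea of passing log paths on the command line are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable

# Substrings quoted in this article. The labels are hypothetical and exist
# only to make the report readable.
SIGNATURES: Dict[str, str] = {
    "queue item version error (B-i)": "The third long is not valid version",
    "SSE stream timeout (B-iii)": "Failed to get SSEStreamData",
    "heartbeat stack dump (B-v)": "Dumping HeartBeatTask stack",
    "sbproxy retry (C-i)": ",retry:",
}


def count_signatures(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each known signature substring."""
    counts: Counter = Counter()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Scan one log file, replacing any bytes that are not valid UTF-8."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_signatures(handle)


if __name__ == "__main__":
    import sys

    # Usage (paths are whatever your Collector install uses):
    #   python scan_logs.py wrapper.log sbproxy.log
    for log_path in sys.argv[1:]:
        print(log_path)
        for label, hits in scan_file(log_path).most_common():
            print(f"  {hits:6d}  {label}")
```

A high count for one signature only suggests which of the cases above to look at first; it does not confirm the root cause, and the script does not read the retry count that follows ",retry:" because the exact log format is not shown here.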
(3) Steps to take when facing high CPU usage for Collector

(i) Ensure the Collector has been added as a device and enabled for monitoring: https://www.logicmonitor.com/support/settings/collectors/monitoring-your-collector/

There is a set of new DataSources for the Collector (LogicMonitor Collector Monitoring Suite - 24 DataSources). Please ensure they have been updated in your portal and applied to your Collectors, and also ensure that the Linux CPU or Windows CPU DataSources have been applied to the Collector.

(ii) Record a JFR (Java Flight Recorder) recording in the debug command window of the Collector. This can be done as follows:

// unlock commercial feature
!jcmd unlockCommercialFeatures
// start a jfr; in a real troubleshooting case, increase the duration to a reasonable value
!jcmd duration=1m delay=5s filename=test.jfr name=testjfr jfrStart
// stop a jfr
!jcmd name=testjfr jfrStop
// upload the jfr record
!uploadlog test.jfr

(iii) Upload the Collector logs: From the Manage dialog you can send your logs to LogicMonitor support. Select the manage gear icon for the desired Collector and then select 'Send logs to LogicMonitor'.

Credits: LogicMonitor Collector development team for providing valuable input in order to publish this article.
Desh_Johl
9 years ago Place Archive
401 Views
0 likes
0 Comments
Using a Dedicated Collector for each Windows Domain Controller?
We ran into trouble monitoring our Windows Domain Controllers because we want to use least privilege, and we were only receiving ping and Host Status data. It showed “No data” for CPU, disks, etc. We used the information in the link “https://www.logicmonitor.com/support/monitoring/os-virtualization/monitoring-a-domain-controller-dc”, installed the collector on a DC using the local system account, and set it to monitor itself. I am now receiving CPU, disk, etc. from that domain controller. It appears the only catch is that I cannot monitor other systems with that collector, but that is OK for our situation. Are there others out there who are monitoring DCs using this method, and if so, have you run into any trouble (performance, etc.)? If you are not using this method, how are you monitoring your DCs in LogicMonitor? THANK YOU very much for your assistance/opinions/guidance.
Solved
jfmhfa01
2 years ago Place Product Discussions
354 Views
14 likes
3 Comments