Common issues: High CPU usage on the Collector
This article provides information on high CPU usage on the Collector.

(1) General Best Practices

(a) First and foremost, we advise customers to run the latest General Release Collector (unless advised not to). Further Collector version information is available at the link below:
https://www.logicmonitor.com/support/settings/collectors/collector-versions/
The release notes for each newer Collector version also indicate whether any known issues have been fixed:
https://www.logicmonitor.com/releasenotes/

(b) Please also review our Collector Capacity guide for a full overview of how to optimise Collector performance:
https://www.logicmonitor.com/support/settings/collectors/collector-capacity/

(c) When reporting high CPU usage, please tell us whether the usage is high all the time or only during a certain timeframe, and whether any environmental changes made on the physical machine may have triggered the issue. Please also tell us whether it started after adding new devices to the Collector or after applying a particular Collector version.

(2) Common Issues

This section covers some of the common issues that have been fixed or worked on by our development teams.

(A) Check whether the CPU is being used by the Collector (Java process), by SBProxy, or by other processes.

(i) To monitor the Collector Java process: use the datasource Collector JVM status to check the Collector (Java process) CPU usage.

(ii) To monitor SBProxy usage: on Windows Collectors, use the datasource WinProcessStats.xml. The equivalent datasource for Linux is still being developed.
(B) If the high CPU usage is caused by the Collector Java process, below are some of the common causes.

(i) Collector Java process using high CPU

How to confirm this is the same issue: the Collector wrapper.log contains many entries like the following:

    DataQueueConsumers$DataQueueConsumer.run:338]Un-expected exception - Must be BUG, fix this, CONTEXT=, EXCEPTION=The third long is not valid version - 0
    java.lang.IllegalArgumentException: The third long is not valid version - 0
    at com.santaba.agent.reporter2.queue.QueueItem$Header.deserialize(QueueItem.java:66)
    at com.santaba.agent.reporter2.queue.impl.QueueItemSerializer.head(QueueItemSerializer.java:35)

This issue has been fixed in Collector version EA 23.200.

(ii) CPU load spikes on Linux Collectors

The CPU usage of the Collector Java process shows a periodic spike (on an hourly basis). This issue has been fixed in Collector version EA 23.026.

(iii) Excessive CPU usage despite not having any devices running on the Collector

The Collector wrapper.log contains entries similar to the following:

    [04-11 10:32:20.653 EDT] [MSG] [WARN] [pool-20-thread-1::sse.scheduler:sse.scheduler] [SSEChunkConnector.getStreamData:87] Failed to get SSEStreamData, CONTEXT=current=1491921140649(ms), timeout=10000, timeUnit=MILLISECONDS, EXCEPTION=null
    java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.logicmonitor.common.sse.connector.sseconnector.SSEChunkConnector.getStreamData(SSEChunkConnector.java:84)
    at com.logicmonitor.common.sse.processor.ProcessWrapper.doHandshaking(ProcessWrapper.java:326)
    at com.logicmonitor.common.sse.processor.ProcessorDb._addProcessWrapper(ProcessorDb.java:177)
    at com.logicmonitor.common.sse.processor.ProcessorDb.nextReadyProcessor(ProcessorDb.java:110)
    at com.logicmonitor.common.sse.scheduler.TaskScheduler$ScheduleTask.run(TaskScheduler.java:181)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This issue has been fixed in EA 24.085.

(iv) SSE process stdout and stderr stream not consumed in Windows

Please note this issue occurs only on Windows Collectors; the CPU usage of the Windows operating system has a stair-step shape. This has been fixed in Collector EA 23.076.

(v) Collector goes down intermittently on a daily basis

The Collector wrapper.log contains entries similar to the following:

    [12-21 13:10:48.661 PST] [MSG] [INFO] [pool-60-thread-1::heartbeat:check:4741] [Heartbeater._printStackTrace:265] Dumping HeartBeatTask stack, CONTEXT=startedAt=1482354646203, stack=
    Thread-40 BLOCKED
    java.io.PrintStream.println (PrintStream.java.805)
    com.santaba.common.logger.Logger2$1.print (Logger2.java.65)
    com.santaba.common.logger.Logger2._log (Logger2.java.380)
    com.santaba.common.logger.Logger2._mesg (Logger2.java.284)
    com.santaba.common.logger.LogMsg.info(LogMsg.java.15)
    com.santaba.agent.util.Heartbeater$HeartBeatTask._run (Heartbeater.java.333)
    com.santaba.agent.util.Heartbeater$HeartBeatTask.run (Heartbeater.java.311)
    java.util.concurrent.Executors$RunnableAdapter.call (Executors.java.511)
    java.util.concurrent.FutureTask.run (FutureTask.java.266)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java.1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java.617)
    java.lang.Thread.run (Thread.java.745)
    [12-21 13:11:16.597 PST] [MSG] [INFO] [pool-60-thread-1::heartbeat:check:4742] [Heartbeater._printStackTrace:265] Dumping HeartBeatTask stack, CONTEXT=startedAt=1482354647068, stack=
    Thread-46 RUNNABLE
    java.io.PrintStream.println (PrintStream.java.805)
    com.santaba.common.logger.Logger2$1.print (Logger2.java.65)
    com.santaba.common.logger.Logger2._log (Logger2.java.380)
    com.santaba.common.logger.Logger2._mesg (Logger2.java.284)
    com.santaba.common.logger.LogMsg.info(LogMsg.java.15)
    com.santaba.agent.util.Heartbeater$HeartBeatTask._run (Heartbeater.java.320)
    com.santaba.agent.util.Heartbeater$HeartBeatTask.run (Heartbeater.java.311)
    java.util.concurrent.Executors$RunnableAdapter.call (Executors.java.511)
    java.util.concurrent.FutureTask.run (FutureTask.java.266)
    gobler terminated ERROR 5296
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java.1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java.617)
    java.lang.Thread.run (Thread.java.745)

This issue has now been fixed in Collector EA 22.228.

(C) High CPU usage caused by SBProxy

(i) Collector CPU spikes up to 99%

In some cases, poor WMI or PDH data-collection performance causes too many retries, which consumes a lot of CPU. In the Collector sbproxy.log, search for the log string shown below; you will see that the retry count is nearly 100 per request, which in turn consumes a lot of CPU.

    ,retry:

This is being investigated by our development team at this time and will be fixed in the near future.
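The log checks in section (2) can be scripted. Below is a minimal Python sketch, not an official LogicMonitor tool: it counts how many lines of wrapper.log text contain each of the three message strings quoted above, and how many lines of sbproxy.log text contain the ",retry:" marker. The function names and the dictionary layout are assumptions made for illustration; the strings and version labels are copied from this article, and the exact log line format beyond those strings is not assumed.

```python
from collections import Counter

# Message strings quoted in this article, paired with the Collector version
# the article names for each issue. The dict itself is an illustrative assumption.
KNOWN_SIGNATURES = {
    "The third long is not valid version": "EA 23.200",
    "Failed to get SSEStreamData": "EA 24.085",
    "Dumping HeartBeatTask stack": "EA 22.228",
}


def scan_wrapper_log(text):
    """Return a Counter of how many lines contain each known signature."""
    hits = Counter()
    for line in text.splitlines():
        for signature in KNOWN_SIGNATURES:
            if signature in line:
                hits[signature] += 1
    return hits


def count_sbproxy_retries(text):
    """Return the number of lines that contain the ',retry:' marker."""
    return sum(1 for line in text.splitlines() if ",retry:" in line)


def report(hits):
    """Build one human-readable line per signature that was found."""
    return [
        f"{count} line(s) matching '{signature}' (version named in article: {KNOWN_SIGNATURES[signature]})"
        for signature, count in hits.items()
    ]
```

To use it, read the log files from your own Collector installation (the location depends on the install) and pass their contents to `scan_wrapper_log` and `count_sbproxy_retries`. A high count only suggests which of the issues above to look at first; it does not confirm the cause.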
(3) Steps to take when facing high CPU usage on the Collector

(i) Ensure the Collector has been added as a device and enabled for monitoring:
https://www.logicmonitor.com/support/settings/collectors/monitoring-your-collector/
There is a set of new DataSources for the Collector (LogicMonitor Collector Monitoring Suite - 24 DataSources). Please ensure they have been updated in your portal and applied to your Collectors, and also ensure the Linux CPU or Windows CPU datasources have been applied to the Collector.

(ii) Record a JFR (Java Flight Recorder recording) in the debug command window of the Collector. This can be done as follows:

    // unlock commercial features
    !jcmd unlockCommercialFeatures
    // start a JFR; when troubleshooting a real case, increase the duration to a reasonable value
    !jcmd duration=1m delay=5s filename=test.jfr name=testjfr jfrStart
    // stop the JFR
    !jcmd name=testjfr jfrStop
    // upload the JFR recording
    !uploadlog test.jfr

(iii) Upload the Collector logs: from the Manage dialog you can send your logs to LogicMonitor support. Select the manage gear icon for the desired Collector and then select 'Send logs to LogicMonitor'.

Credits: LogicMonitor Collector development team for providing valuable input in order to publish this article.

LogicMonitor Collector installation on Windows Server Core
Windows Server Core and (the free) Hyper-V Server Core are GUI-less versions of Windows that can be administered remotely with GUI tools. We've recently seen an uptick in requests for deployment of the collector to these platforms, as Windows introduces a lot of overhead with the addition of the GUI; the other compelling reason to go this route is that Hyper-V Core is a free license of Windows from Microsoft (similar to the free flavor of ESXi, only it can run a Windows collector!).

Microsoft Documentation:
Managing a Server Core Server
Configure Server Core with the SConfig command

Option A: Remote Desktop Install

Establish a remote desktop session to the Server Core server using the instructions provided by Microsoft.
Within the standard Command shell, type the word "PowerShell" to load a PowerShell session.
Add a new (Windows) LogicMonitor Collector in your portal, and select the PowerShell command instead of the download.
Paste (and run) the PowerShell command in the open PowerShell window within the Remote Desktop session on the Server Core server.
You'll see a message indicating that the download has started, and after some time, the normal InstallShield Wizard will launch as expected.
Complete the collector account configuration and proceed as you would on an OS with a GUI.
Collect on!

Additional methods are certainly possible (Windows Admin Center, Remote PowerShell, more?) and as I have a chance to test/validate them, I will continue to update this post.

Installing a collector on a Raspberry Pi 3(b)
Credit to Brandon McEarchern, who helped significantly.

The main issue with installing the collector on a Raspberry Pi is that it uses the ARM architecture.

Method 1

Install a Java package compiled for the ARM architecture. Some functions such as ping and Netflow will not work using this method. SNMP, Scripting, ESX, JDBC and JMX will function normally.

Instructions:

1) I used the PiDora 25 Beta server build currently available from https://fedoramagazine.org/raspberry-pi-support-fedora-25-beta/
Note: the older version of Pidora available from NOOBS is designed for the Raspberry Pi 2, and I had issues installing it on a Raspberry Pi 3b.

2) Once Pidora is set up you may need to install Perl core. This can be done with the command
    sudo dnf install perl-core

3) Install the Java environment for Fedora with the command
    su -c "dnf install java-1.?.0-openjdk"
More information on the Java environment for Fedora can be found here: https://fedoraproject.org/wiki/Java/FAQ?rd=JavaFAQ

4) Next, download the collector. Use the 64 bit Linux version. Once downloaded, add execution rights with the command
    sudo chmod +x LogicMonitorSetup32_1.bin

5) Run the installer with the command
    ./LogicMonitorSetup32_1.bin
The install will unpack the Java packages to /usr/local/logicmonitor/agent/jre/bin/java and then attempt to run configure.pl, which will test for 32/64 bit Java and fail with the error message "Wrong bit version of installer? Please re-download the installer matches your OS bit version!".

6) Rename the LogicMonitor Java binary. I used the command
    sudo mv /usr/local/logicmonitor/agent/jre/bin/java /usr/local/logicmonitor/agent/jre/bin/java_original

7) Link the Fedora Java binary to the original LogicMonitor location using the command
    sudo ln -s /usr/bin/java /usr/local/logicmonitor/agent/jre/bin/java

8) Re-run the configure.pl script with the command
    /usr/local/logicmonitor/agent/configure.pl
The script should install and configure the collector successfully this time.
Verify the connection from the LogicMonitor Portal as normal; the Raspberry Pi should show as running after a few minutes.

9) Enable Java ping to override the Proxy in the Collector Configurations.

Method 2

Use Exagear 32bit emulation. Exagear is a 32bit emulation application for running x86 programs on ARM based computers and is available for around $30 USD. Ping and other features will function using this method, as opposed to Method 1, but performance is severely degraded due to the limited resources of a Raspberry Pi.
https://eltechs.com/product/exagear-desktop/exagear-desktop-features-and-prices/

This is the equivalent of running the collector in a virtual machine. Installing the collector on top of Exagear is fairly straightforward.

Instructions to install:

    $ cd ~/Downloads/

Download the archive with ExaGear Desktop packages:

    $ wget http://downloads.eltechs.com/exagear-desktop-v-1-5/exagear-desktop-rpi3.tar.gz

Unpack the downloaded archive:

    $ tar -xvzpf exagear-desktop-rpi3.tar.gz

Install and activate ExaGear Desktop by running install-exagear.sh, specifying a guest OS image, in a directory containing the deb packages and one license key. The install command for Ubuntu is

    sudo ./install-exagear.sh ubuntu-1404

Install the LogicMonitor 32bit collector:
https://www.logicmonitor.com/support/getting-started/i-just-signed-up-for-logicmonitor-now-what/3-adding-collectors/

ConfigSource that writes outputs to dashboard
A while back I published some very simple ConfigSources to monitor your collector .conf files: https://communities.logicmonitor.com/topic/1345-collector-configsources/

Here's an adaptation that writes the various collected configs to a dashboard, writing each of the config outputs to a text widget.

Notes:

THIS IS A PROOF OF CONCEPT. No warranty is given or implied (value of your investments may go down as well as up, check with your health professional before taking this medicine, etc). Please test before deploying!

As with all data within LogicMonitor (or any system), be aware of access rights of users - in this case to whatever Dashboard(s) the config data will be presented on. Be sure to configure your Roles and Users such that only users who have a legitimate need to see this data can access whatever Dashboard(s) you send it to.

This uses the REST API v1 to verify the target dashboard exists or create it if it doesn't, and also to create / update the text widgets. It will therefore need an API token for an account with management permission for the relevant Dashboard(s), with ID and Key values set as device properties apiaccessid.key and apiaccesskey.key.

All of the API interaction is contained within a groovy checkpoint, rather than within the config collection script, so this could very simply be copied into other ConfigSources.

The same logic could be used in other LogicModules, such as to write non-numeric outputs of SQL queries or any data collection methods to dashboards. While this provides no history retention as written, it will show current / most recent values.

Within the script you can define the desired Dashboard path, e.g. 'Collector Configs/Groovy Check' (default as presented here), Dashboard name (hostDisplayName is the default), widget name format (hostDisplayName: wildvalue) and other initial parameters such as widget colour scheme, description, etc.
This is written for REST API v1. One day I may get around to updating it for v2, for greater efficiency, but today is not that day. Tomorrow is not looking likely either.

Dashboard text widgets do have a maximum character limit (65,535 characters). I don't think I've seen a collector config near to or in excess of this, so I have no idea whether a larger config from another device would be truncated or whether the widget creation would fail.

Other widgets on the dashboard are unaffected by this script creating and updating widgets; likewise later manual changes to widget size, colours, etc should be respected. Updates should be to the text content of the widgets only, so the target dashboard could contain other data from the device.

Known issues:

On the first config collection for a multi-instance ConfigSource like this, and where the target dashboard does not already exist, only one widget will be created in the dashboard. This is because all instances collect more or less simultaneously, and each determines that the dashboard is not initially present. Each, therefore, attempts to create the dashboard, and as soon as the first instance does so, the others will fail as they cannot create a dashboard that (now) already exists. This could be coded around with a simple delay / re-check on failure, but I haven't had time, and the second config collection will create all expected widgets without issue. Additionally, if you create the dashboard first, this issue will not occur.

Collector REST API Requests
I would like the REST API to support:

Scheduling a collector version update
Applying a one-time collector version update
Working with the Collector Custom Properties (recently added I think, but I don't see anything in the online documentation about support in the REST API).

Collector configuration monitoring DataSources
These are not ConfigSources. They do not record, store, alert on changes to, or let you view historic (etc) config files. If that's what you're after and you have LM Config, take a look here: https://communities.logicmonitor.com/topic/1345-collector-configsources/

These, however, are DataSources that read and parse out certain values from certain Collector .conf files, letting you quickly see a few key settings without having to look at and scroll through the config files. Graphs can show you how values have changed over time. Mostly these are for information only, although there are a couple of useful alert thresholds set.

All have AppliesTo of hasCategory("collector")

===
LogicMonitor_Collector_Settings_General: v1.0.0:PYXRR4
Reads a few key values from agent.conf and wrapper.conf, such as any restart setting, logger size, SB Proxy connector capacity and Java max heap. Will generate a warning alert if the SB Proxy connector capacity may need increasing due to a large Java max heap size. Additionally monitors the ID of the collector being used to monitor the collector device and alerts if the collector is being monitored by a different collector (we strongly recommend collectors monitor themselves).

===
LogicMonitor_Collector_Settings_DataCollection: v1.0.0:DM76FY
Reads the enable setting (true|false, mapped to 1|0), threadpool and timeout settings for each collection method (snmp, wmi, script, etc) defined in the collector's agent.conf file. Will generate a warning alert for any collection method disabled in the config.

===
LogicMonitor_Collector_Settings_EventCollection: v1.0.0:N4ENYL
Similar to the above, reads the enable setting and threadpool for event collecting tasks. Will generate a warning alert for any that are disabled in the config.

===

Read only agent / collector
I know I've brought this up before, but I'd like to bring it up again. LM's requirement that collectors run as local admins (or system) is a GAPING security hole in your product. No amount of certificate signing, or other similar security measures, is a replacement for running a collector or an agent as a read only account. The fact is, with every security measure you take, if the collector is running as an admin account or a system account, it's going to be exploitable in one way or another. Having the signed scripts and whatnot would be great, but really it shouldn't be the primary focus IMO.

Security is much better when it's locked down by default and opened up as needed, compared to what you guys are doing, which is a completely open system that you're trying to add security enhancements on top of. It's almost akin to you guys having no firewall, and then adding a few rules here and there to block certain types of traffic, while the rest of the network is completely exposed.

A preferable architecture (security wise) would be an agent / collector that can run as a read only account and be supported. WMI, perfmon, and many other functions all work fine with a regular user when executed locally. That is why an agent or a special collector is needed. The ideal communication path would be an "agent" that talks to a "collector", which then talks to the portal. This would also allow us to keep our internet locked down. I suspect this would also have the advantage of taking a lot of load off the collectors and putting most of the work on the agent, which is ultimately better given that the workload would be distributed.

For now though, even having a "supported" configuration for a collector not running as a local admin / system would be a great step in the right direction. The reason this is less of a concern for solutions like Solarwinds and SCOM is that they're on premises based solutions, meaning there is a much lower external risk factor.
You guys are cloud, and therefore need to design the solution from an untrusted point of view.

Collector could not verify/register if using Palo Alto SSL decryption feature
Just in case this helps other customers...

SYMPTOMS: The Windows collector installed ok and the two Collector services were running, but the collector could not finish the verification/registration step and was showing the 'flame alert' on the Settings > Collectors screen.

After some troubleshooting, we looked in the wrapper.log file on the collector and saw this error message:

    [MSG] [CRITICAL] [main::controller:main] [AgentHttpService.checkCertificateOrWait2Valid:1029] The santaba server is not trusted, and "EnforceLogicMonitorSSL" is enabled. Wait 1 minute to retry. Please check the network settings, or disable "EnforceLogicMonitorSSL" in agent.conf and restart collector

The customer set up a whitelist on their Palo Alto firewall for *.logicmonitor.com (or a list of ~15 IP address ranges) and it started working. Alternatively, you can lower security and change the setting in agent.conf (config file) from EnforceLogicMonitorSSL=true to false.

Alert suppression when collectors are unreachable
Hi

I am having a frustrating experience. I have tried to make LM redundant using multiple collectors, but I am still at the mercy of the customer's internet connection, so when the collectors go down I get a couple of thousand host status events rather than a handful of critical collector alerts. This makes it very hard to see the wood for the trees on the UI alerts page, as I cannot just filter out host status alerts because I could have more from other customers.

Am I missing something? Do others have a decent scalable workaround? Has this been raised as a feature request before, and if so, is there a reason it has not materialised?

Thanks in advance
David

Collector Sizing
New to this forum. We are trialing LogicMonitor as a replacement for a traditional NMS system. We're a reseller of NMS service; I believe the industry calls that an MSP. We want to resell this service using an appliance as the collector. Basically, we want to preload the collector software, send it to a customer location, and go from there.

What I'm struggling with is finding some guide to what hardware we should spec out for this. I realize one size fits all is not going to cut it. Our customers' service is all WAN-only routers, and some customers have netflow. We have some customers that have 10 or 15 devices and some that have all the way up to 1000 to 1200 devices, with everything in between. The protocols used are SNMP, ICMP, Syslog and Netflow. So I'm thinking one type of hardware for 0 to 200 devices, another for 200 to 500 devices, and a third for 500 to 1200 devices. I figure anything above that is going to take more than one collector.

I have found an industrial vendor that can supply boxes with everything from Celeron to Xeon class CPUs, with 2 GB to 16 GB or more of memory and SSD or HDD drives.

The first question is: does anyone have any experience running collectors on quad core Celeron processors? The second question is: what size hard drives should we have? There isn't really any formal documentation that the company publishes on Collector hardware, other than an obscure document listing Xeon class servers, and even that has no disk sizing recommendations.

Any help and advice would be greatly appreciated.