All my collectors are going down every 24 hours.

Hi,

Starting last Thursday, all my collectors, across two different LM portals, have been going down approximately every 24 hours. Apparently the Watchdog is telling the Agent to restart itself, which takes everything down for 5-10 minutes. It's happening on almost every collector we have. Some do it every day; some skip a day here and there. They ALL started this last Thursday. We didn't make any global changes and have no idea what the heck happened. Is anyone else dealing with this?

There are entries like these showing that the Watchdog service told the main Agent to restart itself:

[2025-05-06 20:05:47.876 GMT] [MSG] [CRITICAL] [statusmonitor:::] [StatusListener$1.run:135] Peer request to shutdown, CONTEXT=CAUSE=shutdown cmd, ACTION=quit
[2025-05-06 20:05:47.876 GMT] [MSG] [CRITICAL] [statusmonitor:::] [StatusListener$1.run:151] Shutting self down by quit with 0, CONTEXT=MSG=all sockets closed, System.exit(0) now
[2025-05-06 20:05:47.882 GMT] [MSG] [INFO] [statusmonitor:::] [RestartUtil._reportEvent:236] Reported restart reason successfully, CONTEXT=type=ReceivedShutdown, reason=Collector receives shutdown command from watchdog. Agent will restart.
[2025-05-06 20:05:47.883 GMT] [MSG] [INFO] [statusmonitor:::] [RestartUtil._saveRestartReason:292] Save restart reason successfully, CONTEXT=file=C:\Program Files (x86)\LogicMonitor\Agent\conf\restart.conf

These are the tickets that they generated, showing that the restart shifts by a few minutes each day but is otherwise happening like clockwork.

I opened a ticket via Chat but was told that something is overloading the agent and we need to increase the collector sizes. That doesn't really explain what changed last Thursday to start causing the problem, so I'm posting here to see if anyone else is having the same issue. Thanks.
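To check whether the restarts really follow a roughly 24-hour cadence on each collector, a sketch like the following could pull the restart events out of a collector log and measure the gaps between them. It assumes the log lines look exactly like the excerpt above (a bracketed GMT timestamp, and `type=ReceivedShutdown` on the restart-reason line); the function names are hypothetical, and which log file on the collector holds these lines is left to you.

```python
import re
from datetime import datetime, timedelta

# Assumption: restart-reason lines look like the excerpt above, e.g.
# "[2025-05-06 20:05:47.882 GMT] ... CONTEXT=type=ReceivedShutdown, reason=..."
RESTART_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) GMT\].*type=ReceivedShutdown"
)


def restart_times(lines):
    """Return a datetime for every watchdog-initiated restart in the given lines."""
    times = []
    for line in lines:
        match = RESTART_LINE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def gaps(times):
    """Return the interval between each pair of consecutive restarts."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def near_daily(intervals, tolerance=timedelta(hours=1)):
    """Flag each interval that falls within `tolerance` of exactly 24 hours."""
    day = timedelta(hours=24)
    return [abs(interval - day) <= tolerance for interval in intervals]


def scan_log(path):
    """Read one collector log file (hypothetical path) and return its restart times."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return restart_times(f)
```

For example, `near_daily(gaps(scan_log(path)))` on each collector's log would show whether every collector restarts on the same daily rhythm, which is a useful data point to hand back to support alongside the "increase the collector size" suggestion.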
Re: Notice for Upcoming Deprecation of Websites, Reports, and Settings pages from legacy UI

I sent a 10+ page document outlining all the issues I have with it to my CSM. She responded that at least a few of them are being worked on, but so many things just seem to be changing for no good reason.

Any way to find all my devices or groups that have Custom Thresholds?

Hi,

We want to audit our system and find all the groups, devices, etc. that have custom thresholds to make sure they're all correct. I tried to use the Custom Threshold report, but everything I try gives me an error saying I can't check more than 50,000 things at one time. I tried pointing it at a folder that has only 300 devices in it, and even that wouldn't work. This report is not a viable option if I have to run it 100+ times because I can only point it at groups with a few machines in them.

So, I'm hoping there's a way to do this via the PowerShell module, the API, or something else. Does anyone know how this can be done? I thought maybe I could ask for all the groups where the custom threshold isn't blank, but I can't find where the custom thresholds are stored. If anyone knows how this can be done, let me know. Thanks!

Re: Any way to grab Threshold data as a variable and add it to a graph?

The one in my screenshot is the CPU Datasource for Linux devices, but it would apply to all datasources: disk space, memory usage, CPU usage, etc. Since there's no "Show thresholds on the graph" option, I was trying to figure out a way to do it on my own.

Any way to grab Threshold data as a variable and add it to a graph?

Hi,

I've always thought it would be cool to have a line on some of the graphs showing what the thresholds are. It makes it easier to see when a graph goes over or under a particular line. I was able to do it today by simply creating a Virtual Datapoint and entering the number I wanted.
However, if we ever change a threshold, this line will not update along with it. For example, here is our 1/5/15MinLoadPerCore graph where I added a line at "1" to easily see when we hit the threshold. And here's the Virtual Datapoint, hard-coded as 1. What I'd love to do is create a Virtual Datapoint that somehow uses variables to pull in the threshold values, so any time something changes, the graph updates automatically. I'm going to go out on a limb and say it's not possible, but in case it is, I thought I'd ask here. Surprise me. ;)

What do you alert on for CPU on Linux?

Hi,

For Windows, we use the standard CPU Percent to alert when a server is running hot. We do the same for Linux, but we also have those MinLoadPerCore alerts that we get all the time on various machines. When we get the MinLoad alerts, we look at CPU and usually find that it's not running super high, so we ignore them. On some noisy machines we just keep raising the threshold from 1 to 1.2 to 1.5, etc., until we stop getting alerts. That seems kind of pointless, and I'm leaning towards just turning off alerting on the MinLoad datapoints completely.

So my main question is, what do you all alert on? Do you find the MinLoadPerCore alerts valuable? When you get one, do you take steps to increase the CPU count on those machines even if CPU usage isn't super high? Just looking to see what everyone else does. Thanks

ESX vs vSphere data discrepancy

Hi,

We use VMware and vSphere and have LM configured to poll the vSphere clusters for data such as VM performance. We then compared the various ESX modules with the vSphere modules to see what was different. The ESXi modules carry a disclaimer that says: By default this module is disabled due to monitoring the same metrics as VMware_vSphere_VirtualMachinePerformance.
We turned on the VMware_ESXi_VirtualMachinePerformance module, compared its data to the data in the VMware_vSphere_VirtualMachinePerformance module, and assumed we'd see similar information. However, the information is actually very different. Does anyone know how these work, and why two modules that supposedly monitor the same metrics would report different data?

Re: Any way to find all dashboards that have broken widgets?

LM's official response is that they have no way of doing that and I should just manually go to every dashboard and find the ones that are broken. Or do whatever this means...

"after deep checking your request, I found we have no native way to achieve what you’re looking for. I believe that the best approach would require a complex configuration and crossing multiple information across your portal. For example though the groovy script in a module get the configuration of the dashboards or alert rules, then check also the configuration of the devices, and cross that information to identify if something has changed and where, so you can take care of it. In this case, my best recommendation would be reaching out to the Professional Services team through your CSM, and they can gladly assist you with the creation of this customized feature"

Any way to find all dashboards that have broken widgets?

Hi,

Because LM bases so many things on text searches instead of group IDs, any time something gets renamed, moved, reorganized, etc., everything breaks. Is there any easy way to find dashboards that end up broken because they're referencing something that no longer exists? The same goes for Alert Rules that might be looking for a group that doesn't exist anymore, where no one knows there's a problem. Thanks.

Re: Any way to add the graphs into a ticket using the Integration to a ticketing system?

You mean the Black Hole? ;) If only there was a website where people could leave feedback for a product, and other people could see the feedback, and even vote and leave their own feedback.
Hmm...
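For the broken-widget question earlier on this page, the cross-check LM support describes boils down to comparing every text-based group reference against the groups that actually exist. A minimal sketch, assuming you have already exported each dashboard widget's and alert rule's group filter, plus the full paths of all existing groups (for example via the REST API or the PowerShell module; that export step is not shown). The `Reference` type and its fields are hypothetical, and treating filters as shell-style globs via `fnmatch` is an assumption about how LM's wildcards behave.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Reference:
    """Hypothetical record of one text-based group reference."""
    owner: str   # e.g. "Dashboard: NOC / Widget: CPU" or "Alert rule: Prod Linux"
    target: str  # group path or wildcard pattern the item filters on


def broken_references(references, existing_paths):
    """Return the references whose target matches no existing group path.

    A plain path matches only itself; a pattern containing "*" or "?" is
    treated as a shell-style glob (an assumption, see the lead-in).
    """
    paths = list(existing_paths)
    return [
        ref for ref in references
        if not any(fnmatchcase(path, ref.target) for path in paths)
    ]
```

Run after each rename or reorganization, a list like this would surface dashboards and alert rules that silently stopped matching anything, instead of waiting for someone to notice an empty widget.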
Top Contributions
- Can I monitor vCenter tags and create an alert if a computer doesn't have one? (Solved)
- Is there a way to configure an Alert Rule to only match websites OR resources? (Solved)
- Can LM record process information like what you see in Task Manager? (Solved)
- Running a Perl script on an AIX box over SSH? (Solved)
- Powershell: Expanding a variable inside single quotes to make an API call (Solved)
- Can I monitor a JSON file? Example included. (Solved)
- Using Postman to create multiple Websites via API & CSV? (Solved)
- Is it possible to use Regex in a Group or AppliesTo filter? (Solved)
- Can LM run a SQL Query against a specific Server/Database and alert based on the results?
- Any easy way to delete a Step from a website check?