Collecting a very large number of datapoints
I need to collect data about CPU P-levels in VMware hosts. The way VMware is structured in LogicMonitor relies on a vCenter server, with all of its hosts created through Active Discovery, and there does not seem to be a way to create a datasource that uses those AD-found hosts as target resources. So I have a script that hits my vCenter and loops through each of a few dozen hosts, each of which has around 80 CPUs, each of which has around 16 P-levels. Multiply that out and it comes to roughly 30,000 instances.

The script runs in the debug environment in about 20 seconds and completes without error, but the output is truncated and ends with "(data truncated)". When I try to "Test Active Discovery", it just spins and spins, never completing; I've waited up to about 20 minutes. It seems likely that this is simply too much data for LogicMonitor to deal with all at once, but I don't seem to have the option to be more precise in my target for the script.

It would make more logical sense to collapse some of these instances down to multiple datapoints in fewer instances, but there isn't a set number of P-levels per CPU, and there isn't a set number of CPUs per host, so I don't see any way to do that. There also doesn't seem to be any facility to collect this data in batches. What can I do?
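To illustrate the shape of the problem: the discovery output my script produces is essentially one line per host/CPU/P-level triple in the usual wildvalue##wildalias##description format, which is where the ~30,000 lines come from. A minimal sketch below shows the idea; the hostCpuPstates structure and its contents are made-up stand-ins for what the real vCenter query builds, not my actual code.

    // stand-in data; the real script builds this structure from the vCenter query
    def hostCpuPstates = [
        esx01: [
            cpu0: ["P0", "P1", "P2"],
            cpu1: ["P0", "P1"]
        ]
    ]

    hostCpuPstates.each { hostName, cpus ->
        cpus.each { cpuId, pstates ->
            pstates.each { pstate ->
                // one discovered instance per host/CPU/P-level triple
                def wildvalue = "${hostName}.${cpuId}.${pstate}"
                println "${wildvalue}##${hostName} ${cpuId} ${pstate}##P-level ${pstate} on ${cpuId} of ${hostName}"
            }
        }
    }

With a few dozen hosts x ~80 CPUs x ~16 P-levels, that inner println runs tens of thousands of times, which is what Test Active Discovery appears to choke on.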
Network Interface - Duplicates/Nulls

Is there any solution for when network interfaces get assigned a brand new ID, or become null and have a duplicate ID? Example: this seems to happen every so often across our clients, where Ge1/0/2 will go from ID 10 to ID 150 randomly, maybe after a firmware update or something. You would think LM would be smart enough to merge these together, or at least not create a duplicate null ID 10 when one exists already. I guess I could filter out null instance names in the discovery, but that doesn't resolve the ID change on interfaces.
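Something like the sketch below is what I have in mind for the null-name filter: a script-based discovery that walks IF-MIB::ifDescr and simply skips interfaces that come back with an empty name. It still keys on ifIndex, so it would not fix the renumbering, and it is only a sketch of the skip logic, not a drop-in replacement for the stock interface datasource.

    import com.santaba.agent.groovyapi.snmp.Snmp

    def host = hostProps.get("system.hostname")
    // IF-MIB::ifDescr
    def IFDESCR_OID = ".1.3.6.1.2.1.2.2.1.2"

    Map<String, String> interfaces = Snmp.walkAsMap(host, IFDESCR_OID, null)
    interfaces.each { index, descr ->
        // skip interfaces with no name rather than creating a "null" instance
        if (descr == null || descr.trim().isEmpty()) {
            return
        }
        println "${index}##${descr}"
    }
    return 0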
Excluding VMware VMs from instance discovery

When we add a vCenter into LogicMonitor, the VMs in its managed clusters are discovered as instances underneath datasources applied to the vCenter, like:

VMware VM Status
VMware VM Snapshots
VMware VM Performance

Sometimes there are VMs that we have no interest in monitoring, so we don't want them to be picked up by these datasources. At the moment we're manually adding an Instance Group, putting those VMs in the group and then disabling alerts, which is quite a manual process. Ideally we'd like LM to not discover VMs that have had a specific tag/value applied to them in vCenter. I think we should be able to do this by modifying the Groovy script used for Active Discovery on these datasources, but I'm not sure how to go about that. Has anyone managed to do something similar?

Dave
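The general shape of what I'm picturing is a filter pass near the end of the AD script: read an exclude list from a resource property and skip matching VM names before they're printed as instances. A rough, self-contained sketch is below; the property name (vmware.vm.exclude), the literal VM names, and the wildcard handling are all made up, and a true tag-based exclusion would additionally need the vSphere API to look the tags up (the stock script's SDK calls would supply the real VM list in place of the literal one here).

    // hypothetical stand-in: in the real AD script the VM names come from the vSphere SDK calls
    def vmNames  = ["prod-db01", "prod-web01", "scratch-build03", "testvm07"]

    // hypothetical resource property holding names/patterns to skip, e.g. "testvm*,scratch-*"
    def excludes = (hostProps.get("vmware.vm.exclude") ?: "").split(",")*.trim().findAll { it }

    vmNames.each { vmName ->
        // treat each exclude entry as a simple pattern, with "*" acting as a wildcard
        boolean excluded = excludes.any { pattern -> vmName ==~ pattern.replace("*", ".*") }
        if (!excluded) {
            println "${vmName}##${vmName}"
        }
    }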
Anybody else disabling Meraki instances by default?

If you are, I'd like to know if you're experiencing the problem we are, which LM has tried and failed to troubleshoot: for some random reason, a number of Meraki instances disappear from LM during one discovery cycle and reappear during the next discovery cycle. This isn't normally a big problem, since the instances aren't set to delete until 30 days; normally they'd just show back up with a gap in data. However, in our case we have a business need to have the instances disabled on discovery (we charge the customer per Meraki device we actively monitor). This means that instances that had been discovered and enabled for monitoring randomly disappear and reappear as disabled instances. Also, any customer instance-level properties that were added to the instance are not present on the rediscovered instance.

In the last 3 hours, there have been 3,577 instances (out of somewhere around 18,000) that disappeared and reappeared in this way. The problem was so pervasive that I had to write a script to loop through all instances and enable them based on a master billing sheet.
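The re-enable script itself is nothing fancy: it walks the instances under the Meraki datasource on each device and flips monitoring back on for anything the billing sheet says should be active. The sketch below is a stripped-down illustration of that loop, not our actual script; the portal name, token, ids, pagination and the exact endpoint and field names (stopMonitoring, X-Version, bearer auth) are placeholders and should be verified against the current LM REST API documentation.

    import groovy.json.JsonSlurper
    import groovy.json.JsonOutput
    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    def ACCOUNT = "yourportal"          // hypothetical portal name
    def TOKEN   = "lmb_xxxxxxxx"        // hypothetical bearer API token
    def base    = "https://" + ACCOUNT + ".logicmonitor.com/santaba/rest"
    def client  = HttpClient.newHttpClient()

    // small helper for JSON calls against the REST API
    def call = { String method, String path, Object body = null ->
        def builder = HttpRequest.newBuilder(URI.create(base + path))
            .header("Authorization", "Bearer " + TOKEN)
            .header("X-Version", "3")
            .header("Content-Type", "application/json")
        builder = (body == null)
            ? builder.method(method, HttpRequest.BodyPublishers.noBody())
            : builder.method(method, HttpRequest.BodyPublishers.ofString(JsonOutput.toJson(body)))
        def resp = client.send(builder.build(), HttpResponse.BodyHandlers.ofString())
        new JsonSlurper().parseText(resp.body())
    }

    def deviceId = 123    // hypothetical resource id
    def hdsId    = 456    // hypothetical device-datasource id for the Meraki datasource

    // pull the instances under that datasource and re-enable any that rediscovery brought back stopped
    def instances = call("GET", "/device/devices/${deviceId}/devicedatasources/${hdsId}/instances?size=1000").items
    instances.findAll { it.stopMonitoring }.each { inst ->
        println "re-enabling ${inst.name}"
        call("PATCH", "/device/devices/${deviceId}/devicedatasources/${hdsId}/instances/${inst.id}",
             [stopMonitoring: false])
    }

In practice the findAll would also be checked against the billing sheet so that only instances we actually bill for get turned back on.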
LinuxNewProcesses DataSource -- Auto discovery and key off of HOST-RESOURCES-MIB::hrSWRunName

Hello all! I just wanted to share my edits. I never could get LinuxNewProcesses to work for my needs, but we really wanted it to also have auto discovery and automatically add a list of toolsets that we have deployed across the board. I did this long ago and my wildvalue was the PID, but that's dangerous: I ended up creating thousands of entries in the LM database because my processes (thousands of them) were always changing. This takes a different approach and keys off of the process name.

#1 You just need to have a property defined with a comma-separated list of process names (see the example after the script below). These names need to be from "HOST-RESOURCES-MIB::hrSWRunName".

#2 My polling is every minute, but I don't alert unless it's been down for an hour. For my scenario I do this on purpose, because some of my applications run for about 5 minutes and then aren't kicked off again for another 10, so adjust as needed :)

The datasource is under a security review right now; I'll post the lmLocator if it makes it! Otherwise, here's the autodiscovery. The collection script won't work as-is and you'll have to modify it.

import com.santaba.agent.groovyapi.snmp.Snmp;

// HOST-RESOURCES-MIB::hrSWRunName -- names of running processes, indexed by PID
def OID_NAME = ".1.3.6.1.2.1.25.4.2.1.2";
def host = hostProps.get("system.hostname");
// comma-separated list of process names to discover, set as a resource property
def services = hostProps.get("linux.services").split(',');

Map<String, String> result = Snmp.walkAsMap(host, OID_NAME, null)
result.forEach({ index, value ->
    for (service in services) {
        if (value ==~ /${service}/) {
            // hrSWRunPath for this index, used only in the instance description
            def CMD_OID = ".1.3.6.1.2.1.25.4.2.1.4." + index;
            def service_cmd = Snmp.get(host, CMD_OID);
            def desc = index + " | " + service_cmd;
            // wildvalue##wildalias##description, keyed on the process name rather than the PID
            out.println value + "##" + value + "##" + desc
        }
    }
})

Script: Line 89: if ("${name}" == "${processPath}") {
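For #1, the property is just the process names as reported by hrSWRunName. The names below are made up, but the format is a plain comma-separated list; note that each entry is used as a regex by the ==~ match, so exact names work fine and patterns are also possible:

    linux.services = sshd,crond,ntpd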
heads up - property corruption due to unknown results

I wanted to share this with everyone since it bit me recently. Sometimes LM will reset the value of properties like system.ips to just the single IP associated with the resource if something disrupts access to SNMP, even briefly. The problem is that this can impact other features, like Netflow binding. I am still battling this out with support, but in the meantime I wrote a script to trigger AD for specified devices (I had to use an undocumented endpoint) and I schedule it hourly to limit the damage when it happens (normally AD is triggered once per day unless specific changes occur). My change logs show system.ips resetting fairly often, so the script is definitely helping.

I explained to support that thwacking data due to an unknown result is a bug and no property should be changed in that situation, but I imagine this bug will be hard to unwind. I also recommended a partial fix where AD could be triggered by a "host up" event to limit the damage, but that is a hack; avoiding the data corruption in the first place is the right fix.

FWIW, the main loop of my script is below (it is in an ancient language, but the logic should be clear :)).

# loop over all devices pulled from the API; %DEVICES holds the display names we care about
DEVICE: for my $d (@{$devices}) {
    if ($DEVICES{lc $d->{displayName}}) {
        if ($ACTION eq "scheduleAutoDiscovery") {
            verbose 1, "scheduling AD for $d->{displayName}\n";
            # undocumented endpoint: POST /device/devices/{id}/scheduleAutoDiscovery
            if (not $lmapi->post(path => "/device/devices/$d->{id}/scheduleAutoDiscovery")) {
                warn "ERROR: $ACTION failed\n";
            }
        } else {
            die "ERROR: unsupported action: $ACTION\n";
        }
    }
}