Recent Discussions
Checkpoint Power Supplies - 6000-XL
Not sure where else to post this or how to get my update into the repo. To support the 6000-XL and some other variants, on the DataSource "Checkpoint Power Supplies", modify the datapoint 'PowerSupplyStatus' to: Up|Present

Posted by MonitoringLife, 3 days ago
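For context, a minimal Groovy sketch of the kind of regex-style valid-value check this change implies; it is illustrative only, not the actual internals of the Checkpoint Power Supplies module:

// Illustrative only: treat either "Up" or "Present" as a healthy power-supply status.
def rawStatus = "Present"                              // status string as reported by the device
def statusOk = (rawStatus ==~ /Up|Present/) ? 1 : 0    // the suggested pattern: either value counts as healthy
println "PowerSupplyStatus=${statusOk}"                // 1 = healthy, 0 = would alert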
Dell ECS Flux API DataSource

I am currently working on replacing the DataSource elements for CPU and Memory on the Dell ECS S3 platform we use, but I have hit a problem building the JSON. I can get this to work in Python, but I am really struggling in Groovy, where I get a 400 Bad Request that points to a syntax problem.

Here is the JSON I am sending as I would write it in Python, which works:

sendData = {"query":"from(bucket:\"monitoring_op\") |> range(start: -5m) |> filter(fn: (r) => r._measurement == \"cpu\" and r.cpu == \"cpu-total\" and r._field == \"usage_idle\" and r.host == \""+hostname+"\")"}

Here it is in Groovy format, which has been through a few iterations:

dataString = ["query":'from(bucket:\"monitoring_op\") |> range(start: -5m) |> filter(fn: (r) => r._measurement == \"cpu\" and r.cpu == \"cpu-total\" and r._field == \"usage_idle\" and r.host == \"'+hostname+'\")']

The actual code is below. It is a copy of one of the original ECS DataSources, with the getApi element adapted to use the Flux API instead of the dashboard API elements.

import groovy.json.JsonSlurper
import groovy.json.JsonOutput
import groovy.json.JsonBuilder
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import com.santaba.agent.groovyapi.http.*;
import com.santaba.agent.util.Settings

hostname = [Removed]
user = [Removed]
pass = [Removed]
collectorplatform = "linux"

// Temp lines
println "Successfully read params"
def nodeid = [Removed]
// Temp lines end

if (nodeid == null) {
    println nodeid
    println "auto.emc.ecs.node.nodeid is not set! Please see Technical Notes."
    return 1
}

debug = true
def success = false
def token = login()

// Temp lines
println "out of login"
println token
// End Temp lines

if (token) {
    def response = getApi(token)
    println "Response below"
    println response
} else if (debug) {
    println "Bad API response: ${response}"
}

return success ? 0 : 1

def login() {
    println "in login"
    File tokenCacheFile
    if (collectorplatform == 'windows') tokenCacheFile = new File("emc_ecs_tokens" + '\\' + hostname + "-emc_ecs_tokens.txt")
    if (collectorplatform == 'linux') tokenCacheFile = new File("emc_ecs_tokens" + '/' + hostname + "-emc_ecs_tokens.txt")
    if (debug) println "Token cache filename is: ${tokenCacheFile}"

    // If we have a non-empty readable token cache file, then extract and return the token
    if (tokenCacheFile.exists() && tokenCacheFile.canRead() && tokenCacheFile.readLines().size() == 1 && !tokenCacheFile.readLines()[0].contains("null")) {
        if (debug) println "Token cache file exists and is non-empty"
        def cachedToken = tokenCacheFile.readLines()[0]
        if (cachedToken) {
            if (debug) println "Extracted token from cache file: ${cachedToken}"
            return cachedToken
        }
    } else if (!tokenCacheFile.exists()) {
        // token cache file does not exist, create it
        if (debug) println "Token cache file does not exist, creating..."
        new File("emc_ecs_tokens").mkdir()
        tokenCacheFile.createNewFile()
    } else if (tokenCacheFile.text != '') {
        // malformed token cache file
        println "Bad token file: ${tokenCacheFile.readLines()}\nClearing..."
        tokenCacheFile.text = ''
    } else if (debug && tokenCacheFile.text == '') {
        // token cache file has been cleared, proceed and rebuild
        println "Session token file is cleared. Rebuilding..."
    }

    // Fetch new token using Basic authentication, set in cache file and return
    if (debug) println "Checking provided ${user} creds at /login.json..."
    def userCredentials = "${user}:${pass}"
    def basicAuthStringEnc = new String(Base64.getEncoder().encode(userCredentials.getBytes()))
    def loginUrl = "https://${hostname}:4443/login.json".toURL()
    def loginConnection = loginUrl.openConnection()
    loginConnection.setRequestProperty("Authorization", "Basic " + basicAuthStringEnc)
    def loginResponseBody = loginConnection.getInputStream()?.text
    def loginResponseCode = loginConnection.getResponseCode()
    def loginResponseToken = loginConnection.getHeaderField("X-SDS-AUTH-TOKEN")
    println loginResponseCode
    if (loginResponseCode == 200 && loginResponseToken) {
        if (debug) println "Retrieved token: ${loginResponseToken}"
        tokenCacheFile << loginResponseToken
        if (debug) println "Set token in cache file"
        return loginResponseToken
    } else {
        println "STATUS CODE:\n${loginResponseCode}\n\nRESPONSE:\n${loginResponseBody}"
        println "Unable to fetch token with ${user} creds at /login.json"
    }
    println "Something unknown went wrong when logging in"
}

def getApi(token, alreadyFailed=false) {
    def dataUrl = "https://"+hostname+":4443/flux/api/external/v2/query";
    if (debug) println "Trying to fetch data from ${dataUrl}..."
    //def GetStatus = JsonOutput.toJson([query:'from(bucket:\"monitoring_op\") |> range(start: -5m) |> filter(fn: (r) => r._measurement == \"cpu\" and r.cpu == \"cpu-total\" and r._field == \"usage_idle\" and r.host == \"${hostname}\")'])
    dataString = ["query":'from(bucket:\"monitoring_op\") |> range(start: -5m) |> filter(fn: (r) => r._measurement == \"cpu\" and r.cpu == \"cpu-total\" and r._field == \"usage_idle\" and r.host == \"'+hostname+'\")']
    def GetStatus = JsonOutput.toJson(dataString)
    println GetStatus
    println "GetStatus Class"
    println GetStatus.getClass()
    def dataHeader = ['X-SDS-AUTH-TOKEN':token,'Content-Type':'application/json','accept':'application/json']
    // Now we can retrieve the data.
    httpClient = Client.open(hostname, 443)
    def dataData = httpClient.post(dataUrl, GetStatus, dataHeader);
    if ( !(httpClient.getStatusCode() =~ /200/)) {
        println "Failed to retrieve data " + httpClient.getStatusCode()
        return(1)
    }
    String dataContent = httpClient.getResponseBody()
    println httpClient.getStatusCode()
    println dataContent
    return new JsonSlurper().parseText(dataContent)
}

Thanks in advance for any help.

Posted by SteveBamford, 4 days ago
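Not an answer to the 400 itself, but for anyone comparing the two versions, here is a minimal Groovy sketch of building that request body with JsonOutput.toJson; the hostname value is a placeholder and the Flux query is copied from the post. In a plain single-quoted Groovy string the backslash escapes are unnecessary, since toJson adds the JSON escaping itself:

import groovy.json.JsonOutput

def hostname = "ecs-node.example.com"   // placeholder; in the DataSource this comes from the host properties

// Build the Flux query as an ordinary Groovy string. The inner double quotes do not
// need backslash escapes here; JsonOutput.toJson() escapes them when serialising.
def fluxQuery = 'from(bucket:"monitoring_op") ' +
        '|> range(start: -5m) ' +
        '|> filter(fn: (r) => r._measurement == "cpu" and r.cpu == "cpu-total" ' +
        'and r._field == "usage_idle" and r.host == "' + hostname + '")'

def requestBody = JsonOutput.toJson([query: fluxQuery])
println requestBody
// => {"query":"from(bucket:\"monitoring_op\") |> range(start: -5m) |> filter(fn: (r) => ...)"}

It may also be worth double-checking that Client.open is given the same port as the dataUrl; in the script above the URL targets 4443 while the client is opened on 443.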
Best Practices for API Calls in a datasource

Hi all,

Possibly the most random question of the week: when working on DataSources that make API calls, what would you say is the maximum number of calls to make in a single DataSource? Typically I have worked with one data-retrieval call per DataSource.

Why the question? Dell have withdrawn a number of fields from the Dashboard API in Dell ECS, which means metrics such as CPU and memory now need to be retrieved from the Flux API it provides, along with a few other metrics which I may or may not need to provide to our infrastructure team. To do this it looks like I will need to generate at least two Flux queries, one for CPU and one for memory, which results in two API calls.

So would you create a single DataSource for each metric, or make both calls within one DataSource so you have a global-stats DataSource for this sort of information?

Thanks in advance for your input.

Posted by SteveBamford, 5 days ago
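For illustration, a minimal sketch of the second option, with both Flux queries issued from one collection script and emitted as separate datapoints. The endpoint path is taken from the thread above, but the query strings, token handling, and datapoint names are simplified placeholders, not the actual ECS module code:

import groovy.json.JsonOutput

// Placeholders for this sketch; a real DataSource would take these from host properties
// and from the existing login()/token-cache logic.
def hostname = "ecs-node.example.com"
def token    = "X-SDS-AUTH-TOKEN-value"

// One closure reused for each Flux query, so both metrics live in a single collection script.
def runFluxQuery = { String flux ->
    def conn = new URL("https://${hostname}:4443/flux/api/external/v2/query").openConnection()
    conn.setRequestMethod("POST")
    conn.doOutput = true
    conn.setRequestProperty("X-SDS-AUTH-TOKEN", token)
    conn.setRequestProperty("Content-Type", "application/json")
    conn.outputStream.withWriter { it << JsonOutput.toJson([query: flux]) }
    return conn.inputStream.text
}

def queries = [
    CpuUsageIdle     : 'from(bucket:"monitoring_op") |> range(start: -5m) |> filter(fn: (r) => r._field == "usage_idle")',
    MemoryUsedPercent: 'from(bucket:"monitoring_op") |> range(start: -5m) |> filter(fn: (r) => r._field == "used_percent")'
]

// Each raw response still has to be reduced to a single number (the format depends on what
// the ECS Flux endpoint returns); once it is, print one key=value line per datapoint.
queries.each { datapoint, flux ->
    def raw = runFluxQuery(flux)
    println "${datapoint}=${raw}"
}

The trade-off is the usual one: two DataSources keep each module simple and independently schedulable, while one combined script halves the number of logins/sessions against the appliance.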
Seeking feedback on Nutanix monitoring

We are starting to monitor Nutanix environments in our datacenter, and I've downloaded all the LM modules, so they are ready to use. I'm looking for any success stories and feedback from users, because as of now I can get SNMP for system stats, but nothing from the Nutanix modules themselves. Within Prism we added an SNMP user, and the v3 creds are in the LM resource. It appears the SNMP service needs to be restarted after configuring a user. This is a reference we've used so far: https://portal.nutanix.com/page/documents/kbs/details?targetId=kA0600000008bAECAY#Heading_B

Posted by JaredM, 13 days ago
How to handle unnecessary active alerts

Dear LM community,

I'm looking for the best practice for handling unnecessary active alerts in LogicMonitor. As far as I understand, we can acknowledge, put into SDT, escalate, adjust alert thresholds (instance thresholds), or even group instances with custom alerting rules. However, it doesn't seem possible to simply remove an active alert once it's triggered; please correct me if I am mistaken. Each of these approaches has downsides. For example, grouping interfaces to suppress alerts may cause us to miss new alerts later if the port becomes active again. What is the recommended way to deal with such unnecessary alerts, in this case inactive network interfaces that are alerting but are expected to stay down?

Thank you in advance for your input!

Posted by Clark_Kent, 17 days ago
Meraki Switch Stack vs Cisco Switch Stack

I apologize if this topic has already been addressed; I was unable to locate any relevant discussions. I'm encountering a challenge with how LogicMonitor Topology represents Meraki stacked switches, particularly in contrast to its handling of Cisco stacked switches.

When LogicMonitor discovers Cisco switches configured in a stack, it identifies the stack as a single logical entity, aggregating multiple serial numbers and hardware components. This behavior aligns with Cisco IOS, which presents the stack as a unified system. As a result, LogicMonitor's topology mapping treats the stack as a single node, simplifying both visualization and monitoring.

Meraki, however, takes a different approach. The Meraki cloud platform recognizes individual switches as members of a stack, and because of this (I believe) LogicMonitor treats each switch as a distinct device. Consequently, topology maps generated by LogicMonitor show individual connections between each switch in a stack, rather than representing the stack as a cohesive unit. This leads to fragmented and often impractical topology views.

Manual topology mapping is not a viable option in my environment. Has anyone found a method or workaround to reconcile this issue?

Posted by billbianco, 18 days ago
Example scripts

Hi community,

I'm running into a limitation with reporting on Scheduled Downtime (SDT) in LogicMonitor. Right now I'm able to pull alerts that occurred during SDTs, but I cannot generate a single report that shows all historical SDTs across all my resources/devices. Is there any way to generate such a historical SDT report? Does someone have a script or code to share to get that through the API?

Thanks in advance!

Posted by Admine, 2 months ago
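As a starting point, a minimal Groovy sketch that lists SDTs via the REST API. It assumes an API-only bearer token, the v3 REST API, and the /sdt/sdts resource; the portal name is a placeholder, and the field names (startDateTime, endDateTime, comment) should be verified against your own portal's responses:

import groovy.json.JsonSlurper

// Placeholders: replace with your portal name and a valid API token.
def portal = "yourcompany"
def bearerToken = "REPLACE_ME"

def conn = new URL("https://${portal}.logicmonitor.com/santaba/rest/sdt/sdts?size=1000").openConnection()
conn.setRequestProperty("Authorization", "Bearer ${bearerToken}")
conn.setRequestProperty("X-Version", "3")

def response = new JsonSlurper().parseText(conn.inputStream.text)
response.items.each { sdt ->
    // startDateTime/endDateTime are epoch milliseconds on the SDT resource
    println "${sdt.type}\t${new Date(sdt.startDateTime as long)}\t${new Date(sdt.endDateTime as long)}\t${sdt.comment ?: ''}"
}

One caveat: this returns the SDTs currently defined in the portal, so one-time SDTs that have already expired and been purged may not show up, which is part of the limitation described above.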
Alert Tsunami: Why the Huge Delay and Flood of Post-Resolution Power Alerts?

Hello LM Exchange community and LogicMonitor team,

We recently experienced an issue that's causing significant frustration and making our alerting system less reliable. We had a couple of anticipated power cable pull-outs (testing/maintenance), which were quickly resolved. However, we then received a massive backlog of LogicMonitor alerts for this event hours after the issue was fixed and the system logs were clear.

The Problem

Massive Alert Delay: The initial power loss events occurred and were resolved around 7:00 PM and 8:00 PM (based on the Lifecycle Log). However, we started getting a huge flood of critical alerts via email at 9:13 PM, 9:43 PM, 10:13 PM, and 10:43 PM, hours after the issue had been mitigated and redundancy was restored.

Excessive Alert Volume: We received dozens of separate critical alerts (e.g., LME205086576, LME205086578, etc.) for a single, contained event, all arriving en masse hours later.

Past "Fix" is a Concern: The last time this occurred, the only way I could stop the flood of delayed emails was to turn off alerting for the device and then turn it back on. This is not a scalable or sustainable solution for a reliable monitoring platform.

Key Questions for the LogicMonitor Team

What is causing this significant delay in alert processing and delivery? It appears the system is holding a large backlog of alerts and then releasing them all at once hours later.

What is the recommended, official way to clear an alert backlog without having to resort to manually disabling and re-enabling alerting?

Is there a known configuration or polling issue that would cause a single event (like a brief power loss) to generate dozens of unique critical alerts over a short period, and how can we consolidate these into a single, actionable notification?

Data for Review

LogicMonitor Email Log (Image 1): Shows critical alerts arriving long after the issue was resolved (9:13 PM to 10:43 PM).

Device Lifecycle Log (Image 2): Shows the power events (PSU0003, RDU0012) occurring and being resolved between 8:01 PM and 9:22 PM.

Any insight or official guidance on how to prevent this "alert tsunami" would be greatly appreciated. We rely on timely and accurate alerting, and this behavior significantly undermines that trust.

Posted by B1llw, 2 months ago
Ubiquiti Unifi 'Source Errors

We're having some difficulties getting the Unifi 'Sources to properly complete their Active Discovery scripts, leading to building thread counts in the collector and, in turn, collector service restarts (ScriptADTasks graph, 7 days, red = failures). I've been chasing the issues (some of it is the 'Source's appliesTo not properly targeting devices based on the gathered SNMP initial discovery properties) and have not quite found the smoking gun, as the error I'm being given doesn't directly point to the issue.

Running AD Test from the DS "Ubiquiti_UniFi_Security_Gateways" against a UXG Pro device gives me this error: "Text must not be null or empty"

It doesn't identify which text is needed, and the line numbers mentioned don't seem to relate directly to the DS AD code's line numbers.

Posted by Cole_McDonald, 4 months ago
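One way to narrow down which value is empty is to add defensive checks ahead of the failing call. A minimal sketch follows; it assumes the standard hostProps object available to collector scripts, and the property names are hypothetical examples, not necessarily the ones the UniFi modules actually read:

// Illustrative only: print each property the script depends on before using it,
// so the "Text must not be null or empty" error can be traced to a specific value.
def requiredProps = [
    "system.hostname",
    "unifi.api.user",      // hypothetical credential property
    "unifi.api.pass",      // hypothetical credential property
    "auto.unifi.site"      // hypothetical discovered property
]

def missing = requiredProps.findAll { name ->
    def value = hostProps.get(name)
    println "DEBUG: ${name} = ${value == null ? '<null>' : (value.trim() ? '<set>' : '<empty>')}"
    value == null || !value.trim()
}

if (missing) {
    println "Missing or empty properties: ${missing.join(', ')}"
    return 1
}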