LogicMonitor Portal Metrics



LogicMonitor Portal Metrics is a DataSource that queries the API of a specified LogicMonitor portal for overall statistics such as device, collector, and alert counts. It was originally written by fellow Sales Engineer @Jake Cohen, and updated by Monitoring Engineer @Julio Martinez (credit where credit is due!). It can be useful for tracking the activity within an account over time.


The recommended (and required) method for implementing the DataSource is as follows:

  1. Download the LogicMonitor Portal Metrics DataSource from the LogicMonitor Repository using locator code J7RGZY.
  2. Add a new device to your account in Expert Mode - use 'logicmonitor.account' in place of IP Address/DNS and whatever you'd like for the Display Name (LogicMonitor Portal, for example). This device won't respond to standard DataSources, so you'll probably want to do some alert tuning once it's been added.
  3. Add properties to the device to allow the DataSource to authenticate. The required properties are:
     • lmaccount (LogicMonitor account name - without the logicmonitor.com at the end)
     • lmaccess.id (LogicMonitor API Key Access ID)
     • lmaccess.key (LogicMonitor API Key Access Key)
  4. Once those properties are in place, the DataSource should automatically apply to the new device.
  5. Download the LogicMonitor Portal Metrics dashboard from GitHub.
  6. Let us know what you think!
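
For anyone wondering what the three properties are for: LogicMonitor's REST API documents an "LMv1" token scheme in which the Access ID and Access Key sign each request, and the account name forms the portal hostname. The Groovy below is only a rough sketch of that scheme, not the DataSource's actual code. The helper name buildLmv1Header and every sample value are made up for illustration.

```groovy
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical helper (not taken from the DataSource): builds an LMv1 Authorization header value.
String buildLmv1Header(String accessId, String accessKey, String verb,
                       String resourcePath, String data, long epochMs) {
    // The signed string is: HTTP verb + epoch in milliseconds + request body + resource path.
    String requestVars = verb + epochMs + data + resourcePath
    Mac hmac = Mac.getInstance('HmacSHA256')
    hmac.init(new SecretKeySpec(accessKey.getBytes('UTF-8'), 'HmacSHA256'))
    // Hex-encode the HMAC digest, then Base64-encode that hex string.
    String hex = hmac.doFinal(requestVars.getBytes('UTF-8')).encodeHex().toString()
    String signature = hex.getBytes('UTF-8').encodeBase64().toString()
    return "LMv1 ${accessId}:${signature}:${epochMs}".toString()
}

// Placeholder values standing in for the lmaccount, lmaccess.id and lmaccess.key properties.
String account = 'example'
String baseUrl = "https://${account}.logicmonitor.com/santaba/rest"
String header = buildLmv1Header('myAccessId', 'myAccessKey', 'GET', '/device/devices', '', 1700000000000L)

println baseUrl
println header.startsWith('LMv1 myAccessId:')
```

In a real request the epoch would be the current time in milliseconds and the header would be sent as the Authorization header alongside the URL built from the base URL and resource path.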

17 replies

@Kerry DeVilbiss this is awesome. Question for y'all on this: the total alert count looks to be maxing out at 800. I haven't done any troubleshooting yet to see if that's just what it's reporting, but I'm curious whether that's a limitation somewhere in the script.

Found it :D it's in the call to LM, 

    int itemsPerPage = 800

 


Cool. We do a similar thing for alert volumes in each of our regions, the types of alerts (based on data points) and the alert acknowledgements by operator.


We recently received and deployed a datasource from @Jake Cohen that also displays the number of Cloud devices monitored by a Collector, which is important for my leadership to understand our account utilization/commit metrics. I would highly encourage incorporating this datapoint into the "LogicMonitor Portal Metrics DataSource from the LogicMonitor Repository using locator code J7RGZY" as well.


This datasource is exactly what we are looking for, but there is one problem.

It isn't returning the proper counts for alerts.

I tried adjusting the two variables

    int maxPages = 5
    int itemsPerPage = 800

But the most I could get is 1000. Our current count is 9079 Warnings, 415 Errors and 284 Criticals, but every count comes back capped at exactly 1000, even when I set the two variables much higher.


I can't seem to edit my post.

It appears the paging parameters aren't working, most likely the offset.


@Joe Williams the alerts count paging works differently to other calls. I don't know why, but it does:

https://www.logicmonitor.com/support/rest-api-developers-guide/v1/alerts/get-alerts/

Quote

Note: The response 'total' will be a negative number if there are additional alerts that satisfy the request criteria that weren't included in the request, and that "at least" that number of alerts exist. For example, if you request the first 500 alerts and you have 3000 alerts in your account, the response may include total=-1000 (i.e. you have at least 1000 alerts, but you didn't ask for them all).

 

Therefore, your recursion to fetch additional alerts should run if the 'total' is a negative number.

Something a bit like this (NOT complete code):

import groovy.json.JsonSlurper

// Function to GET alerts for a Group (NOT complete code)
def GETGroupAlerts(groupWildvalue, filterString = '', offset = 0)
{
	// Hardcode size to 1000, as this is the maximum number of results the API will return from one call.
	def size = 1000;
	// Each call builds its own map; the recursive call below merges later pages into it.
	def alertsMap = [:];

	/*
	... define url including size, offset, fields, filters etc.
	Initial offset will be zero as per the default parameter.
	*/

	/*
	... actually use the above to make the API call, populating 'code' and 'responseBody'...
	*/

	// Parse the API response and put the results into a map, something like:
	if (code == 200)
	{
		// 200 response code (OK), meaning credentials are good. Slurp...
		def allResponse = new JsonSlurper().parseText(responseBody);
		def alertCount = allResponse.total;

		// LOOP THROUGH RESULTS (the closure must open on the same line as 'each'):
		allResponse.items.each { alert ->
			alertsMap << [
				(alert.id) : [
					severity : alert.severity,
					sdted    : alert.sdted,
					acked    : alert.acked,
				],
			];
		}

		// A negative total means more alerts exist beyond this page, so fetch the next page.
		if (alertCount < 0)
		{
			/*
			// DEBUG
			println 'we ought to go get some more...';
			println 'alertCount: ' + alertCount;
			println 'size: ' + size;
			println 'offset: ' + offset;
			println 'size + offset: ' + (size + offset);
			// END DEBUG
			/**/
			alertsMap << GETGroupAlerts(groupWildvalue, filterString, (size + offset));
		}
	}
	return alertsMap;
}
//----------------------------------------------------------------------------------------

Once you finally get a response with a non-negative 'total', you're at the end of the alerts list: the recursion stops, and you have one alertsMap object with all the alerts in it, which you can then do whatever you like with.

Note the above bits of code are from a script that uses the API v2 data structure. Note also, the hacked out chunks above are nothing like a complete script.

Note graph values match Alerts tab values:

 


@AnthonyH That was it.

            if (response.data.total > 0)
            {
                break
            }

With that at the bottom instead of the original check, and with another portion removed, it now works flawlessly.
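
For anyone adapting this, here is a rough, self-contained sketch of a paging loop built around that kind of check. It is not the DataSource's code: fetchAlertsPage is a made-up stub that serves pages from an in-memory list and imitates the negative 'total' described in the docs quote above, and the variable names simply echo maxPages and itemsPerPage from earlier in the thread. The stub returning the full count on the last page is an assumption. The sketch stops on total >= 0 (rather than > 0) and on an empty page, so that a portal with zero alerts does not keep requesting pages.

```groovy
// Hypothetical stub standing in for the real alerts API call.
// While more alerts remain it returns a negative total ("at least this many"),
// and on the last page it returns a non-negative total (assumed behaviour).
def allAlerts = (1..2300).collect { [id: "A${it}".toString(), severity: 2] }
def fetchAlertsPage = { int offset, int size ->
    int end = Math.min(offset + size, allAlerts.size())
    def items = offset < end ? allAlerts.subList(offset, end) : []
    boolean more = end < allAlerts.size()
    [total: more ? -end : allAlerts.size(), items: items]
}

int maxPages = 5         // safety cap on the number of requests
int itemsPerPage = 1000  // the thread reports 1000 as the most one call returns
def collected = []

for (int page = 0; page < maxPages; page++) {
    def response = fetchAlertsPage(page * itemsPerPage, itemsPerPage)
    collected.addAll(response.items)
    // A non-negative total means nothing more is waiting; an empty page is a second safety stop.
    if (response.total >= 0 || response.items.isEmpty()) {
        break
    }
}

println "collected ${collected.size()} alerts"
```

Note that maxPages still caps the result: with these values the loop can never return more than 5000 alerts, so the cap needs to be sized for the portal.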


Thank you, Kerry, for making this available!

I made a minor change to fix pulling of alert metrics, available at GJNN46.

I also included an optional, commented-out variation of the function that processes alert metrics to account for SDT, resulting in numbers similar to those shown on the Alerts tab in LogicMonitor's sidebar. Using that would also make available a new metric - 'SDTedCount' - with the number of active alerts in SDT. (You'd need to add that metric as a datapoint if you try that alternate function.)
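
To illustrate the idea (this is a sketch, not the code published at GJNN46): alerts in SDT are counted separately, so the per-severity numbers only cover alerts that are not in SDT. The sample map reuses the severity/sdted/acked shape from the alertsMap snippet earlier in the thread. The sample data, the severity codes (2 = warning, 3 = error, 4 = critical) and the output names other than 'SDTedCount' are assumptions.

```groovy
// Hypothetical sample data in the same shape as the alertsMap shown earlier in the thread.
def alertsMap = [
    'A1': [severity: 2, sdted: false, acked: false],
    'A2': [severity: 2, sdted: true,  acked: false],
    'A3': [severity: 3, sdted: false, acked: true],
    'A4': [severity: 4, sdted: false, acked: false],
    'A5': [severity: 4, sdted: true,  acked: false],
]

// Assumed mapping of numeric severity codes to names.
def severityNames = [2: 'warn', 3: 'error', 4: 'critical']
def counts = [warn: 0, error: 0, critical: 0, SDTedCount: 0]

alertsMap.each { id, alert ->
    if (alert.sdted) {
        // Alerts in SDT are tallied on their own and left out of the severity counts.
        counts.SDTedCount++
    } else {
        def name = severityNames[alert.severity]
        if (name) {
            counts[name]++
        }
    }
}

// Emit key=value lines, the usual output style for a script-based DataSource.
counts.each { key, value -> println "${key}=${value}" }
```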


Release notes for .142 (I think) indicated that datasource LM_Device_Count had been deprecated and the replacement is Logicmonitor_Portal_Metrics.

I don't recall how/if LM_Device_Count was related to this dashboard, but I upgraded Logicmonitor_Portal_Metrics anyway and all widgets (except the Resource Types Monitored pie chart) now display "Instance not found". Here are the steps I took:

  • upgraded Logicmonitor_Portal_Metrics, deleted LM_Device_Count
  • installed propertysource addCategory_LogicmonitorPortal
  • added properties lmaccount, lmaccess.id, lmaccess.key to a collector
  • upgraded that collector to 29.104

There's a piece of the puzzle I am missing here, probably pretty obvious, but I haven't been able to figure out what it is.  Anybody have any ideas?

thx

The widgets on the dashboard will have to be reconfigured to point to the new instance created as part of the portal metrics DS. 

2 hours ago, Stuart Weenig said:

The widgets on the dashboard will have to be reconfigured to point to the new instance created as part of the portal metrics DS. 

Even simpler: pay closer attention to the differences that SLM presents for resolution, particularly the "Applies To" syntax.

Hey all
Wondering if someone can help.
I'm relatively new to LogicMonitor and currently enjoying looking around and finding what I can add to our implementation that we will benefit from and this looks like it could be one of them.

I have followed the instructions in the first post; however, I am having the same problem as @Michael Dieter above, in that only the "Resource Types Monitored" pie chart is displaying any results and the rest display "Instance not found".

I gather from Michael's last post that he was able to fix it, but I don't understand what he did to fix it.
Can someone please shed some light on this for me?

Thanks,


@ASCoates

Sounds like you have a resource already being monitored, so note the IP address/DNS name you used and its display name. For example, we used www.logicmonitor.com as the IP/DNS name and then used LOGICMONITOR SERVICE METRICS as the display name.

Confirm you have also added the propertysource addCategory_LogicmonitorPortal and ensure that its Applies To matches the IP/DNS name that you are using.

Then, in another tab, open up each of the Portal Monitoring datasources that you want to use. Go to the Applies To configuration and replace the default value with the display name of the resource.

Return to your portal resource and select the Manage button. The property source should have added "LogicMonitorPortal" to the value of the system.categories property. Then you will need to add the property lmaccount (value = your portal name, i.e. if your portal is ascoates.logicmonitor.com, the value you set here is "ascoates"), and add the properties lmaccess.id and lmaccess.key with the appropriate values for each. NOTE: these values are generated by creating an API user, which you do at Settings --> User Access --> Users and Roles --> API user.

Hope you can follow this. But I'd also say don't keep struggling if you aren't getting it... open a support case and they will definitely help.

Michael, thanks for the response.

I have gone through the above this morning. I had already added the lmaccount, lmaccess.id and lmaccess.key properties, but had missed the DNS / Display name stuff.

With that done, I now see the additional data sources (Alerts, API Utilization etc.) under the Portal Metrics resource I created. I have checked the dashboard and, although it is taking a few minutes, it is starting to pull through information.

Again, thanks for the detailed reply and help.


Hi,

I just imported the LogicMonitor_Portal_Alerts datasource. We are still seeing an alert count mismatch. We have around 20k warning alerts, but the datasource is showing only 10k. It seems the 'hard limit' on the number of alerts that can be returned is 10000. Is there any fix or suggestion for this?


Hi @mraja
That doesn’t sound normal. I would recommend submitting a support ticket if you haven’t already, so an engineer can take a look at your portal and escalate to the development team for further investigation if needed.
You can find more details about the options for reaching out to support from your portal in the following article: https://www.logicmonitor.com/support/about-logicmonitor/customer-support/get-support-resources 
