
Antony_Hawkins
3 years ago

ConfigSource Checker PropertySource

What:

An API-calling PropertySource to mark whether devices have (or should have) ConfigSources on them.

Why:

There is no canned report or convenience function that lists which resources in your portal have ConfigSources applied to them, nor which of those resources have successfully discovered instances (or not). It's also difficult to find, other than via the API, devices that should have ConfigSource instances but don't, which will happen if, e.g., you have applied incorrect SSH credentials to suitable network resources. This PropertySource makes all of this possible. (Note: this will highlight failing instance discovery for ConfigSources; it will not tell you anything about config collection for instances, but ConfigSources themselves can and do check that via a ConfigCheck.)

How:

This PropertySource uses the LogicMonitor API, specifically a call to the API endpoint:

/device/devices/{deviceID}/devicedatasources

...to list all ConfigSources (dataSourceType:"CS") on a resource.

Happily, that call lists all applied LogicModules, even those with zero instances.
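If you want to try the call outside LogicMonitor, here is a minimal Python sketch. It is not the PropertySource's own code (that is a Groovy script). It assumes LogicMonitor's LMv1 token authentication (an HMAC-SHA256 over verb, epoch in milliseconds, body and resource path, hex-encoded and then base64-encoded) and the usual https://{account}.logicmonitor.com/santaba/rest base URL. The function names, the X-Version header value, the server-side filter and page size, and the response field names are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request


def lmv1_auth_header(access_id: str, access_key: str, verb: str,
                     resource_path: str, epoch_ms: int, data: str = "") -> str:
    """Build an LMv1 Authorization header value (assumed scheme, see lead-in).

    The signed message is verb + epoch + body + resource path; the query
    string is not part of the signature.
    """
    message = f"{verb}{epoch_ms}{data}{resource_path}"
    digest = hmac.new(access_key.encode(), message.encode(), hashlib.sha256).hexdigest()
    signature = base64.b64encode(digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch_ms}"


def get_configsources(account: str, access_id: str, access_key: str, device_id: int):
    """Return (HTTP status, list of ConfigSource items) for one device."""
    resource_path = f"/device/devices/{device_id}/devicedatasources"
    # Assumption: filter to ConfigSources server-side and ask for one large page.
    query = urllib.parse.urlencode({"filter": 'dataSourceType:"CS"', "size": 1000})
    url = f"https://{account}.logicmonitor.com/santaba/rest{resource_path}?{query}"
    epoch_ms = int(time.time() * 1000)
    request = urllib.request.Request(url, headers={
        "Authorization": lmv1_auth_header(access_id, access_key, "GET",
                                          resource_path, epoch_ms),
        "X-Version": "3",  # assumed API version header
    })
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
        # Older API versions wrap results in "data"; newer ones return them at top level.
        payload = body.get("data", body)
        return response.status, payload.get("items", [])
```

Note that urlopen raises urllib.error.HTTPError for non-2xx responses such as 401 or 429. Its .code attribute carries the status code that the PropertySource reports as auto.configSources_api_code.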

Given that ConfigSources (almost always) apply only to resources running their specific technology, and only where the relevant credentials are set, a zero-instance ConfigSource on a device is a pretty solid indication of a problem with instance discovery: usually incorrect credentials or inadequate access.

You will need:

LogicMonitor API credentials set as resource properties at a level that covers all resources to be tested; usually this will be fairly high up the resource tree.

The script will accept lmaccess.id, logicmonitor.access.id, or apiaccessid.key for the API token ID, and lmaccess.key, logicmonitor.access.key, or apiaccesskey.key for the token key (in those orders of preference).

The script will take the account name directly from collector settings.
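The lookup order above can be sketched in Python as follows. This is illustrative only. The property names are the ones listed above. The helper names are hypothetical, and the account name is simply passed in as a parameter, because how the collector exposes it is outside this sketch. The returned missing count corresponds to the 1|2|3 value described for auto.configSources_api_credentials_absent.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Property names in order of preference, as listed in the post.
ID_PROPS = ("lmaccess.id", "logicmonitor.access.id", "apiaccessid.key")
KEY_PROPS = ("lmaccess.key", "logicmonitor.access.key", "apiaccesskey.key")


def first_present(props: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the value of the first property in `names` that is set and non-blank."""
    for name in names:
        value = (props.get(name) or "").strip()
        if value:
            return value
    return None


def resolve_credentials(props: Mapping[str, str], account: Optional[str]
                        ) -> Tuple[Optional[str], Optional[str], Optional[str], int]:
    """Return (access_id, access_key, account, missing_count)."""
    access_id = first_present(props, ID_PROPS)
    access_key = first_present(props, KEY_PROPS)
    account = (account or "").strip() or None
    # 0 means everything is present; 1-3 is the number of missing items.
    missing = sum(value is None for value in (access_id, access_key, account))
    return access_id, access_key, account, missing
```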

Expected outputs:

This PropertySource will add at least these auto.properties to resources:

auto.configsources_check_datetime : [human readable time of last check]
auto.configsources_check_epoch : [epoch time of last check]
auto.configsources_check_result : [success|failure]
auto.configsources_is_config_device : [yes|no|undetermined]

The auto.configsources_is_config_device property can then be used to create dynamic groups to contain all config resources (and problematic config resources), brought into inventory reports, etc.

'Undetermined' means that the API call has never succeeded on this resource, while 'yes' and 'no' indicate whether or not one or more ConfigSources are applied.

The auto.configsources_check_result value of 'success' or 'failure' relates to the API call itself: whether we got a 200 response code (and therefore data) or a failure code. In the event of a failure, if there was a previous success, the previous determination of whether this is a config resource (auto.configsources_is_config_device) is kept on the resource, so a transient problem won't delete this knowledge.
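Here is a Python sketch of how the four always-present properties and the "keep the last good answer" rule might be computed. It assumes that the previous auto.configsources_is_config_device value can be read back from the resource's existing properties. The function name, the UTC date format, and the use of seconds (rather than milliseconds) for the epoch are assumptions; the original script may format these differently.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def check_properties(api_ok: bool, applied_count: int,
                     previous_is_config: Optional[str],
                     now: Optional[float] = None) -> dict:
    """Build the always-present auto.configsources_* properties for one check."""
    now = time.time() if now is None else now
    if api_ok:
        is_config = "yes" if applied_count > 0 else "no"
    elif previous_is_config in ("yes", "no"):
        # Keep the last good answer so a transient API failure doesn't erase it.
        is_config = previous_is_config
    else:
        # The call has never succeeded on this resource.
        is_config = "undetermined"
    return {
        "auto.configsources_check_datetime":
            datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S %Z"),
        "auto.configsources_check_epoch": str(int(now)),  # assumed: seconds
        "auto.configsources_check_result": "success" if api_ok else "failure",
        "auto.configsources_is_config_device": is_config,
    }
```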

If the check is successful but there are no applied ConfigSources:

No other properties will be added.

If the check is successful and there are one or more applied ConfigSources:

auto.configsources_applied : [csv list of ConfigSource names applied]
auto.configsources_applied_count : [count of ConfigSources applied]
auto.configsources_active : [csv list of ConfigSource names with instances]
auto.configsources_active_count : [count of ConfigSources with instances]

...plus, if any of the applied ConfigSources have zero instances:

auto.configsources_missing : [csv list of ConfigSource names WITHOUT instances]
auto.configsources_missing_count : [count of ConfigSources WITHOUT instances]
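Given the items returned by the devicedatasources call, the split into applied, active and missing ConfigSources can be sketched in Python as below. The field names dataSourceName, dataSourceType and instanceNumber are assumptions about the API response shape, the function names are hypothetical, and the ConfigSource names in the test are made up.

```python
def instance_count(item: dict) -> int:
    # Treat a missing or null instance count as zero instances.
    return item.get("instanceNumber") or 0


def summarise_configsources(items: list) -> dict:
    """Build the per-ConfigSource auto.properties from devicedatasources items."""
    configsources = [i for i in items if i.get("dataSourceType") == "CS"]
    if not configsources:
        return {}  # nothing applied: no extra properties are added
    applied = sorted(i["dataSourceName"] for i in configsources)
    active = sorted(i["dataSourceName"] for i in configsources if instance_count(i) > 0)
    missing = sorted(i["dataSourceName"] for i in configsources if instance_count(i) <= 0)
    props = {
        "auto.configsources_applied": ",".join(applied),
        "auto.configsources_applied_count": str(len(applied)),
        "auto.configsources_active": ",".join(active),
        "auto.configsources_active_count": str(len(active)),
    }
    # The *_missing properties only appear when something has zero instances.
    if missing:
        props["auto.configsources_missing"] = ",".join(missing)
        props["auto.configsources_missing_count"] = str(len(missing))
    return props
```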

If the check is unsuccessful due to absent API credentials:

auto.configSources_api_credentials_absent : [1|2|3 - count of missing credentials, from apiID, apiKey, accountName]

If the check is unsuccessful due to an API problem:

auto.configSources_api_code : [API response code]
auto.configSources_api_retries : [count of retries]

The API code may be, e.g., 401 if incorrect credentials are provided, or 429 if API rate limiting has been hit. The latter may occur if the PropertySource is deployed to thousands of resources simultaneously. Because PropertySources run only daily (or when Active Discovery is triggered for a resource), rate limiting is unlikely to be a persistent problem over time. The script also has an incremental-delay stand-off routine that retries the call after a 429, to further mitigate this risk.
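The incremental stand-off can be sketched in Python as a linear back-off. It retries only on 429 and returns the number of retries used, which is the value reported as auto.configSources_api_retries. The delay schedule and retry limit are assumptions, because the original script's values aren't stated. `call` is any hypothetical function that returns an (HTTP status, result) pair, and `sleep` is injectable so that the logic can be tested without waiting.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_backoff(call: Callable[[], Tuple[int, T]], max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep):
    """Retry `call` on HTTP 429 with a linearly increasing delay.

    Returns (status, result, retries_used). Any status other than 429 is
    returned immediately, as is the final 429 once max_retries is reached.
    """
    retries = 0
    while True:
        status, result = call()
        if status != 429 or retries >= max_retries:
            return status, result, retries
        retries += 1
        sleep(base_delay * retries)  # 1s, 2s, 3s, ... stand-off
```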


Version 1.4 is published with locator: 6TLCJH

  • NB. If you want to alert on any of these properties, e.g. a non-zero count of failing ConfigSources, or a change to the number of applied or active ConfigSources, that's a ridiculously simple DataSource to write: pull in the property values and return them as datapoints. I'll leave this as an exercise for the reader.

  • ...and, OK, yes, you could drop this script code directly into a DataSource and plot those numbers more frequently, but if you do, I suggest a reasonably infrequent polling interval, e.g. 10 minutes (maybe longer), particularly in larger environments, to avoid the risk of hitting API rate limits.

  • Thanks for this, @Antony Hawkins. I can see how this could be useful for LM users. I have had at least one customer ask me whether a certain device would have configs. I’m going to bookmark this so I can share it with customers.

  • Update:

    Version 1.4 is published with locator: 6TLCJH

    Minor bugfix, such that the rate limit retry function actually retries on a 429 response. I’d managed to break it previously, with a typo.