
ColP
LM Champion
2 years ago

When an anomaly isn't an anomaly, what can I do?

What can I do when anomaly detection won't work (because the behaviour is seen on a regular basis) and a dynamic threshold won't help either (because the value stays within its normal range)?

For example, a drive on a server fills with data (the drive is normally cleared down daily). When someone uploads a larger-than-expected amount before the drive has been cleared, or on top of other uploads during the day, there isn't enough space.

You are happy for the drive to be above 80% during the night, because if it hasn't cleared it can be dealt with in the morning (no need to get anyone out of bed). But if there is a rapid spike (more than 2.5% growth in used space within a 30-minute period), someone needs an alert to get out of bed and make enough room for the data.

A possible solution is a DataSource that alerts when the drive is over 80%, but only when that is combined with rapid growth.
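Stripped of the API plumbing, that rule is only a few lines. Below is a minimal sketch in Python (used purely for illustration; the DataSource in this post is PowerShell), assuming the samples are percent-used values ordered oldest to newest across the window you care about. `should_alert` and its parameters are hypothetical names.

```python
def should_alert(samples, max_growth=2.5, floor=80.0):
    """Return True when usage is at or above `floor` percent AND has grown by
    more than `max_growth` percentage points across the window.

    Assumption: `samples` are used-space percentages ordered oldest -> newest.
    """
    if len(samples) < 2:
        return False  # not enough history to measure growth
    oldest, latest = samples[0], samples[-1]
    growth = latest - oldest
    return growth > max_growth and latest >= floor


# Overnight at 85% but stable: no alert, deal with it in the morning.
print(should_alert([84.0, 84.5, 85.0]))  # False
# Rapid spike from 81% to 86% within the window: alert.
print(should_alert([81.0, 83.0, 86.0]))  # True
# Fast growth but still below 80%: no alert.
print(should_alert([70.0, 75.0]))        # False
```

Keeping `max_growth` and `floor` as parameters is what would let different devices use different ranges.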

The DataSource calls the API for the last 30 minutes' worth of data and calculates the growth rate.

The code below is for the C drive, but the drive letter can easily be changed, as can the 2.5% and 80% values. These could also be parameterised for different ranges on different devices.

<# Use TLS 1.2 #>
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

<# account info #>
$accessId = '##apiaccessid.key##'
$accessKey = '##apiaccesskey.key##'
$company = '##company##'
$deviceId = "##system.deviceId##"

<# request details #>
$httpVerb = 'GET'
$resourcePath = "/device/devices/$deviceId/devicedatasources"
$queryParams = '?filter=dataSourceName:"WinVolumeUsage-"'
<# GET requests have no body, but $data is still part of the signature string #>
$data = ''

<# Construct URL #>
$url = 'https://' + $company + '.logicmonitor.com/santaba/rest' + $resourcePath + $queryParams

<# Get current time in milliseconds #>
$epoch = [Math]::Round((New-TimeSpan -start (Get-Date -Date "1/1/1970") -end (Get-Date).ToUniversalTime()).TotalMilliseconds)

<# Concatenate Request Details #>
$requestVars = $httpVerb + $epoch + $data + $resourcePath

<# Construct Signature #>
$hmac = New-Object System.Security.Cryptography.HMACSHA256
$hmac.Key = [Text.Encoding]::UTF8.GetBytes($accessKey)
$signatureBytes = $hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($requestVars))
$signatureHex = [System.BitConverter]::ToString($signatureBytes) -replace '-'
$signature = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($signatureHex.ToLower()))

<# Construct Headers #>
$auth = 'LMv1 ' + $accessId + ':' + $signature + ':' + $epoch
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Authorization",$auth)
$headers.Add("Content-Type",'application/json')
$headers.Add("X-Version","3")

<# Make Request #>
$response = Invoke-RestMethod -Uri $url -Method $httpVerb -Headers $headers

<# Get Device DataSource ID (the filter above should match exactly one DataSource on this device) #>
$deviceDataSourceId = $response.items.id

<# request details #>
$httpVerb = 'GET'
$resourcePath = "/device/devices/$deviceId/devicedatasources/$deviceDataSourceId/data"
$queryParams = ''

<# Construct URL #>
$url = 'https://' + $company + '.logicmonitor.com/santaba/rest' + $resourcePath + $queryParams

<# Get current time in milliseconds #>
$epoch = [Math]::Round((New-TimeSpan -start (Get-Date -Date "1/1/1970") -end (Get-Date).ToUniversalTime()).TotalMilliseconds)

<# Concatenate Request Details #>
$requestVars = $httpVerb + $epoch + $data + $resourcePath

<# Construct Signature #>
$hmac = New-Object System.Security.Cryptography.HMACSHA256
$hmac.Key = [Text.Encoding]::UTF8.GetBytes($accessKey)
$signatureBytes = $hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($requestVars))
$signatureHex = [System.BitConverter]::ToString($signatureBytes) -replace '-'
$signature = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($signatureHex.ToLower()))

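<# Construct Headers (no X-Version header here, so this call uses API v1, whose response wraps the payload in status/data) #>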
$auth = 'LMv1 ' + $accessId + ':' + $signature + ':' + $epoch
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Authorization",$auth)
$headers.Add("Content-Type",'application/json')

<# Make Request #>
$response = Invoke-RestMethod -Uri $url -Method $httpVerb -Headers $headers

<# Capture status and body of the response (handy for debugging; not used below) #>
$status = $response.status
$body = $response.data | ConvertTo-Json -Depth 5

<# Return the Nth (1-based) item piped into the function #>
function Select-Nth {
    param([int]$N)

    $Input | Select-Object -First $N | Select-Object -Last 1
}

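<# Each row in 'values' holds one poll's datapoint values; row 0 is assumed to be the most recent #>
$array1 = @($response.data.instances.'WinVolumeUsage-C:\'.values)

<# The 3rd datapoint in each row is the used-space percentage compared below
   (check the order in the response's dataPoints list). Row 19 is the oldest
   sample in the growth window; fall back to the last row if fewer came back. #>
$latest = $array1[0] | Select-Nth 3
$oldest = $array1[[Math]::Min(19, $array1.Count - 1)] | Select-Nth 3

$growth = $latest - $oldest

<# 1 = alert (at/above 80% and grew by more than 2.5 points), 2 = OK #>
if (($growth -gt 2.5) -and ($latest -ge 80)) {
    return 1
} else {
    return 2
}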
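Both requests above sign themselves the same way: concatenate the verb, epoch milliseconds, request body and resource path, HMAC-SHA256 that string with the access key, lowercase-hex the digest and Base64-encode the hex. As a rough cross-check, here are the same steps in Python. The helper name `lmv1_auth_header` is hypothetical, and this mirrors the script above rather than any official client library.

```python
import base64
import hashlib
import hmac
import time


def lmv1_auth_header(access_id, access_key, verb, resource_path, data="", epoch_ms=None):
    """Build an LMv1 Authorization header value the way the PowerShell script does."""
    if epoch_ms is None:
        epoch_ms = int(time.time() * 1000)  # current time in milliseconds
    # Same concatenation order as $requestVars in the script
    request_vars = f"{verb}{epoch_ms}{data}{resource_path}"
    # hexdigest() is already lowercase, matching $signatureHex.ToLower()
    digest_hex = hmac.new(
        access_key.encode("utf-8"), request_vars.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Base64 of the hex *string*, not of the raw digest bytes
    signature = base64.b64encode(digest_hex.encode("utf-8")).decode("ascii")
    return f"LMv1 {access_id}:{signature}:{epoch_ms}"


print(lmv1_auth_header("id", "key", "GET", "/device/devices/1/devicedatasources",
                       epoch_ms=1700000000000))
```

Passing a fixed `epoch_ms` makes the output reproducible for testing; in real use leave it as `None` so the current time is used.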

Hope this gives you some ideas to develop alerting further.

  • A possibly simpler approach (no need to call the API), though with less finesse, is to use complex datapoints and a simple delta threshold. You can extend this using an instance or resource property for further control.

    For example, a complex datapoint for alerting on a substantial change in the % used value, which you only care about once you’re already over 80%, could be:

    Datapoint iCareAboutThisDelta, with expression:

    if(ge(PercentUsed,80),PercentUsed,80)

    A delta alert threshold will never trigger on this if PercentUsed is below 80, because this datapoint will return a flat 80 in those conditions.

    You could extend this with a further complex datapoint that brings in the value of a resource or instance property (let’s call it deltaFloor) if you want to set different base values, or the wildalias or wildvalue if you want to target all C drives (etc.).

    For example, a Groovy scripted datapoint might contain:

    def resourceProp_value = hostProps.get('propertyName')

    // Fall back to 80 when the property is missing (null) or empty
    if (!resourceProp_value) { return 80 } else { return resourceProp_value as Double }

    instanceProps.get('') works in a similar manner for instance-level properties. So you could return a 1 or 0 depending on whether the wildvalue or wildalias is, or contains, a certain string (such as C, D, etc.), and use that to determine whether the final complex datapoint (the one the delta threshold is set on) returns a value at all, a flatline, or a moving number in the range you care about.

    Obviously this is no good for meaningful graphing, but it is potentially useful for catching changes within a range of concern.

    You are still limited to a delta threshold comparing one poll with the next, rather than multiple polls across half an hour, but in some cases this may be adequate, and then you don’t need to load up the collector and the API. Remember that the rate limit for your call is 500 calls/minute: the script above makes two API calls per run, so if you have >1,000 servers comparing historic data every 2 minutes (roughly 1,000 calls/minute), you’ll almost certainly see some 429 codes and consequent data gaps.

    Other than that, I love your ingenuity.

  • Background: I did my senior thesis for my Anthropology (Archaeology) degree on generating prehistoric soil topology based on minimal measured data.

    One piece I worked on to supplement that effort was recreating the long-, mid- and short-frequency height fluctuations of those ancient surfaces, which meant measuring the average and standard deviation of each of the three. Using these 6 data points, I posit that you should be able to derive weekly, daily and hourly variation values, and that the sum of those three at any point should fall within a known range. According to your code, you’re already accessing that dataset for volume free space.

    I will be using parts of your code to calculate growth and days until full (grabbing those historical values had eluded me for some reason).

    Counter and Derive datapoint types can also be very valuable for these kinds of metrics, for very short-term shifts (they only compare against the previous value).

    You can also set different values in the static thresholds for different times of day to help account for things like backups, etc.

    Enough of my rambling… now I have to go and write some new DataSources based on this discussion.
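To see how the complex datapoint idea from the first reply behaves before building it, here is a small Python simulation. It is not LogicMonitor code: the function names are hypothetical, `delta_floor` mirrors the Groovy property fallback, `gated_value` stands in for the wildvalue check, and a delta threshold is modelled (as an assumption) as a rise of more than `max_delta` between consecutive polls.

```python
def delta_floor(resource_prop_value, default=80.0):
    """Mirror of the Groovy fallback: use the property if it is set, else 80."""
    if not resource_prop_value:  # missing (None) or empty string
        return default
    return float(resource_prop_value)


def i_care_about_this_delta(percent_used, floor=80.0):
    """Equivalent of if(ge(PercentUsed,80),PercentUsed,80) with a configurable floor."""
    return percent_used if percent_used >= floor else floor


def gated_value(percent_used, floor, wildvalue, target="C"):
    """Return None (no data) for instances whose wildvalue doesn't contain target."""
    if target not in wildvalue:
        return None
    return i_care_about_this_delta(percent_used, floor)


def delta_alerts(samples, floor, max_delta):
    """For each poll after the first, True if the clamped value rose by more than max_delta."""
    values = [i_care_about_this_delta(p, floor) for p in samples]
    return [curr - prev > max_delta for prev, curr in zip(values, values[1:])]


floor = delta_floor('')  # property not set, so 80.0
print(delta_alerts([60, 75, 79, 82, 83, 88], floor, 2.5))
# [False, False, False, False, True]
```

With a floor of 80, the 75 to 79 jump is hidden by the clamp and only the 83 to 88 rise trips the threshold.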
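The rate-limit warning in the first reply comes down to simple arithmetic: the script in the original post makes two API calls per run (one to look up the device DataSource ID, one to fetch the data). A quick Python helper shows where the ceiling sits. The names are hypothetical, it uses the 500 calls/minute figure quoted in the reply, and it ignores any other API traffic on the account.

```python
def api_calls_per_minute(servers, calls_per_run=2, poll_interval_minutes=2):
    """API calls per minute if every server runs the DataSource once per interval."""
    return servers * calls_per_run / poll_interval_minutes


def max_servers(rate_limit=500, calls_per_run=2, poll_interval_minutes=2):
    """Largest server count that stays within the rate limit (no other API traffic assumed)."""
    return rate_limit * poll_interval_minutes // calls_per_run


print(api_calls_per_minute(1000))  # 1000.0, double the 500/minute limit
print(max_servers())               # 500
```

Lengthening the poll interval or reducing the calls per run raises the ceiling proportionally.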
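The second reply's idea of combining weekly, daily and hourly variation into a known range could be sketched as follows. This is one possible reading, not the poster's method: it assumes each component is summarised by a mean and standard deviation, that the components are independent (so variances add), and that "known range" means the combined mean plus or minus k standard deviations. `expected_range` is a hypothetical name; if independence is doubtful, summing the standard deviations gives a wider, worst-case band.

```python
import math


def expected_range(components, k=3.0, independent=True):
    """components: (mean, std_dev) pairs, e.g. weekly, daily and hourly variation.

    With independent=True the variances add; otherwise the standard deviations
    are summed, which gives a wider, worst-case band.
    """
    total_mean = sum(mean for mean, _ in components)
    if independent:
        total_sd = math.sqrt(sum(sd ** 2 for _, sd in components))
    else:
        total_sd = sum(sd for _, sd in components)
    return total_mean - k * total_sd, total_mean + k * total_sd


weekly, daily, hourly = (50.0, 3.0), (10.0, 4.0), (0.0, 12.0)
print(expected_range([weekly, daily, hourly], k=2.0))  # (34.0, 86.0)
```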
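For the growth and days-until-full calculation mentioned in the second reply, here is a minimal sketch. It assumes (timestamp, percent-used) samples ordered oldest first and simple linear growth between the first and last sample, which is the same endpoint comparison the original script uses. `days_until_full` is a hypothetical name.

```python
SECONDS_PER_DAY = 86_400


def days_until_full(samples):
    """samples: (unix_seconds, percent_used) tuples, oldest first.

    Returns the estimated days until 100% used, or None if usage is flat,
    shrinking, or there is not enough history.
    """
    if len(samples) < 2:
        return None
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    elapsed_days = (t1 - t0) / SECONDS_PER_DAY
    if elapsed_days <= 0:
        return None
    rate = (p1 - p0) / elapsed_days  # percentage points per day
    if rate <= 0:
        return None
    return (100.0 - p1) / rate


print(days_until_full([(0, 70.0), (86_400, 75.0)]))  # 5.0
```

A least-squares fit over all samples would be less sensitive to a single noisy reading; the endpoint version is kept here because it matches the original script.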
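The Counter and Derive types mentioned in the second reply both turn successive raw values into a per-second rate against the previous poll only. Below is a rough Python model of the difference. It approximates how these types are commonly described and is not LogicMonitor's implementation: the wrap handling in particular is simplified to a single `max_value`, and the function names are hypothetical.

```python
def derive(prev, curr, seconds):
    """Per-second rate of change since the previous poll; may be negative."""
    return (curr - prev) / seconds


def counter(prev, curr, seconds, max_value=2**64):
    """Per-second rate for an ever-increasing counter.

    A drop is treated as a wrap past max_value (simplified assumption), so the
    result is never negative.
    """
    delta = curr - prev
    if delta < 0:
        delta += max_value
    return delta / seconds


# 2.5 percentage points in 30 minutes expressed as a per-second Derive threshold
threshold = 2.5 / (30 * 60)
print(derive(80.0, 81.5, 120) > threshold)  # True: 1.5 points in 2 minutes
print(derive(80.0, 79.0, 100))              # -0.01, e.g. the nightly clear-down
```

Derive suits a percent-used value that can fall as well as rise; Counter suits monotonically increasing totals where a drop most likely means the counter wrapped.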