This request doesn't appear to fit the other 2 I found, so... Right now I have 3 (soon 4) host servers in a Hyper-V cluster called "hvcluster". I get 3 alerts for each cluster shared volume and each guest VM not running. Other vendors' products I use are capable of displaying only a single instanced alert for "the cluster", and I'm hoping you guys can implement this as well. Right now these are driving me nuts, especially when we have several guest servers off on purpose for extended-to-forever periods of time. No, SDT doesn't work; it still shows them. Right now, to get only 1 alert I have to leave 1 host server as the alerting server and then manually edit the alerts for every host server I don't want alerting, to disable them. I have to do this every time I add a new guest or CSV, and it is super annoying. Let me know if you need more details.


more cluster alert improvement requests

9 Replies


mnagel
Professor
7 years ago
There actually is a solution for this now. See https://www.logicmonitor.com/lmservice-insight/ . Unfortunately, though this use case reflects a missing core feature, the solution is a premium license add-on. I have had words with our CSM about this stance. They need a core version and an enhanced version, IMO.

The other problem is that it is "write only". If you decide to change an instance after it is created, the wizard that creates them is not available (last time I checked anyway).

But it should fix this problem.
Dave_Smale
8 years ago
Agreed, this is a point of pain for us too. If a VM or cluster resource has an issue, LM (rightly?!) reports alarms from every node in the cluster.

A mechanism to dedupe alarms from cluster resources would be appreciated.
Cole_McDonald
Professor
7 years ago
It's been a year since this thread was opened... any traction on a resolution or workaround for this? The documentation that mentions this doesn't seem to answer the need here: https://www.logicmonitor.com/support/monitoring/os-virtualization/windows-cluster-monitoring/
Cole_McDonald
Professor
7 years ago
I'm working on creating a discovery script that will ID and tag members of a HyperV cluster, then tag the cluster with a GUID that will allow a Clustered Volumes dynamic group to relate the Volumes from them. Then we can hit escalation chains using a cluster alert on that dynamic group rather than on the individual instances. I'll post it here when I'm done with it. It'll be a powershell solution.
Cole_McDonald
Professor
7 years ago
I'm cross-posting this from https://communities.logicmonitor.com/topic/1097-collapse-clustered-instances-at-group-level/ . Here was my solution:

Add a custom property to each device with the cluster ID (or an arbitrary GUID/unique naming convention instance). We used HyperV.Cluster.GUID = <GUID generated by powershell with New-GUID>

Use that property to build a dynamic group for each cluster.

Use that group to set up a cluster alert for whichever sources you're trying to get visibility into, then tell it not to trigger those source alerts for the individual members of the group.
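
A rough sketch of the first two steps, for anyone who wants to script them. The property name HyperV.Cluster.GUID comes from the post above; the function name New-ClusterTag is hypothetical, and the appliesTo format is an assumption modeled on the expression used in the scripts later in this thread.

```powershell
# Hypothetical helper (not part of LogicMonitor): builds the custom property
# line and a matching dynamic group appliesTo expression for one cluster.
function New-ClusterTag {
    param (
        [string] $PropertyName = "HyperV.Cluster.GUID",
        # Defaults to a freshly generated GUID when no cluster ID is supplied
        [guid]   $ClusterGuid  = (New-Guid)
    )

    [pscustomobject]@{
        Property  = "$PropertyName=$ClusterGuid"
        AppliesTo = "$PropertyName == `"$ClusterGuid`""
    }
}

# Example with a made-up GUID
$tag = New-ClusterTag -ClusterGuid "11111111-2222-3333-4444-555555555555"
$tag.Property
$tag.AppliesTo
```

Every member of the same cluster needs the same GUID, so generate it once per cluster and reuse it rather than calling New-Guid per device.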

To help reduce the number of repeated emails generated by the escalation chains, add a blank entry to the end of the escalation chain you're using. The escalation increments through the steps after the escalation time has passed, then repeats the last step once it gets to the end. That can produce far more emailed alerts than are necessary. Adding a blank last step causes the escalation to repeat that blank step (which sends nothing) after making it through the chain. You can add timing pauses into your escalation chain as well by adding blank steps in between. If the alert closes during that time, the rule ends and the following steps don't fire.

For instance, in Dynamics AX, the AOS servers can take a long time to come up. Our team needs to know that it's gone down. Our customer needs to know if it doesn't come up. Rather than making a pair of sources that we can then create separate alerts for, we make the first step the contact to our team, then enough blanks to account for the time it normally takes for the service to restart. After that, we add the customer's team, then a blank.

Our current ticketing system only accepts automated tickets via email. These little tricks allow us to have tickets only generated once per alert, and allow us to make sure we're not panicking our customers for failovers / failures that are recovering as expected.
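
To make the "enough blanks" arithmetic concrete, here is a small sketch. It assumes each stage fires one escalation interval after the previous one, which is my reading of the description above, not documented LogicMonitor behavior. The function name and the stage labels are hypothetical.

```powershell
# Hypothetical planner: lays out chain stages so the customer is contacted
# only after the expected recovery time has passed.
function Get-EscalationStages {
    param (
        [int] $EscalationIntervalMinutes,
        [int] $ExpectedRecoveryMinutes
    )

    # Stage 1 fires at t=0, stage k fires at (k-1) * interval (assumption).
    $intervalsToWait = [Math]::Ceiling( [double]$ExpectedRecoveryMinutes / $EscalationIntervalMinutes )
    $blankCount      = [int][Math]::Max( 0, $intervalsToWait - 1 )

    $stages  = @( "our-team" )
    $stages += @( "(blank)" ) * $blankCount   # pauses while the service restarts
    $stages += "customer-team"
    $stages += "(blank)"                      # last stage repeats, so keep it silent
    $stages
}

# 15-minute escalation interval, service normally back within 45 minutes
Get-EscalationStages -EscalationIntervalMinutes 15 -ExpectedRecoveryMinutes 45
```

With those numbers the customer stage is the fourth one, so under the stated assumption it fires 45 minutes after the alert opens, and only if the alert is still open.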
Cole_McDonald
Professor
7 years ago
Cross-posting my partial solution again:

For Failover Clusters in Windows Server, here's a powershell propertySource that will add a category to members of clusters. It's not the complete solution, but it'll allow you to better target eventSources to eliminate some of the strain on your Collectors:

Name: Windows Failover Cluster Discovery
Group: Windows Failover Cluster
appliesTo: isWindows()

if ( test-path "\\##system.displayname##\C$\Windows\Cluster\CLUSDB" ) {
    "system.categories=ClusterMember"
}

It doesn't currently identify owners, although that can probably be done with an invoke-command { test-path "hklm:\0.Cluster" }, as the owner of the cluster gets the extra Registry Cluster Hive copied during the failover (https://blog.workinghardinit.work/2016/03/29/the-cluster-and-0-cluster-registry-hives/).

I have powershell that will identify Clusters by name and GUID and find the Nodes associated with them. I've got to figure out how to get that to create dynamic groups and populate them using properties. I'm most likely going to have to make it a timed script that runs 1-2 times a day, scans for cluster members and performs a bunch of queries using the failovercluster PS Module cmdlets. Then it would use the REST API to create the associated structures to get them grouped so we can add clustered alerts to them and turn off their individual alerts.

Then I can apply the same kind of logic to hypervisor clustering. I'm slowly chipping away at this to make it fit our business model's use case.
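
A minimal sketch of the grouping step described above, run against sample records rather than a live scan. The node names, GUIDs and the function name Get-ClusterGroupPlan are made up for illustration; in a real run the records would come from the cluster queries, and the group name and appliesTo formats simply mirror the scripts posted further down.

```powershell
# Sample records standing in for the output of a cluster scan (hypothetical data)
$nodes = @(
    [pscustomobject]@{ Node = "hv01";  ClusterName = "hvcluster";  ClusterId = "aaaaaaaa-0000-0000-0000-000000000001" }
    [pscustomobject]@{ Node = "hv02";  ClusterName = "hvcluster";  ClusterId = "aaaaaaaa-0000-0000-0000-000000000001" }
    [pscustomobject]@{ Node = "hv03";  ClusterName = "hvcluster";  ClusterId = "aaaaaaaa-0000-0000-0000-000000000001" }
    [pscustomobject]@{ Node = "sql01"; ClusterName = "sqlcluster"; ClusterId = "bbbbbbbb-0000-0000-0000-000000000002" }
    [pscustomobject]@{ Node = "sql02"; ClusterName = "sqlcluster"; ClusterId = "bbbbbbbb-0000-0000-0000-000000000002" }
)

# Hypothetical helper: one planned dynamic group per distinct cluster ID
function Get-ClusterGroupPlan {
    param ( [object[]] $Nodes )

    $Nodes | Group-Object -Property ClusterId | ForEach-Object {
        [pscustomobject]@{
            GroupName = "Failover Cluster - $($_.Group[0].ClusterName)"
            AppliesTo = "Failover.Cluster.GUID == `"$($_.Name)`""
            Members   = @( $_.Group.Node )
        }
    }
}

Get-ClusterGroupPlan -Nodes $nodes | Format-List
```

Grouping on the cluster ID rather than the name keeps two clusters that happen to share a display name from being merged into one group.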
Cole_McDonald
Professor
7 years ago
Can't edit the previous post... so here's a quick update: I lied. I had thought HyperV clusters and SQL clusters used a different mechanism for their respective clustering. They don't; they use the same mechanism. This works for HyperV as well. To make it work with vCenter, you'll just need to find an identifying piece. I am using the existence of the Cluster DB file.

Cole_McDonald

Professor

7 years ago

I completed my dynamic cluster discovery:

# Change this to match the ID of your parent folder for your clusters
$groupParentID      = "566"

# Change this to your company name from your LM URL
$company            = "your_company_name"

$URLRoot            = "https://$company.logicmonitor.com/santaba/rest"
$server             = "##system.displayname##"

$accessID           = "##LogicMonitor.accessId.key##"
$accessKey          = "##LogicMonitor.accessKey.key##"

function Send-Request {
    param (
        $cred,
        $accessid   = $null,
        $accesskey  = $null,
        $URL               ,
        $data       = $null,
        $version    = '2'  ,
        $httpVerb   = "GET"
    )

    if ( $accessId -eq $null) {
        $accessId   = $cred.UserName
        $accessKey  = $cred.GetNetworkCredential().Password
    }

    # Use TLS 1.2
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

    # Get current time in milliseconds
    $epoch          = [Math]::Round(
        ( New-TimeSpan `
            -start (Get-Date -Date "1/1/1970") `
            -end (Get-Date).ToUniversalTime()).TotalMilliseconds
        )

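    # Concatenate Request Details
    # NOTE: $resourcePath is not a parameter; it is read from the caller's scope,
    # so it must be set before each call to Send-Request
    $requestVars    = $httpVerb + $epoch + $data + $resourcePath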

    # Construct Signature
    $hmac           = New-Object System.Security.Cryptography.HMACSHA256
    $hmac.Key       = [Text.Encoding]::UTF8.GetBytes( $accessKey )
    $signatureBytes = $hmac.ComputeHash( [Text.Encoding]::UTF8.GetBytes( $requestVars ) )
    $signatureHex   = [System.BitConverter]::ToString( $signatureBytes ) -replace '-'
    $signature      = [System.Convert]::ToBase64String( [System.Text.Encoding]::UTF8.GetBytes( $signatureHex.ToLower() ) )

    # Construct Headers
    $auth           = 'LMv1 ' + $accessId + ':' + $signature + ':' + $epoch
    $headers        = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
    $headers.Add(     "Authorization", $auth              )
    $headers.Add(     "Content-Type" , 'application/json' )

    # uses version 2 of the API
    $headers.Add(     "X-version"    , $version           )

    # Make Request
    $response       = Invoke-RestMethod `
        -Uri           $URL      `
        -Method        $httpVerb `
        -Body          $data     `
        -Header        $headers

    $result         = $response

    Return $result
}

if ( test-path "\\$server\C`$\Windows\Cluster\CLUSDB" ) {
    
    "system.categories=ClusterMember"   
    
    $clusterInfo    = invoke-command `
        -ComputerName $server        `
        -scriptBlock  {
            Import-Module failoverclusters
            $cluster = get-cluster
            "$($cluster.name):$($cluster.id)"
        }

    $clustername    = ($clusterinfo -split ':')[0]
    $clusterid      = ($clusterinfo -split ':')[1]

    $groupName      = "Failover Cluster - $clustername"

    # Read Groups
    # Construct URL
    $resourcePath   = "/device/groups"
    $url            = $URLRoot + $resourcePath

    # Make Request
    $response       = Send-Request  `
        -accessid     $accessID     `
        -accesskey    $accessKey    `
        -URL          $url         
        
    $group   = $response.items | ? name -eq $groupName
    
    if ( ($group | measure-object).count -gt 0 ) {

        # "*** Group Already exists.  Need Device properties? ***"
        try {
            $resource = "##Auto.Failover.Cluster.GUID##"
        } catch {
            $resource = $null
        }

        if ( $resource -ne $clusterid ) {
            # Add Properties
            "Failover.Cluster.GUID=$clusterid"
        }

    } else {
        # "*** create group & tag resource ***"

        # Construct URL
        $resourcePath = "/device/groups"
        $url          = $URLRoot + $resourcePath

        # Construct Data Body
        $data = `
@"
{
    `"name`"             : `"$groupName`"                                     ,
    `"parentId`"         : `"$groupParentID`"                                 ,
    `"disableAlerting`"  : `"true`"                                           ,
    `"enableNetflow`"    : `"false`"                                          ,
    `"appliesTo`"        : `"Auto.Failover.Cluster.GUID == \`"$ClusterID\`"`" ,
    `"customProperties`" : [{
        `"name`"         : `"Auto.Failover.Cluster.ParentGUID`"               ,
        `"value`"        : `"$ClusterID`"
    }]
}
"@

        try {
            $response       = Send-Request  `
                -accesskey    $accessKey    `
                -accessid     $accessId     `
                -URL          $url          `
                -data         $data         `
                -httpVerb     "POST"
            
        } catch {
            # NOTE: $logPath is never defined in this script; set it near the top
            # before relying on this log
            $error[0] | out-file $logPath -append
        }

        # Add Properties
        "Failover.Cluster.GUID=$clusterid"
    }
}
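
For anyone adapting the script above, here is the signing step pulled out into a standalone function. It follows the same recipe as Send-Request (HMAC-SHA256 over verb + epoch + data + resource path, lowercase hex, then Base64). The function name Get-LMv1AuthHeader and its parameters are my own, and the access ID and key in the example are dummies.

```powershell
# Hypothetical standalone version of the signature logic in Send-Request.
# Taking the resource path as a parameter avoids relying on the caller's scope.
function Get-LMv1AuthHeader {
    param (
        [string] $AccessId,
        [string] $AccessKey,
        [string] $HttpVerb,
        [string] $ResourcePath,
        [string] $Data  = "",
        # Milliseconds since the Unix epoch; overridable for repeatable runs
        [long]   $Epoch = ([DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds())
    )

    $requestVars = $HttpVerb + $Epoch + $Data + $ResourcePath

    $hmac     = New-Object System.Security.Cryptography.HMACSHA256
    $hmac.Key = [Text.Encoding]::UTF8.GetBytes( $AccessKey )
    try {
        $hashBytes = $hmac.ComputeHash( [Text.Encoding]::UTF8.GetBytes( $requestVars ) )
    } finally {
        $hmac.Dispose()
    }

    # Lowercase hex string of the hash, then Base64 of that hex string
    $hex       = ( [BitConverter]::ToString( $hashBytes ) -replace '-' ).ToLower()
    $signature = [Convert]::ToBase64String( [Text.Encoding]::UTF8.GetBytes( $hex ) )

    "LMv1 ${AccessId}:${signature}:${Epoch}"
}

# Dummy credentials and a fixed epoch, just to show the header shape
Get-LMv1AuthHeader -AccessId "exampleId" -AccessKey "exampleKey" `
    -HttpVerb "GET" -ResourcePath "/device/groups" -Epoch 1700000000000
```

The same resource path that is appended to the URL has to go into the signature, which is why the script sets $resourcePath right before every request.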

Cole_McDonald

Professor

7 years ago

The previous version had an issue with the properties it was adding. If you're creating a new property from a propertySource script, it adds it as an "auto.*" property, which goes away as soon as the script stops processing. To add a new permanent custom property, you have to use the REST API, not just a "category.name=data" output from the script. Here's the final:

# The first two variables below ($company and $groupParentID) will need to change to fit your environment.
# The groupParentID is the id of a group to house the dynamic groups that will be created...
# if that's your root level, use that ID.
# We're using a group named "Failover Clusters" in our hierarchy to house them.

#######
# Cole McDonald - Sr. Technical Analyst
# cole.mcdonald@beyondimpactllc.com
# Beyond Impact 2.0, llc
# No warranty provided for this code, use at your own risk
#######

$company            = "Your_Company_Name"
$groupParentID      = "566"

$URLRoot            = "https://$company.logicmonitor.com/santaba/rest"

$server             = "##system.displayname##"
$accessID           = "##LogicMonitor.accessId.key##"
$accessKey          = "##LogicMonitor.accessKey.key##"

function Send-Request {
    param (
        $cred,
        $accessid   = $null,
        $accesskey  = $null,
        $URL               ,
        $data       = $null,
        $version    = '2'  ,
        $httpVerb   = "GET"
    )

    if ( $accessId -eq $null) {
        $accessId   = $cred.UserName
        $accessKey  = $cred.GetNetworkCredential().Password
    }

    # Use TLS 1.2
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

    # Get current time in milliseconds
    $epoch          = [Math]::Round(
        ( New-TimeSpan `
            -start (Get-Date -Date "1/1/1970") `
            -end (Get-Date).ToUniversalTime()).TotalMilliseconds
        )

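    # Concatenate Request Details
    # NOTE: $resourcePath is not a parameter; it is read from the caller's scope,
    # so it must be set before each call to Send-Request
    $requestVars    = $httpVerb + $epoch + $data + $resourcePath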

    # Construct Signature
    $hmac           = New-Object System.Security.Cryptography.HMACSHA256
    $hmac.Key       = [Text.Encoding]::UTF8.GetBytes( $accessKey )
    $signatureBytes = $hmac.ComputeHash( [Text.Encoding]::UTF8.GetBytes( $requestVars ) )
    $signatureHex   = [System.BitConverter]::ToString( $signatureBytes ) -replace '-'
    $signature      = [System.Convert]::ToBase64String( [System.Text.Encoding]::UTF8.GetBytes( $signatureHex.ToLower() ) )

    # Construct Headers
    $auth           = 'LMv1 ' + $accessId + ':' + $signature + ':' + $epoch
    $headers        = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
    $headers.Add(     "Authorization", $auth              )
    $headers.Add(     "Content-Type" , 'application/json' )

    # uses version 2 of the API
    $headers.Add(     "X-version"    , $version           )

    # Make Request
    $response       = Invoke-RestMethod `
        -Uri           $URL             `
        -Method        $httpVerb        `
        -Body          $data            `
        -Header        $headers

    $result         = $response

    Return $result
}

if ( test-path "\\$server\C`$\Windows\Cluster\CLUSDB" ) {
    
    "system.categories=ClusterMember"   
    
    $clusterInfo    = invoke-command `
        -ComputerName $server        `
        -scriptBlock  {
            Import-Module failoverclusters
            $cluster = get-cluster
            "$($cluster.name):$($cluster.id)"
        }

    $clustername    = ($clusterinfo -split ':')[0]
    $clusterid      = ($clusterinfo -split ':')[1]

    $groupName      = "Failover Cluster - $clustername"

    # Read Groups
    # Construct URL
    $resourcePath   = "/device/groups"
    $url            = $URLRoot + $resourcePath

    # Make Request
    $response       = Send-Request  `
        -accessid     $accessID     `
        -accesskey    $accessKey    `
        -URL          $url         
        
    $group   = $response.items | ? name -eq $groupName
    
    if ( ($group | measure-object).count -gt 0 ) {

        # "*** Group Already exists.  Need Device properties? ***"
        try {
            $resource = "##Failover.Cluster.GUID##"
        } catch {
            $resource = $null
        }

        if ( $resource -ne $clusterid ) {
            # Add Properties
            # Construct URL
            $resourcePath   = "/device/devices/##system.deviceid##/properties/"
            $url            = $URLRoot + $resourcePath

            # Construct Data Body
            $data = `
@"
    {
        `"type`"         : `"custom`"                ,
        `"name`"         : `"Failover.Cluster.GUID`" ,
        `"value`"        : `"$ClusterID`"
    }
"@
            $response       = Send-Request  `
                -accesskey    $accessKey    `
                -accessid     $accessId     `
                -URL          $url          `
                -data         $data         `
                -httpVerb     "POST"
        }
    } else {
        # "*** create group & tag resource ***"

        # Construct URL
        $resourcePath = "/device/groups"
        $url          = $URLRoot + $resourcePath

        # Construct Data Body
        $data = `
@"
{
    `"name`"             : `"$groupName`"                                ,
    `"parentId`"         : `"$groupParentID`"                            ,
    `"disableAlerting`"  : `"true`"                                      ,
    `"enableNetflow`"    : `"false`"                                     ,
    `"appliesTo`"        : `"Failover.Cluster.GUID == \`"$ClusterID\`"`" ,
    `"customProperties`" : [{
        `"name`"         : `"Failover.Cluster.ParentGUID`"               ,
        `"value`"        : `"$ClusterID`"
    }]
}
"@

        try {
            $response       = Send-Request  `
                -accesskey    $accessKey    `
                -accessid     $accessId     `
                -URL          $url          `
                -data         $data         `
                -httpVerb     "POST"
            

            # Add Properties
            # Construct URL
            $resourcePath   = "/device/devices/##system.deviceid##/properties/Failover.Cluster.GUID"
            $url            = $URLRoot + $resourcePath

            # Construct Data Body
            $data = `
@"
    {
            `"type`"         : `"custom`"                ,
            `"name`"         : `"Failover.Cluster.GUID`" ,
            `"value`"        : `"$ClusterID`"
    }
"@
            $response       = Send-Request  `
                -accesskey    $accessKey    `
                -accessid     $accessId     `
                -URL          $url          `
                -data         $data         `
                -httpVerb     "PUT"
        } catch {
            # NOTE: $logPath is never defined in this script; set it near the top
            # before relying on this log
            $error[0] | out-file $logPath -append
        }
    }
}
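
The request bodies in the script above are built from here-strings with hand-escaped quotes. As an alternative sketch, the same bodies can be built from hashtables and serialized with ConvertTo-Json, which takes care of the quoting. The function names are hypothetical; the field names and values are copied from the script, and whether the API treats them any differently is not something I have verified.

```powershell
# Hypothetical helpers that produce the same JSON fields as the here-strings above

function New-PropertyBody {
    param ( [string] $Name, [string] $Value )

    [ordered]@{
        type  = "custom"
        name  = $Name
        value = $Value
    } | ConvertTo-Json
}

function New-ClusterGroupBody {
    param ( [string] $GroupName, [string] $ParentId, [string] $ClusterId )

    [ordered]@{
        name             = $GroupName
        parentId         = $ParentId
        disableAlerting  = "true"     # sent as strings, as in the original script
        enableNetflow    = "false"
        appliesTo        = "Failover.Cluster.GUID == `"$ClusterId`""
        customProperties = @(
            [ordered]@{ name = "Failover.Cluster.ParentGUID"; value = $ClusterId }
        )
    } | ConvertTo-Json -Depth 4
}

# Example with a made-up cluster ID
New-ClusterGroupBody -GroupName "Failover Cluster - hvcluster" `
    -ParentId "566" -ClusterId "11111111-2222-3333-4444-555555555555"
```

The -Depth value is set explicitly because ConvertTo-Json only serializes a couple of levels by default, and the nested customProperties array sits right at that limit.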
