Forum Discussion

Keimond
Neophyte
7 years ago

cluster alert improvements

Hello, we would love the ability to have the cluster alert give us a list of instance names that have triggered the cluster alert.

Another request is filtering based on an instance name.

For example, let's say I have 3 devices that each have 3 instances named something like:

Device1          Device2          Device3
a1-dns1.blah     d1-dns1.blah     g2-dns1.blah
b6-dns2.blah     e2-dns2.blah     h1-dns2.blah
c1-dns3.blah     f1-dns3.blah     i2-dns3.blah

While all the instances have different names, the last part of the name is shared across devices (dns1.blah, dns2.blah, dns3.blah). I would like the ability to trigger the alert if 2 or more instances match the same regex group, e.g.:
[a-z][0-9]-(dns[0-9]\.blah)
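
To make the request concrete, here is a minimal Python sketch of the grouping logic I have in mind. It is not an existing product feature. The pattern is a variation of the one above with the hyphen included and the dot escaped, and the function name, the threshold parameter, and the choice of which instances are alerting are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern: two-character prefix, a hyphen, then the shared
# suffix as the capture group.
PATTERN = re.compile(r"^[a-z][0-9]-(dns[0-9]\.blah)$")


def clusters_to_alert(alerting_instances, min_matches=2):
    """Group alerting instance names by captured suffix.

    Only groups with at least `min_matches` members are returned.
    """
    groups = defaultdict(list)
    for name in alerting_instances:
        match = PATTERN.match(name)
        if match:
            groups[match.group(1)].append(name)
    return {
        suffix: names
        for suffix, names in groups.items()
        if len(names) >= min_matches
    }


# Instance names come from the example above; which ones are alerting is assumed.
alerting = ["a1-dns1.blah", "d1-dns1.blah", "h1-dns2.blah"]
result = clusters_to_alert(alerting)
print(result)
```

With this sample input only dns1.blah reaches the threshold of 2, so the cluster alert would fire once and could list a1-dns1.blah and d1-dns1.blah as the triggering instances.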

  • Sarah_Terry
    Product Manager

    Hi @Joe Tran and @Keimond,

    We're working on a feature that will enable much more flexible alerting for instances across devices. At a high level, we'll allow you to group together instances across devices and set cluster-like thresholds for the group. So to make it based on instance names, you'd group together instances based on name (can be across multiple devices) & set aggregate thresholds for each group.

    Can you provide more detail on what you're alerting on in these cases?

    Thanks!

    Sarah

  • Server1                        Server2
    uy1 - dns1.us1.blah.com        ab1 - dns3.us1.blah.com
    uy1 - dns2.us1.blah.com        ab1 - dns4.us1.blah.com
    ar1 - dns1.us1.blah.com        mn1 - dns3.us1.blah.com
    ar1 - dns2.us1.blah.com        mn1 - dns4.us1.blah.com
    my1 - dns3.us1.blah.com        bg2 - dns1.us1.blah.com
    my1 - dns4.us1.blah.com        bg2 - dns2.us1.blah.com

    Every instance returns a query time (from Server1/2 to whichever of dns1/2/3/4.us1.blah.com is in its name).
    Every instance also returns a value of 0, 1, or 2 based on whether it was able to query both servers. For example, if uy1 can't query either dns1 or dns2, that's critical, because that site can't query either server.

    So I see two possible groupings.. the sites (uy1, ar1, my1, ab1, mn1, bg2)
    or the servers (dns1.us1.., dns2.us1.., dns3.us1.., dns4.us1..)

    So instead of each of my instances having to do their own scripted checking in the background to check both servers, a grouped/cluster alert could alert if both instances are down.
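
As a rough illustration of the site grouping, here is a minimal Python sketch. It assumes instance names follow the "site - server" layout shown above and that each instance reports a simple up/down state; the function name and the sample states are hypothetical, and this is not how the product evaluates alerts today.

```python
from collections import defaultdict


def sites_fully_down(instance_up):
    """Return the sites whose instances all failed, in sorted order.

    `instance_up` maps "site - server" instance names to True (query
    succeeded) or False (query failed).
    """
    by_site = defaultdict(list)
    for name, is_up in instance_up.items():
        site, _, _server = name.partition(" - ")
        by_site[site].append(is_up)
    return sorted(site for site, states in by_site.items() if not any(states))


# Assumed sample states: uy1 has lost both servers, ar1 only one.
status = {
    "uy1 - dns1.us1.blah.com": False,
    "uy1 - dns2.us1.blah.com": False,
    "ar1 - dns1.us1.blah.com": False,
    "ar1 - dns2.us1.blah.com": True,
}
critical_sites = sites_fully_down(status)
print(critical_sites)
```

Here only uy1 would go critical, since ar1 can still reach dns2.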

    And in the other case, if, say, dns1.us1.blah.com is down, I'd rather not get a separate page from each of uy1, ar1, and bg2. I'd want just one page with a custom alert saying that uy1, ar1, and bg2 are unable to contact dns1.us1.blah.com (I envision this being possible by pulling instance properties).

    example: uy1 - dns1.us1.blah.com would have instance properties of
    site = uy1
    dns = dns1.us1.blah.com
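
A minimal Python sketch of the single consolidated page, assuming each failing instance exposes the site and dns properties listed above. The function name and the message wording are hypothetical; the point is only that one message per DNS server can be built from instance properties.

```python
from collections import defaultdict


def consolidated_alerts(failing):
    """Build one alert message per DNS server.

    `failing` is a list of property dicts, one per failing instance,
    each with "site" and "dns" keys.
    """
    sites_by_dns = defaultdict(list)
    for props in failing:
        sites_by_dns[props["dns"]].append(props["site"])
    messages = []
    for dns in sorted(sites_by_dns):
        sites = ", ".join(sorted(sites_by_dns[dns]))
        messages.append(f"{sites} unable to contact {dns}")
    return messages


# Assumed failing instances for the dns1 outage described above.
failing = [
    {"site": "uy1", "dns": "dns1.us1.blah.com"},
    {"site": "ar1", "dns": "dns1.us1.blah.com"},
    {"site": "bg2", "dns": "dns1.us1.blah.com"},
]
messages = consolidated_alerts(failing)
print(messages)
```

Three failing instances collapse into a single message for dns1.us1.blah.com, which is the one page I'd want instead of three.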

    Flexibility in being able to customize alert messages for each cluster / grouped alert is a big one too! Right now it's a standard template across the board :(
     

  • Hi @Sarah Terry,

    I'm basically replicating internal LM Website checks, but with Devices due to a dev requirement to capture and track specific JSON object values in the response. All of the cluster devices are pointed to the same host/IP/DNS but assigned to different collectors. Each instance in the template is a unique, largely independent web app environment.

    Another use case we have includes monitoring specific Windows Services. All monitored services exist in a cluster-like offering and are named the same, but the service name does change based on the version. I elected to use a single Active Discovery template so I'm not constantly updating a datasource just because the delivery teams pushed a new update.

    Instance grouping, and then defining alert clusters at the instance-group level across devices, would be ideal for both of my use cases.