Forum Discussion

jonathanbarrow
7 months ago

How do I configure an alert for a specific instance name?

For the life of me I can't figure out how to make the type of alert our org needs.

  1. I have a datasource named "Event Log Errors v3"
  2. It generates a ton of different instances and is applied to all IsWindows() systems, of which we have over 10,000.
  3. I need to set an alarm for a specific instance name. If that instance name is seen with a value over X, I want an alarm triggered on any system.
  4. I don't want to have to configure this a bazillion times (at the individual resource level); the alert should be uniform for all systems.
  5. I can't set an alarm on the DataSource itself, as there is one datapoint right now that uses ##WildValue##.COUNT as its Key, so setting a threshold there would set the same threshold for all the hundreds of instances being created by this DataSource and blow up our help desk.
  6. I can't set it at the Dynamic Group level right now because there I can pick the DataSource, but can't seem to specify a specific instance name from within the DataSource. It gives me the generic "count" entry.

What am I missing here? Any suggestions? I chatted with support but ended up with a slew of doc page links that sent me down rabbit holes or confused me even more. One would think this should be pretty simple...

  • Do you have perhaps a specific example of instances (with made up data) for what you are looking to do?

    For example, say you want a count threshold of 5 for event ID 1234, 2 for event ID 4321, and 10 for everything else. One option is to create special datapoints that only deal with those event IDs, such as datapoints called "CountIfEventId1234" and "CountIfEventId4321" that always return 0 unless the instance is for event ID 1234 or 4321, respectively. You can then set global thresholds on these special datapoints, separate from the more generic Count. This wouldn't be all that great if you have a ton of exceptions, though.
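To make the idea concrete, here is a rough sketch of the logic in plain Python. It is not LogicMonitor syntax, and the helper names and the "alert when the value is strictly greater than the threshold" rule are assumptions made for illustration only.

```python
# Illustrative model only: plain Python, not LogicMonitor datapoint syntax.
# Thresholds come from the example above; ">" as the alert operator is an assumption.
SPECIAL_THRESHOLDS = {1234: 5, 4321: 2}  # event ID -> threshold for its special datapoint
DEFAULT_THRESHOLD = 10                   # threshold for the generic Count datapoint


def count_if_event_id(instance_event_id: int, target_event_id: int, count: int) -> int:
    """Return the count only when the instance is for the target event ID, else 0."""
    return count if instance_event_id == target_event_id else 0


def alerting_datapoints(instance_event_id: int, count: int) -> list:
    """List the (hypothetical) datapoint names that would breach their threshold."""
    fired = []
    for event_id, limit in SPECIAL_THRESHOLDS.items():
        if count_if_event_id(instance_event_id, event_id, count) > limit:
            fired.append(f"CountIfEventId{event_id}")
    if count > DEFAULT_THRESHOLD:
        fired.append("Count")
    return fired
```

With these numbers, an instance for event ID 1234 with a count of 7 trips only its special datapoint, while an unrelated event with the same count stays quiet until it passes 10.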

     

    • jonathanbarrow

      You bet, here is the exact instance name we're trying to set an alarm for...

      "Microsoft-Windows-TerminalServices-RemoteConnectionManager/Admin|Microsoft-Windows-TerminalServices-RemoteConnectionManager|2|1069"

    • jonathanbarrow

      And here is a sample of what the discovery script outputs when creating the instances initially for each system. 

      When the collector script runs, its output is similar but has instance_name=somevalue in it.

       

  • Anonymous

    So the DS only has one datapoint and you'd like to set a threshold for when that datapoint goes over a certain value. However, you don't want to set it on every instance, just certain instances. Setting it per instance or per instance group isn't tenable because of the quantity of instances and instance groups. It's not reasonable to set a threshold on 10,000 instance groups (compounded by the number of different thresholds you want to set).

    There are a couple ways to do this. 

    1. You could separate the datasource into multiple datasources, each one containing all the instances that would share a threshold. So if you have instances A, B, and C all in one DS now, but A's threshold is > 2, B's is > 5, and C's is > 100, you'd split into three datasources, each one filtering out all but one specific instance. So you'd have DS_A, DS_B, and DS_C. You would use the same discovery for all three but create discovery filters to only end up with A's in DS_A, B's in DS_B, and C's in DS_C. This is the easiest to consume because the events are clearly split out into different places, each with its own threshold. The disadvantage is that you might not have just A, B, and C; you might have A-ZZZ, meaning you'd need hundreds of datasources. This is the most straightforward option.
    2. Another way would be to create a datapoint within your one datasource that evaluates the name of the instance along with the value of the count datapoint and returns a 1 or 0 depending on whether it's over threshold. You'd do this using the ##WILDALIAS## token. You'd still have one datasource, but there would be one datapoint per event name you want to alert on.
    3. Another option altogether is to pipe the data into LM Logs and set up alert pipelines for the different events containing values higher than the acceptable value. If you need to be able to graph the .count datapoint, this isn't a good option. However, if the .count datapoint is counting the number of error events per name, this might be a better option. LM Logs can now threshold on a log happening > X times in a certain timeframe. So you would only get an alert if the event showed up in the logs more than 6 times within an hour, for example. In this case, the logs coming in are the actual error logs rather than a summary saying that such-and-such log came in N times.

    There might even be other ways depending on exactly what you are trying to collect.
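To illustrate what "more than X times in a certain timeframe" from option 3 means, here is a small Python sketch of a sliding-window count. It only models the semantics; it is not how LM Logs is implemented, and the function name and defaults (6 events, one hour) are assumptions taken from the example in the list above.

```python
from datetime import datetime, timedelta


def exceeds_rate(timestamps, max_count=6, window=timedelta(hours=1)):
    """Return True if more than max_count events fall within any span of length window."""
    ordered = sorted(timestamps)
    start = 0
    for end, ts in enumerate(ordered):
        # Slide the left edge forward until the span fits inside the window.
        while ts - ordered[start] > window:
            start += 1
        if end - start + 1 > max_count:
            return True
    return False


# Hypothetical usage: seven matching error events, five minutes apart.
base = datetime(2024, 1, 1, 0, 0)
burst = [base + timedelta(minutes=5 * i) for i in range(7)]
print(exceeds_rate(burst))
```

The point of the sketch is that the input is the individual error events themselves, not a pre-summarized count per collection interval.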

  • So in that code, the 1234 is the actual wildalias name you'd be looking for in your example. I wonder how that would work as I think the WildAlias in this scenario would be that long instance name we're using now, right? 

    This is the instance name we're trying to alarm off of.

    "Microsoft-Windows-TerminalServices-RemoteConnectionManager/Admin|Microsoft-Windows-TerminalServices-RemoteConnectionManager|2|1069"

    • Anonymous

      Yes, 1234 in the expression above would be the whole name. And the 5 would be the value of the .count datapoint you want to threshold on. ge() means greater than or equal to. This assumes the datapoint is called "count".

      I think the problem with this is that the eq() function is looking for numbers, not a string. So it may always return 0. If that's the case, we may need to look into simplifying either the wildvalue (first part of discovery line before the ##) or the wildalias (second part of discovery line, the instance display name) down to a number. 

      In your name above is 1069 enough to uniquely identify that one instance? Or is it possible to have a 1069 with two different strings in the name before the 1069?
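One way to answer that uniqueness question is to export the discovered instance names and check them offline. The Python sketch below assumes every name ends in a pipe followed by a numeric event ID, as in the sample name from this thread; the function names are made up for illustration, and the second name in the usage example is hypothetical.

```python
from collections import defaultdict


def event_id_of(instance_name: str) -> int:
    """Take the last pipe-delimited field of the name as the numeric event ID (assumed format)."""
    return int(instance_name.rsplit("|", 1)[1])


def ambiguous_event_ids(instance_names):
    """Map each event ID that appears with more than one distinct prefix to those prefixes."""
    prefixes = defaultdict(set)
    for name in instance_names:
        prefix, event_id = name.rsplit("|", 1)
        prefixes[int(event_id)].add(prefix)
    return {eid: sorted(found) for eid, found in prefixes.items() if len(found) > 1}


# Hypothetical usage with the sample name plus a made-up colliding name.
names = [
    "Microsoft-Windows-TerminalServices-RemoteConnectionManager/Admin|"
    "Microsoft-Windows-TerminalServices-RemoteConnectionManager|2|1069",
    "Something else entirely|2|1069",
]
print(ambiguous_event_ids(names))
```

If the result is empty for your real instance list, the trailing number alone is enough to identify the instance; if not, the event ID by itself cannot stand in for the full name.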

      • Anonymous

        For example, is it possible for these both to exist? Or will "1069" always be paired with "Microsoft-Windows-TerminalServices-RemoteConnectionManager/Admin|Microsoft-Windows-TerminalServices-RemoteConnectionManager|2"

        "Microsoft-Windows-TerminalServices-RemoteConnectionManager/Admin|Microsoft-Windows-TerminalServices-RemoteConnectionManager|2|1069"

        "Something else entirely|2|1069"

  • Jonathan, you can create the dynamic group, and then go to Settings --> Alert Rules and create rules on a specific resource or instance.

     

  • Thank you, I've toyed with that a bit but haven't been able to figure out how to set a specific threshold with that alert rule. I have the logic module, instance, and datapoint picked, but there is no setting for a value. Will it always trigger if there is any value, or can I set it to trigger only if the value comes back over X?

  • Thank you so much, the #2 option sounds ideal but I'm not sure how to reference the ##WILDALIAS## token within the new datapoint I'm trying to create. Here is how the regular datapoint looks now, the one that's used for all the instances on the DS. I guess I'd need to create a new one, but have it only report data for one specific instance name. Is the Key section where I would use the alias you mentioned?

    • Anonymous

      You'd need to build a new complex datapoint, not a normal datapoint. Normal datapoints extract data from the output of the script/task. Complex datapoints evaluate normal datapoints and/or tokens. So you'd create a complex datapoint that looks something like this (YMMV):

      if(and(eq(##WILDALIAS##,1234),ge(count,5)),1,0)

      You might run into issues using the wildalias if it's not numeric; LM is tricky about which things the eq() function will evaluate.
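For anyone who wants to reason through that expression before pasting it into a datapoint, here is a plain-Python stand-in for its logic. The helper functions only mimic if/and/eq/ge for numeric inputs and are not LogicMonitor's implementation; the sketch assumes the alias has already been reduced to a number (1069 is used as a hypothetical example) and does not model what LM does with a non-numeric alias.

```python
# Python stand-ins for the expression functions; numeric inputs only (assumption).
def eq(a, b):
    return 1 if float(a) == float(b) else 0


def ge(a, b):
    return 1 if float(a) >= float(b) else 0


def and_(a, b):
    return 1 if a and b else 0


def if_(condition, when_true, when_false):
    return when_true if condition else when_false


def over_threshold(wildalias, count, target=1069, limit=5):
    """Mirror if(and(eq(alias, target), ge(count, limit)), 1, 0) for one instance."""
    return if_(and_(eq(wildalias, target), ge(count, limit)), 1, 0)


print(over_threshold(1069, 5))   # matching alias, count at the limit
print(over_threshold(1069, 4))   # matching alias, count below the limit
print(over_threshold(1234, 50))  # different alias, so the count is ignored
```

A threshold of "> 0" on a datapoint like this would then alert only for the one instance you care about, which is the whole point of option 2.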