Forum Discussion
Anonymous
8 months ago
So the DS has only one datapoint, and you'd like to set a threshold for when that datapoint goes over a certain value. However, you don't want to set it on every instance, just on certain instances. Setting it per instance or per instance group isn't tenable because of the number of instances and instance groups: it's not reasonable to set a threshold on 10,000 instance groups, and that effort is multiplied by the number of different thresholds you want to set.
There are a couple of ways to do this.
- You could split the datasource into multiple datasources, each one containing only the instances that share a threshold. So if you have instances A, B, and C all in one DS now, but A's threshold is > 2, B's is > 5, and C's is > 100, you'd split it into three datasources, DS_A, DS_B, and DS_C, each one filtering out all but one specific instance. You would use the same discovery for all three but add discovery filters so that only A ends up in DS_A, only B in DS_B, and only C in DS_C. This is the most straightforward option and the easiest to consume, because the events are clearly split out into different places, each with its own threshold. The disadvantage is that you might not have just A, B, and C; you might have A through ZZZ, meaning you'd need hundreds of datasources.
- Another way would be to create a datapoint within your one datasource that evaluates the instance name along with the value of the count datapoint and returns 1 or 0 depending on whether that instance is over its threshold. You'd do this using the ##WILDALIAS## token. You'd still have one datasource, but it would have one such datapoint per event name you want to alert on.
- Another option altogether is to pipe the data into LM Logs and set up alert pipelines for the different events whose values exceed the acceptable value. If you need to be able to graph the .count datapoint, this isn't a good option. However, if the .count datapoint counts the number of error events per name, this might be the better option. LM Logs can now alert when a log occurs more than X times within a given timeframe, so you would only get an alert if, for example, the event showed up in the logs more than 6 times within an hour. In this case, the logs coming in are the actual error logs rather than a summary saying that such-and-such log came in N times.
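For the first option, the number of datasources you end up with is really the number of distinct thresholds, since instances that share a threshold can share one clone. A minimal planning sketch, assuming you already have a mapping of instance names to the thresholds you want (the names, limits, and `DS_gt_` naming below are hypothetical), groups the instances so you can see which names each clone's discovery filter would need to let through. It does not talk to LogicMonitor; it only computes the split.

```python
from collections import defaultdict


def plan_datasources(thresholds: dict[str, float]) -> dict[float, list[str]]:
    """Group instance names by alert threshold.

    Each key stands for one cloned datasource; the list of names is what that
    clone's discovery filter would need to keep (hypothetical planning aid).
    """
    groups: dict[float, list[str]] = defaultdict(list)
    for name, limit in thresholds.items():
        groups[limit].append(name)
    # Sort thresholds and names so the plan is deterministic.
    return {limit: sorted(names) for limit, names in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical thresholds: B and D share "> 5", so they can share a clone.
    wanted = {"A": 2, "B": 5, "C": 100, "D": 5}
    for limit, names in plan_datasources(wanted).items():
        print(f"DS_gt_{limit}: discovery filter keeps {names}")
```

With the sample mapping, three datasources cover four instances. If nearly every instance needs its own limit, the count approaches one datasource per instance, which is the scaling problem described above.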
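The second option boils down to a per-event flag. The Python below only models that logic so it can be tested; it is not LogicMonitor complex-datapoint syntax, and the event names and limits are hypothetical. Here `wildalias` stands in for the value the ##WILDALIAS## token would supply and `count` for the count datapoint; each entry in `EVENT_LIMITS` corresponds to one flag datapoint, which you would then alert on when it equals 1.

```python
# Hypothetical event names and limits; each entry stands for one flag datapoint.
EVENT_LIMITS = {"DiskFull": 2, "AuthFailure": 5, "Timeout": 100}


def over_threshold_flag(wildalias: str, count: float, event_name: str, limit: float) -> int:
    """Return 1 if this instance is the given event and its count exceeds the limit, else 0."""
    return 1 if wildalias == event_name and count > limit else 0


def evaluate_datapoints(wildalias: str, count: float) -> dict[str, int]:
    """Evaluate every flag datapoint for one polled instance."""
    return {
        f"{event}_over": over_threshold_flag(wildalias, count, event, limit)
        for event, limit in EVENT_LIMITS.items()
    }


if __name__ == "__main__":
    # An instance aliased "AuthFailure" with a count of 6 trips only its own flag.
    print(evaluate_datapoints("AuthFailure", 6))
```

Because each flag is 0 on every instance except the matching one, a single static alert condition per flag datapoint works across all instances, which is what avoids setting thresholds instance by instance.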
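The LM Logs approach in the third option is essentially a count over a sliding time window. As an illustration only (this is not how LM Logs is configured or implemented, and `WindowCounter` is a hypothetical name), the sketch below flags an event name once it has been seen more than `limit` times within `window`, matching the "more than 6 times within an hour" example. It assumes timestamps arrive in order.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowCounter:
    """Hypothetical helper: flag a log event seen more than `limit` times within `window`."""

    def __init__(self, limit: int, window: timedelta) -> None:
        self.limit = limit
        self.window = window
        # One deque of occurrence timestamps per event name.
        self._seen: dict[str, deque[datetime]] = {}

    def observe(self, name: str, ts: datetime) -> bool:
        """Record one occurrence and return True if the event is now over the limit."""
        times = self._seen.setdefault(name, deque())
        times.append(ts)
        # Drop occurrences older than the window; one exactly `window` old still counts.
        while ts - times[0] > self.window:
            times.popleft()
        return len(times) > self.limit


if __name__ == "__main__":
    counter = WindowCounter(limit=6, window=timedelta(hours=1))
    start = datetime(2024, 1, 1, 12, 0)
    for i in range(7):
        # Seven occurrences, five minutes apart: only the seventh exceeds the limit of 6.
        print(i + 1, counter.observe("DiskFull", start + timedelta(minutes=5 * i)))
```

Spreading the same seven occurrences fifteen minutes apart never trips the flag, since at most five of them fall inside any one-hour window.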
There might even be other ways depending on exactly what you are trying to collect.