Ideas to maintain thresholds across thousands of devices and even more instances.
I have an ask to standardize some thresholds based on device type. We break our folders out like so:
Clients/CLIENTID/Location/Techstack
We then have roll-up folders under each client for reporting/automation:
Clients/CLIENTID/ZZZ_Firewalls
The ask brought to me and my team is to standardize thresholds based on techstack. The first one I was given is that all Firewalls should have status and X% for in/out on Interfaces.
My first idea was to script it and have our automation platform run it daily or every other day to check whether a threshold is set and, if not, set it. The issue is that on the initial dry run I am over 11 hours in and maybe 40% through. The script loops through each client's ZZZ_Firewalls folder, gets the interfaces, then checks whether a threshold is set. If not, it sets the base threshold. I figured I would break it in half, A-N and M-Z, and run both concurrently. But in testing M-Z, one client is taking almost 10 hours, and I have another of the same size in that block. I also have 1-3 clients of similar size in A-N.
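One way to keep a single huge client from dominating the run is to parallelize per interface rather than per alphabetical block. The sketch below is only an illustration in Python: `list_interfaces`, `has_threshold`, and `set_threshold` are hypothetical callables standing in for whatever your API client or automation platform provides, and the worker count is an arbitrary assumption that would need to stay within any API rate limit.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_base_thresholds(client_ids, list_interfaces, has_threshold,
                          set_threshold, workers=8):
    """Set the base threshold on every interface that lacks one.

    The three callables are hypothetical placeholders for real API calls.
    Returns (number of interfaces changed, number of interfaces checked).
    """
    def ensure(interface):
        # Leave existing thresholds alone; only fill in the gaps.
        if has_threshold(interface):
            return False
        set_threshold(interface)
        return True

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: list the interfaces of every client in parallel.
        per_client = pool.map(list_interfaces, client_ids)
        interfaces = [i for batch in per_client for i in batch]
        # Stage 2: check/set per interface, so one very large client is
        # spread across all workers instead of tying up a single one.
        results = list(pool.map(ensure, interfaces))
    return sum(results), len(interfaces)


if __name__ == "__main__":
    # In-memory stand-ins for the real platform, for demonstration only.
    inventory = {"ACMECO": ["fw1:eth0", "fw1:eth1"], "BETACO": ["fw9:eth0"]}
    already_set = {"fw1:eth0"}
    changed, total = apply_base_thresholds(
        inventory, inventory.get, already_set.__contains__, already_set.add)
    print(f"{changed} of {total} interfaces needed a base threshold")
```

Because the work is split by interface, the alphabetical halves no longer matter: the largest client's interfaces are interleaved with everyone else's across the pool.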
My second idea was to create a dynamic group that encompasses all of the /Firewalls/ devices and set the threshold on that folder. But I was leery of that, as we could end up with oddities from "deepest folder wins."
We also cannot simply edit the datasource, as it applies to much more than just Firewalls. Any ideas would be appreciated.
I understand.
Typically, the deepest folder would control the threshold, since that threshold would be “closer” to the device. I think of threshold inheritance a little like group policies, where the last one applied is the one that takes effect, so the one closest to the device (or deepest in the resource tree) is the winner.
However, if a threshold is set somewhere in the client's tree (Clients/ACMECO/Chicago/Firewalls) and a more ‘global’ threshold is set at a higher-level group (Thresholds/Firewalls), then the one at the customer level will still win.
If two groups have thresholds configured and are at the same level of the device hierarchy, the group that was created first wins. This can be determined by the lowest group ID.
https://www.logicmonitor.com/support/alerts/about-alerts/tuning-alert-thresholds#
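To make the precedence rules above concrete, here is a small Python model of them. It is only a sketch of the rules as described in this thread (deepest group wins, lowest group ID breaks ties), not LogicMonitor's actual implementation; the `Group` class, the group IDs, and the threshold strings are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    group_id: int
    path: str                  # e.g. "Clients/ACMECO/Chicago/Firewalls"
    threshold: Optional[str]   # None means no threshold set on this group


def depth(group):
    """Number of path segments, used as the group's level in the tree."""
    return len(group.path.strip("/").split("/"))


def winning_group(groups):
    """Pick the group whose threshold takes effect for a device.

    Rule 1: the deepest group with a threshold wins.
    Rule 2: at equal depth, the lowest group ID (created first) wins.
    """
    candidates = [g for g in groups if g.threshold is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda g: (-depth(g), g.group_id))


if __name__ == "__main__":
    # Hypothetical groups that one firewall belongs to.
    memberships = [
        Group(10, "Thresholds/Firewalls", "> 80"),
        Group(250, "Clients/ACMECO/Chicago/Firewalls", "> 90"),
        Group(42, "Clients/ACMECO", None),
    ]
    print(winning_group(memberships).path)
```

In this model the customer-level group wins over the 'global' one because it sits deeper, which is the situation to watch for if a dynamic Firewalls group is meant to enforce a standard.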
When looking at a specific device, you should also be able to see what threshold is applied to a datapoint and from where/what group on the alert tuning tab.
Is the goal to override anything set at deeper customer levels, or to make sure those deeper-level thresholds are maintained?