Ideas to maintain thresholds across thousands of devices and even more instances.

Userlevel 4
Badge +9

I have an ask to standardize some thresholds based on device type. We break our folders out like so:

We then have roll up folders under each client for reporting/automation



The ask brought to me and my team is to standardize thresholds based on tech stack. The first one I was given is that all Firewalls should have thresholds on status and on X% in/out utilization on Interfaces.


My first idea was to script it and use our automation platform to run daily or every other day to check if a threshold is set, and if not, set it. The issue there is that on the initial dry run I am over 11 hours in and maybe 40% through. The script loops through each client's ZZZ_Firewalls folder, gets the interfaces, then checks whether a threshold is set. If not, it sets the base threshold. I figured I would break it in half, A-N and M-Z, and run both concurrently. But in testing M-Z, one client is taking almost 10 hours, and I have another of the same size in that block. I also have one to three clients of similar size in A-N.
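The check-and-set loop described above could be balanced per device rather than per half of the alphabet, so one very large client does not pin a whole run. This is only a minimal sketch: `ensure_thresholds`, the `devices` structure, and the `get_threshold`/`set_threshold` callables are hypothetical stand-ins for whatever the automation platform uses to talk to the portal, and the threshold strings are placeholders. Threads only help if the time is spent waiting on API calls and the portal's rate limits allow the extra concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def ensure_thresholds(devices, get_threshold, set_threshold, base, workers=8):
    """Set `base` on every instance that has no threshold; return the instances changed."""

    def process(device):
        changed = []
        for instance in device["instances"]:
            if get_threshold(instance) is None:  # leave existing thresholds alone
                set_threshold(instance, base)
                changed.append(instance)
        return changed

    # One task per device, so a single huge client is spread over all workers
    # instead of pinning one half of an alphabetical split.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(process, devices))
    return [instance for batch in batches for instance in batch]


# In-memory stand-in for the portal: instance -> threshold (None means unset).
store = {"fw1:eth0": None, "fw1:eth1": "> 60", "fw2:eth0": None}
devices = [
    {"name": "fw1", "instances": ["fw1:eth0", "fw1:eth1"]},
    {"name": "fw2", "instances": ["fw2:eth0"]},
]
changed = ensure_thresholds(devices, store.get, store.__setitem__, "> 95")
```

Here only the two unset interfaces are touched; the existing `"> 60"` value is left in place.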


My second idea was to create a dynamic group that encompasses all of the /Firewalls/ devices and set the threshold on that folder. But I was leery of that, as we could end up with oddities from the "deepest folder wins" behavior.


We also cannot simply edit the datasource as this datasource applies to much more than just Firewalls. Any ideas would be appreciated.


Best answer by JJumpp 16 May 2023, 17:37


6 replies

Userlevel 1
Badge +2

Hi Joe, 

Setting a threshold at a group folder level would really be ideal.  It eliminates the effect on other devices since you aren't setting it at the datasource, and gives you more control to set different thresholds for different customers if you need to adjust it down the line. 

It also makes it easier to manage the thresholds at group level than having to reach out and touch each device individually. However, if the threshold is going to be set at the same value across all customers, then an additional dynamic group that covers all devices would be ideal. You would end up with a single place/group to manage threshold changes.

LM users will often build out a separate group structure that is specifically to manage something like credentials or thresholds, so future management/manipulation is limited to that one tree, and therefore is less likely to conflict with another property value set somewhere else in the resource hierarchy.  
For your situation, you may have a top level group that is “Thresholds” and dynamic groups underneath for different device types, like “Switches” or “Firewalls” and all threshold manipulation is done there, but all the other groups are still used for things like RBAC, dashboard organization, etc.  

Do you think a strategy like this would work for you and your team?
If not, can you explain a little more about what kinds of thresholds you have deeper in your folder structure?

Userlevel 4
Badge +9

Part of the problem is we are an MSP and we fear the deepest folder controls the threshold.

Example Folder Layout

  • Clients/ACMECO/Chicago/Firewalls
  • Clients/ACMECO/ZZZ_Thresholds/Base Thresholds/Firewalls

We would need to have the folder deeper than the normal location tech stack group. I already have, let's say, 25k potential devices in our main portal, and those may already have thresholds set. So suppose we had a previous request for ACMECO to have an In/Out utilization threshold of 60%, and I then create the folder above and set it to 95%. We have just overridden the threshold already established, and there isn't a clean way of determining that.
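One way to catch that kind of silent override before creating the folder is a dry-run comparison. The sketch below assumes the current per-instance thresholds have already been exported into a plain dict by some other means; `find_overrides`, the instance names, and the values are all hypothetical.

```python
def find_overrides(existing, new_value):
    """Return {instance: current_value} for thresholds a new group-level value would replace."""
    return {
        instance: current
        for instance, current in existing.items()
        if current is not None and current != new_value
    }


# Hypothetical export: instance -> current threshold (None means unset).
existing = {"acme-fw1:eth0": "60", "acme-fw1:eth1": None, "acme-fw2:eth0": "95"}
conflicts = find_overrides(existing, "95")
```

Anything returned in `conflicts` is a client-specific value that would need a decision before the base threshold goes in.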


Userlevel 1
Badge +2

I understand.  
Typically, the deepest folder would control the threshold, since that threshold would be “closer” to the device. I think of the threshold inheritance a little like group policies, where the last one that is applied is the one that takes effect, so the one closest to the device (or deeper in the resource tree) is the winner.

However, if a threshold is set somewhere in the clients tree (Clients/ACMECO/Chicago/Firewalls) and a more ‘global’ threshold is set at a higher level group (Thresholds/Firewalls), then the one at the customer level will still win.
If two groups have thresholds configured and are at the same level of the device hierarchy, then the group that was created first wins. This can be determined by the lowest group id.
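To make the precedence rule described above concrete, here is a small model of it: deepest group wins, and a tie on depth goes to the lowest group id. It only illustrates the rule as stated in this reply and is not LogicMonitor's actual implementation; the `Group` class, the ids, and the threshold values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    id: int
    path: str
    threshold: Optional[str] = None

    @property
    def depth(self) -> int:
        # Number of folders in the path, e.g. "Thresholds/Firewalls" -> 2
        return len([part for part in self.path.split("/") if part])


def effective_threshold(groups):
    """Return the group whose threshold applies, or None if none is set."""
    candidates = [g for g in groups if g.threshold is not None]
    if not candidates:
        return None
    # Deepest first; on equal depth the lowest (oldest) id wins.
    return min(candidates, key=lambda g: (-g.depth, g.id))


client = Group(12, "Clients/ACMECO/Chicago/Firewalls", "60")
global_fw = Group(40, "Thresholds/Firewalls", "95")
nested = Group(57, "Clients/ACMECO/ZZZ_Thresholds/Base Thresholds/Firewalls", "95")

winner_flat = effective_threshold([client, global_fw])
winner_nested = effective_threshold([client, global_fw, nested])
```

With a top-level Thresholds tree the client's 60 still applies (`winner_flat` is the client group), while a base-threshold folder nested deeper than the client's own group takes over (`winner_nested` is the nested group), which is the override concern raised earlier in the thread.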

When looking at a specific device, you should also be able to see what threshold is applied to a datapoint and from where/what group on the alert tuning tab.

Is the goal to override anything set at deeper customer levels, or to make sure those deeper level thresholds are maintained?

Userlevel 4
Badge +9

OK, we did some testing, and the only issue we can see is that the client can’t see the applied threshold at the instance level, as we can’t give them access to the global folder.

Userlevel 1
Badge +2

That sounds like an RBAC-related quirk for sure.
I think your best course of action for that would be to submit feedback to the product team through your portal. I'm sure this is something that other MSPs have encountered as well, since most would have a similar customer folder structure.

Userlevel 7
Badge +17

Ok we did some testing and the only issue we can see is the client can’t see the applied threshold at the instance level. As we can’t give them access to the global folder.

If they can see the alert tuning tab on the instance, they should be able to see the effective threshold.