Overall we break up our resources similar to what you described. One big difference is we generally only monitor Production-things with LogicMonitor. We are just getting to the maturity level where we now think about application-level observability and coordinate with teams as they promote things from dev to stage to prod, so we do have a few non-production things being monitored for our "big stuff."
I'll describe what we do, but I am going to split it into two areas: Resource Grouping and Alert Routing.
Resource Grouping
The following is a generalization of our Resource Tree within LM:
- Devices by Team
-- Team A
-- Team B
-- Application 1
-- Application 2
-- etc
- Devices by Type
-- Type 1 (*)
-- Type 2 (*)
-- Type 3
-- etc
- Cloud Things
- Infrastructure
-- Region 1
--- Location 1
---- Network Devices
---- Compute Devices
--- Location 2
---- Network Devices
-- Region 2
-- Non-Production
- Line of Business A
-- AMER
--- Region 1
---- Location A
---- Location B
--- Region 2
---- Location C
-- EMEA
--- Region 3
--- Region 4
-- APAC
--- Region 5
- Sandbox & Setup
We have some of the Devices by * folders that LM suggests for best practices and to work with their default dashboards. There are also a couple of Sandbox/Setup folders that catch new VMs or network devices that appear in the portal but that we haven't classified yet.
I'm assuming you know this, and just repeating for anyone newer to LM reading: Devices can (and should) be in lots of different Resource Groups. I love this feature of LM and how it can be used to control properties, routing of alerts, etc.
Our other top-level resource grouping generally matches some organizational setup (aka VPs and their teams...but not as specific as Devices by Team that might drill down to a couple of people and their manager), mixed with a bit of geography. These are what I think of as our business folders because they mirror various in-real-life people-organizations of our business.
For example, we have an Infrastructure group where things related to our back offices or warehouses are grouped together in one offshoot of folders. Because our Infrastructure teams (like network/systems) are Tier 2 support for most applications, we have a top-level resource group just for them that organizes everything however they want. Like I mentioned - some of this is organization-based and some is geography-based.
Other top-level folders are based around certain lines of business. For example, retail stores. There's a breakdown of folders based on global-geography, regional-geography, location, and the devices within those locations.
Again - I'll mention the benefit of devices existing in more than one resource group because we have lots of tagging that happens at various levels in these folders. We have some properties being set in our Devices by Team and Devices by Type folders, but often these might be overridden by the same property being set in our Infrastructure or Line of Business resource group tree.
Because our business folders generally end up having a deeper tree than any of the Devices by * folders, we rarely have property conflicts where we expected a property to be set and it is not. For example, all network devices get a servicenow.group property saying they belong to our Network Team (Tier 2 support). This is set on a folder within the Devices by Type resource group. But one of those devices might be specific to a particular retail store, so it shows up again six levels deep in one of our line of business folders, and that servicenow.group property now says it belongs to our Retail Support Region 3 Team (Tier 3).
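To make the override idea concrete, here is a minimal Python sketch of "the deepest folder that sets the property wins," which is how it plays out for us. This is a toy model of the behavior described above, not LogicMonitor's actual property-resolution logic. The Group class, the resolve function, and the folder paths are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """A resource group the device belongs to (hypothetical model)."""
    path: str                                   # folder path, "/"-separated
    properties: dict = field(default_factory=dict)

    @property
    def depth(self) -> int:
        # Number of folder levels in the path.
        return len(self.path.split("/"))


def resolve(prop: str, groups: list) -> Optional[str]:
    """Return the value from the deepest group that sets prop, else None."""
    candidates = [g for g in groups if prop in g.properties]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.depth).properties[prop]


# Illustrative folder paths and values only.
by_type = Group(
    "Devices by Type/Network Devices",
    {"servicenow.group": "Network Team"},
)
by_business = Group(
    "Line of Business A/EMEA/Region 3/Store 12/Network Devices",
    {"servicenow.group": "Retail Support Region 3 Team"},
)

winner = resolve("servicenow.group", [by_type, by_business])
print(winner)
```

The point is only that the shallow Devices by Type value acts as a default and the deeper business folder acts as the exception.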
Another benefit with these business folders is using RBAC for the various teams that just want to see their stuff. There is a lot of granularity within the Role-configuration for your Resource Tree.
Alert Routing
We are using ServiceNow to catch alerts coming out of LogicMonitor, and we try to keep that path as simple as possible. We have one main Integration defined with ServiceNow, and one main Escalation Chain to get alerts to ServiceNow that covers 99% of our alerts.
As far as routing to different teams, we use that servicenow.group property I mentioned on all of our devices to decide which Assignment Group in ServiceNow the Incident should go to when it is created. We modified the JSON being sent for events between LM and SNow to include some extra data in values that aren't officially documented by LM - we figured out they were available by looking at the SNow code/trigger receiving these alerts. Also important to note is that the servicenow.group property matches up with our Tier 2 & Tier 3 support teams, because we try to have any automated alert skip over our Tier 1 Service Desk team for assignment.
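As a rough illustration of the routing idea (not our actual integration payload), the mapping from a device property to an Incident field could be sketched like this in Python. The shape of the alert dict, the build_incident function, and the FALLBACK_GROUP value are assumptions for the example. short_description and assignment_group are standard ServiceNow Incident fields.

```python
# Hypothetical fallback for devices that have no servicenow.group set.
FALLBACK_GROUP = "Monitoring Team"


def build_incident(alert: dict) -> dict:
    """Map an alert (assumed dict shape) to ServiceNow Incident fields."""
    props = alert.get("properties", {})
    return {
        "short_description": (
            f"{alert['level'].upper()}: {alert['host']} {alert['datapoint']}"
        ),
        # The device property decides which team gets the Incident.
        "assignment_group": props.get("servicenow.group", FALLBACK_GROUP),
    }


# Illustrative alert only; the keys are not LogicMonitor's real schema.
example_alert = {
    "level": "critical",
    "host": "store12-fw01",
    "datapoint": "CPUBusyPercent",
    "properties": {"servicenow.group": "Retail Support Region 3 Team"},
}

incident = build_incident(example_alert)
print(incident)
```

Because the team comes from a property on the device, changing who owns a device is a folder/property change in LM rather than a change to the integration.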
This also means our Alert Rules are fairly simple - we have a catch-all to ignore Warnings, and a catch-all to route all Criticals to ServiceNow. Most Rules deal with which Errors we care about and whether they should be ignored or sent on to ServiceNow. Thus, most of our rules have criteria based around our resource group structure - e.g. "Warnings for Palo Alto devices should go to SNow." One important thing with our Alert Rules is that we've grouped the Priority numbers by team (picture the Dewey Decimal system). So rules with Priority in the 400s might be a Line of Business, rules with Priority in the 500s are Networking, etc. Anything under 100 is a super-special override that might have its "why" documented somewhere.
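Here is a small Python sketch of how that numbering scheme behaves, relying on the fact that LM evaluates Alert Rules in ascending Priority order and the first match wins. The Rule class, the route function, the rule names, and the specific Priority numbers (including the 9000-range catch-alls) are invented for illustration and are not our real rule set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int                 # lower number is evaluated first
    name: str
    level: Optional[str]          # None matches any alert level
    group_prefix: Optional[str]   # None matches any resource group
    action: str                   # "servicenow" or "ignore"


def matches(rule: Rule, level: str, groups: list) -> bool:
    level_ok = rule.level is None or rule.level == level
    group_ok = rule.group_prefix is None or any(
        g.startswith(rule.group_prefix) for g in groups
    )
    return level_ok and group_ok


def route(rules: list, level: str, groups: list) -> Optional[Rule]:
    """Return the first matching rule in ascending priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, level, groups):
            return rule
    return None


# Hypothetical rules: <100 overrides, 500s Networking, catch-alls last.
RULES = [
    Rule(9100, "Ignore all Warnings", "warning", None, "ignore"),
    Rule(9000, "All Criticals to SNow", "critical", None, "servicenow"),
    Rule(510, "Palo Alto Warnings", "warning",
         "Devices by Type/Palo Alto", "servicenow"),
    Rule(50, "Override: Non-Production", "critical",
         "Infrastructure/Non-Production", "ignore"),
]

hit = route(RULES, "warning", ["Devices by Type/Palo Alto"])
print(hit.name, hit.action)
```

The team-specific rule in the 500s catches the Palo Alto warning before the catch-all ever sees it, and anything under 100 beats everything else.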
----
Overall, I would say our setup philosophy is this: if I need to accommodate a special case, business rule, etc then I want that weird thing to be represented in an obvious place and be well documented.
For example, we might make a special Resource Group folder under a Line of Business with a subset of devices within to get certain things to route properly. Or, we create a very special Alert Rule with Priority < 100.
I feel lucky that I can treat LM only as an O11y and Alerting tool, and we have ServiceNow to handle the response processes, and also that we are in a phase of standardizing incident response across the IT org. 😀 I want LM to watch what should be normal, and raise an alarm to a human when things are bad.