Forum Discussion

Gary_Dewrell
11 years ago

Sum across DataSources and dynamic alert levels

I would like to monitor our VMware environment to determine whether the memory and/or CPU assigned to VMs would exceed our capacity to keep all VMs running if we lost an ESX host server.

Something like:
We have two dynamic groups created:
ESXHost = system.version =~ "VMWare"
ESXVMs = system.model == "VMware Virtual Platform"

Datapoints needed:
ESXHOSTCPU: sum of CPU MHz for all ESX hosts.
ESXHOSTMEM: sum of memory installed in all ESX hosts.
ESXVMCPU: sum of CPU MHz assigned to all VMs.
ESXVMMEM: sum of memory assigned to all VMs.

Calculations:
(ESXVMCPU/ESXHOSTCPU)*100
(ESXVMMEM/ESXHOSTMEM)*100
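The two calculations above can be sketched in Python as follows. This is a minimal illustration, not LogicMonitor functionality: it assumes you have already collected per-host capacity and per-VM assignments somehow. The `Capacity` type, the `utilization_pct` helper and all numbers are hypothetical, and memory is assumed to be in GB (any consistent unit works).

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    cpu_mhz: float
    mem_gb: float  # assumption: memory in GB; any consistent unit works


def utilization_pct(hosts: list[Capacity], vms: list[Capacity]) -> tuple[float, float]:
    """Return (CPU %, memory %) of summed VM assignments vs. summed host capacity."""
    esx_host_cpu = sum(h.cpu_mhz for h in hosts)  # ESXHOSTCPU
    esx_host_mem = sum(h.mem_gb for h in hosts)   # ESXHOSTMEM
    esx_vm_cpu = sum(v.cpu_mhz for v in vms)      # ESXVMCPU
    esx_vm_mem = sum(v.mem_gb for v in vms)       # ESXVMMEM
    return (esx_vm_cpu / esx_host_cpu) * 100, (esx_vm_mem / esx_host_mem) * 100


# Hypothetical example data: three identical hosts and four VMs.
hosts = [Capacity(cpu_mhz=20000, mem_gb=256) for _ in range(3)]
vms = [Capacity(4000, 64), Capacity(8000, 96), Capacity(6000, 128), Capacity(12000, 160)]
cpu_pct, mem_pct = utilization_pct(hosts, vms)
print(f"CPU {cpu_pct:.1f}%  memory {mem_pct:.1f}%")  # CPU 50.0%  memory 58.3%
```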

Alert Conditions

Dynamic alert levels would be great.

The dynamic alert level would be 100 - (100 / (total members of group ESXHost)).

In my case I have 3 ESX hosts, so the alert level would be 100 - (100 / 3), about 66.7, i.e. alert at something like > 66.

If the sum of memory or CPU MHz assigned to all VMs exceeded 66% of the combined CPU and memory available in all ESX hosts, we would not be able to keep all VMs running if we lost an ESX host.
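The dynamic alert level described above, and the alert decision it drives, might look like this as a minimal Python sketch. The helper names `dynamic_threshold` and `should_alert` are hypothetical, and the formula 100 - (100 / N) assumes the hosts are roughly equal in size.

```python
def dynamic_threshold(host_count: int) -> float:
    """Alert level from the post: 100 - (100 / N) for N hosts in group ESXHost."""
    if host_count < 2:
        raise ValueError("need at least two hosts to tolerate losing one")
    return 100 - (100 / host_count)


def should_alert(utilization_pct: float, host_count: int) -> bool:
    """Alert when assignments exceed what N-1 equally sized hosts can provide."""
    return utilization_pct > dynamic_threshold(host_count)


print(round(dynamic_threshold(3), 1))  # 66.7
print(should_alert(58.3, 3))           # False
print(should_alert(70.0, 3))           # True
print(round(dynamic_threshold(4), 1))  # 75.0
```

Because the level is recomputed from the current member count, adding a fourth host raises it to 75 without editing the alert rule. With hosts of different sizes, losing the largest one can remove more than 100/N percent of capacity, so the formula is only an approximation there.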

  • Yes, this would be handy. What's the point of making your infrastructure redundant if the available resources aren't there when you need them? Great idea!

  • Anonymous

    I would also love to see this. I would really like to be able to put a capacity management dashboard together and need to be able to perform sum calculations across multiple devices/datasources.

  • I would love to see this as well. I actually have a similar issue: I would love to be able to do this across different devices. For example, we have 2 PDUs in a rack. They're completely separate devices, but I need to know the aggregate power and, in turn, have an aggregate threshold so I know if I've oversubscribed the aggregate power.

  • You could likely achieve this with Groovy and the Java API: do a query and do your calculations on the vCenter box instead of on the LM side.

  • 6 minutes ago, Tom Lasswell said:

    You could likely achieve this with Groovy and the Java API: do a query and do your calculations on the vCenter box instead of on the LM side.

    This is true; I would actually say it should be simple enough to figure out at a cluster level. That said, this functionality would still be useful for cases like mine. You won't always have a central console to do these calculations. Short of having a really great naming scheme and some crazy scripting, you currently don't have a good way to aggregate resources. Another prime example would be a web cluster. If I have 8 servers in a cluster, it would be nice to know that if I lose 2/6 web servers, I'm still OK CPU/memory-wise (in theory at least).

  • 41 minutes ago, Eric Singer said:

    This is true; I would actually say it should be simple enough to figure out at a cluster level. That said, this functionality would still be useful for cases like mine. You won't always have a central console to do these calculations. Short of having a really great naming scheme and some crazy scripting, you currently don't have a good way to aggregate resources. Another prime example would be a web cluster. If I have 8 servers in a cluster, it would be nice to know that if I lose 2/6 web servers, I'm still OK CPU/memory-wise (in theory at least).

    I agree. Scripting is always an option: use the RPC calls for GetData from LM and calculate that way, though, like you said, it takes "some crazy scripting". :)

  • Sarah_Terry
    Product Manager

    We're working on a feature that will enable much more flexible alerting across DataSources and Devices, and should address this request. At a high level, we'll allow you to group together instances across devices and set aggregate thresholds for the group. One of the primary use cases is better alerting for clustered resources.
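As a rough illustration of the cross-device aggregation requested in the PDU post (two separate PDUs in one rack, one aggregate threshold), here is a Python sketch that sums per-device readings by a shared grouping key and flags groups over a single aggregate limit. It is not a LogicMonitor feature: it runs on readings you would have to collect yourself, and the rack and PDU names, wattages, the 5000 W limit and the helper names are all invented example values.

```python
from collections import defaultdict


def aggregate_by_group(readings: list[tuple[str, str, float]]) -> dict[str, float]:
    """Sum (group, device, watts) readings per group, e.g. per rack."""
    totals: dict[str, float] = defaultdict(float)
    for group, _device, watts in readings:
        totals[group] += watts
    return dict(totals)


def over_threshold(totals: dict[str, float], limit_watts: float) -> list[str]:
    """Return the groups whose aggregate draw exceeds the one aggregate limit."""
    return [group for group, total in totals.items() if total > limit_watts]


# Hypothetical readings: two PDUs per rack, values in watts.
readings = [
    ("rack-a", "pdu-a1", 2600.0),
    ("rack-a", "pdu-a2", 2700.0),
    ("rack-b", "pdu-b1", 1800.0),
    ("rack-b", "pdu-b2", 1900.0),
]
totals = aggregate_by_group(readings)
print(totals)                        # {'rack-a': 5300.0, 'rack-b': 3700.0}
print(over_threshold(totals, 5000))  # ['rack-a']
```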
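For the web-cluster example earlier in the thread (8 servers, tolerate losing 2), the 100 - (100 / N) idea can be generalized to losing k members of unequal size. The sketch below is a swapped-in worst-case check, not something proposed in the thread: it drops the k largest members and tests whether the total load still fits in the remaining capacity. The function name `survives_loss` and the memory figures are hypothetical.

```python
def survives_loss(capacities: list[float], total_load: float, k: int) -> bool:
    """True if total_load still fits after losing the k largest members (worst case).

    Assumption: capacities and total_load use the same unit (GB, MHz, ...).
    """
    if not 0 <= k < len(capacities):
        raise ValueError("k must be between 0 and the member count minus one")
    remaining = sorted(capacities)[: len(capacities) - k]  # drop the k largest
    return total_load <= sum(remaining)


# Hypothetical 8-server web cluster: six 16 GB servers and two 32 GB servers.
web_mem_gb = [16.0] * 6 + [32.0] * 2
print(survives_loss(web_mem_gb, 90.0, 2))   # True: 96 GB left after losing both 32 GB servers
print(survives_loss(web_mem_gb, 110.0, 2))  # False: 110 GB > 96 GB
```

Checking the largest members first is what makes this a worst case: if the load fits after losing those, it fits after losing any other k servers.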