Forum Discussion

Kelemvor (Professor rank)
9 days ago

Are Auto-Balance groups supposed to balance automatically?

Hi,

I have a collector that's been alerting due to low memory.  It's a Large collector with 8 Gigs of RAM based on the recommendations here: https://www.logicmonitor.com/support/collector-capacity

This collector is in an Auto-Balance group with another collector that is not having any memory problems.  One is running 40000+ instances and the other has only 23000+.  One has 450+ resources and the other has 50.  That doesn't sound balanced to me.

Is there something I need to do so LM will actually keep these two collectors balanced, so they have approximately the same number of things they are monitoring?  The "Rebalance Threshold - Instance Count" setting is 30,000.  Since one collector is way over that, shouldn't that trigger a rebalance?

What am I missing?

4 Replies

  • Use this formula:

    Base Rebalancing Threshold = ((Total group instances * (1 + buffer)) / number of collectors in the ABCG) / memory scaling

    "buffer" is a percentage expressed as a decimal fraction, e.g. 50% would be 0.5 and 15% would be 0.15.

    The buffer is not mandatory and can be zero, although best practice is to leave some room for when a collector within the group fails. Collectors are resilient and can run overloaded; they will start to leave gaps in the collected data, but they will do their best to keep up.

    "memory scaling" would be the amount of memory used by the JVM according to the collector's size. Remember that within an ABCG, all collectors must be identical in every respect: memory, OS, version, etc.

    The resulting "Base Rebalancing Threshold" is the number to enter in the "Rebalance Threshold - Instance Count" text box when you manage the ABCG. Then click the "Schedule Rebalance" button and then "Save". That would trigger the rebalance.
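
    For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above. It assumes that "memory scaling" means the square root of (JVM memory in GB / 2 GB), which is how the scaling formula in the LogicMonitor docs reads; that interpretation, the function name, and the parameter names are illustrative assumptions, not anything LM publishes as code.

```python
import math

def base_rebalancing_threshold(total_instances, num_collectors, jvm_gb, buffer=0.0):
    """Hypothetical helper: the Base Rebalancing Threshold formula from this post."""
    # Assumption: "memory scaling" = sqrt(JVM memory in GB / 2 GB medium baseline).
    memory_scaling = math.sqrt(jvm_gb / 2)
    return total_instances * (1 + buffer) / num_collectors / memory_scaling

# Round numbers from the original question: two Large collectors (4 GB JVM each).
total = 40000 + 23000
print(round(base_rebalancing_threshold(total, 2, 4)))        # 22274 with no buffer
print(round(base_rebalancing_threshold(total, 2, 4, 0.15)))  # 25615 with a 15% buffer
```

    The buffer only raises the number you type into the threshold box; it does not change how many instances each collector ends up with.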

    A few considerations:

    1. Resources with a designated preferred collector won't be rebalanced. In the resource's management settings, change the preferred collector to the desired ABCG.
    2. Collectors should preferably be within the same subnet.
    3. All collectors must be equal (size, OS, version, hardware).
    4. The "Base Rebalancing Threshold" doesn't represent the number of instances that will be assigned to each collector.
    5. The number of instances monitored by each collector may differ, but by hundreds of instances at most. If they differ by thousands, then either balancing is not happening or many resources have a preferred collector and aren't being rebalanced. Use an inventory report to find the preferred collector on each resource.

    https://www.logicmonitor.com/support/collector-capacity
    https://www.logicmonitor.com/support/auto-balanced-collector-groups
    https://www.logicmonitor.com/support/changing-the-preferred-collector-new-ui

    Regards👨‍💻

  • What is supposed to happen if I put in a different number?  I currently have a group set up with:

    • 1 - Large - 40229 instances - 471 resources - 45 websites
    • 2 - Large - 23504 instances - 51 resources - 70 websites

    The rebalance threshold in the group settings is 30,000.  Both servers have 8 Gigs of RAM as suggested by the LM Link.

    #1 is currently using 92% memory and is generating alerts.  #2 is only using 65% memory and is working just fine.

    Since #1 has 40229 instances, and the rebalance threshold is set to 30000, shouldn't that cause a rebalance?  I even went and clicked the Rebalance button and nothing happened.

    This just seems like an idiotic implementation of Auto Balancing groups.  There should be a global setting that says: if one collector in an ABCG gets more than 10% higher than the other, rebalance to bring them more in line.  I shouldn't have to manually put in numbers based on math calculations.  It shouldn't be this hard.  ;)

    • befuddled (Neophyte rank)

      I'm not an LM employee, just a user with a little experience. 

      Based on your numbers and the spreadsheet, your rebalance threshold should be 22533 (note: the memory figure is the JVM memory, not total memory, so it would be 4 for a Large).

      Would it be possible for you to try that and do a manual rebalance? You shouldn't need to perform the manual rebalance (just wait several minutes), but it shouldn't cause any issues. 

      I've spoken with their support several times, and the numbers they provide are typically the same as mine. I don't claim to understand it :)

      I agree it shouldn't be this hard. The environment I manage has about 150 collectors, and I wrote a Lambda function that adjusts the rebalance threshold periodically.

      I'm sure others probably have a better approach. 

      I agree with you, it shouldn't be this hard. 

      Best of luck!

       

  • The rebalance threshold doesn't seem to work the way some people think it does. I'll tell you what works for me. Most of the below is just background info; the actual calculation is at the bottom.

    From this page:
    https://www.logicmonitor.com/support/auto-balanced-collector-groups
    Using this formula to determine the new rebalance threshold:

    "The number of instances a Collector can handle is determined using the following formula:

    Number of instances = (Target_Collector_mem/Medium_mem)^1/2 * Medium_Threshold

    For example, if a user sets a threshold for a medium-size (2 GB) Collector to 10,000, for a large-size (4 GB) Collector, the threshold will be scaled to:

    14140 instances = (4/2)^1/2*10000"
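
    As a quick check of the quoted scaling, here is a small Python sketch. It reads "^1/2" as a square root, which is what the docs' own example implies; the function name and the default of 2 GB for a medium collector are assumptions for illustration.

```python
import math

def scaled_threshold(target_mem_gb, medium_threshold, medium_mem_gb=2):
    """Hypothetical helper: scale a medium-size threshold to another collector size."""
    # sqrt(target JVM memory / medium JVM memory) * threshold set for a medium collector
    return math.sqrt(target_mem_gb / medium_mem_gb) * medium_threshold

print(round(scaled_threshold(4, 10000)))  # 14142 (the docs round this to 14140)
print(round(scaled_threshold(8, 10000)))  # 20000 for an 8 GB JVM
```

    So doubling the JVM memory raises the effective threshold by about 41%, not 100%.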

    For the round numbers and two large collectors, the rebalance threshold I calculated is 22274 (note: use your actual numbers instead of the round numbers). I use the above formula, but use all instances across all collectors:

    Column A, Instances: list the number of instances per collector, one row for each collector. This is used to sum the total instances and to count the collectors.
    Column B, Target Collector Memory: enter the target collector JVM memory (e.g. Medium = 2, Large = 4, Extra Large = 8).
    Column C, Medium Memory: 2 GB (do not change).
    Column D, Medium Threshold: 10000 (do not change).
    Column E, New Rebalance Threshold: output (this will be the rebalance threshold).

    Instances | Target Collector Memory | Medium Memory | Medium Threshold | New Rebalance Threshold
    40000     | 4                       | 2             | 10000            | 22274
    23000     |                         |               |                  |

    Above is a portion of a spreadsheet I have. It's pretty straightforward. The New Rebalance Threshold field formula should be

    =(SUM($A:$A)/(($B$3/$C$3)^0.5))/COUNT($A:$A)

    Where column A holds each collector's current instance count, column B is based on the current size of the collectors, C and D are static, and E should have the formula above.
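
    If a spreadsheet is inconvenient, the same calculation can be sketched in a few lines of Python. This only mirrors the cell formula above (sum of column A, divided by the square root of B/C, divided by the count of column A); the function and argument names are made up for illustration.

```python
import math

def new_rebalance_threshold(instances_per_collector, target_jvm_gb, medium_jvm_gb=2):
    """Hypothetical equivalent of =(SUM(A:A)/((B3/C3)^0.5))/COUNT(A:A)."""
    scaling = math.sqrt(target_jvm_gb / medium_jvm_gb)
    return sum(instances_per_collector) / scaling / len(instances_per_collector)

# Round numbers from the spreadsheet excerpt (two Large collectors, 4 GB JVM).
print(round(new_rebalance_threshold([40000, 23000], 4)))  # 22274
# Actual per-collector counts posted earlier in this thread.
print(round(new_rebalance_threshold([40229, 23504], 4)))  # 22533
```

    The second result matches the 22533 figure given in the earlier reply.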

    I probably over-complicated this, but it works for me.

    If you try this, let me know if it works for you. I haven't revisited that spreadsheet in a while. 

    ETA: After posting this, I found this thread which might provide more info (and a couple of different ways to calculate)
    https://community.logicmonitor.com/discussions/feature-requests/lm-instance-auto-balance-collector-tool/17389