Forum Discussion

starboy9
6 years ago

Multiple Collectors

Hello,

I was wondering in what scenario you would need more than one collector deployed in an environment.

Also would you suggest a virtual appliance that sits in the datacenter or a physical appliance that monitors externally?

Thanks in Advance

  • There are several reasons why you would want multiple collectors in an environment:

    • High Availability: If a collector system fails, another one can take over for it (LM supports active-active). This is also useful for collector upgrades without downtime.
    • Load: Depending on how many items (not just devices) you're monitoring, you may need several collectors to handle the load.
    • Site: Some site-to-site VPNs are not stable enough to monitor over remotely, and remote monitoring can have a major side effect, mentioned below.
    • Network Segmentation: The collector and its failover collectors need to be able to communicate directly with each resource being monitored, so network segmentation may require multiple collectors. (I guess that also depends on your definition of "environment".)

    I don't think it really matters whether you set up virtual machines or use physical boxes. I think that depends on your infrastructure. Most of ours are virtual, without problems.

    One big potential problem to keep in mind with remote monitoring comes from LM's current lack of full dependencies. If a site monitored by a remote collector goes down, it will cause an alert for every resource at that site. Because the collector itself has not gone down, there is no Collector Down condition, so nothing prevents those alerts from occurring. So I personally avoid remote monitoring (on the Resource tab) whenever possible.

     

  • @Mike Moniz

    Awesome, thank you for that explanation!

    I have a few follow ups...

    1. As far as "Load" goes, when I look at the sizing chart I am not seeing anything that says "if you have X machines, you need this many collectors." Is there anything you do when looking at a new environment to determine the appropriate size and number of collectors?

    2. When splitting by site or by network segment, would you just create a collector group and then assign the discovered devices for that site/segment to be monitored by that particular collector?

     

  • Yeah, it depends far more on what you are monitoring per device than on the number of devices you have. Monitoring all the shares on a Windows file server via WMI will put more load on a collector than doing SNMP on a switch.

    I don't have a lot of experience balancing collectors myself. Auto-Balanced Collector Groups (ABCG) are very new and I haven't played with them. I would likely set up 1 collector (or 2 for failover) in an auto-balanced group per site and then add more collectors as needed. But as I haven't used ABCG, perhaps others on the forums can make better suggestions, or you can open a chat with LM to look at your particular environment, especially about choosing small/medium/large.

  • We've been working with the ABCGs and have found some of their foibles. Specifically, when they rebalance, they only consider the instance count, so the device counts may be heavily skewed. I'm working on a rebalancer that does a better job of splitting the load. DataSources that use batchscripts to collect are also VERY heavy on the collector they run from. Keeping an eye on the number of batchscripts specifically can help show you whether your environment is truly balanced. Through rebalancing, ours ended up putting all of the Hyper-V hosts on one collector and all of the VMs on the other... so when a DS fires on one VM, it fires on all of them, and since they're all on a single collector, no load balancing is done. Keep an eye on which resources end up on which collector... you may have to increase the balance threshold to prevent it from balancing, then manually move a set of them before lowering the threshold back down to maintain the balance. I go through and force a rebalance about once a week.
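    As a rough illustration of why instance count alone can hide an imbalance, here is a minimal sketch that tallies devices, instances, and batchscript DataSources per collector. The inventory tuples, collector names, and numbers are all made up for the example; in practice you would have to pull these figures from your own portal or an export, and nothing here is an LM API.

```python
from collections import defaultdict

# Hypothetical inventory rows:
# (device name, assigned collector, instance count, batchscript DataSources applied)
devices = [
    ("hv-host-1", "collector-a", 300, 1),
    ("hv-host-2", "collector-a", 300, 1),
    ("vm-1", "collector-b", 100, 3),
    ("vm-2", "collector-b", 100, 3),
    ("vm-3", "collector-b", 100, 3),
    ("vm-4", "collector-b", 100, 3),
    ("vm-5", "collector-b", 100, 3),
    ("vm-6", "collector-b", 100, 3),
]


def summarize(rows):
    """Total devices, instances, and batchscript DataSources per collector."""
    totals = defaultdict(lambda: {"devices": 0, "instances": 0, "batchscripts": 0})
    for _name, collector, instances, batchscripts in rows:
        entry = totals[collector]
        entry["devices"] += 1
        entry["instances"] += instances
        entry["batchscripts"] += batchscripts
    return dict(totals)


summary = summarize(devices)
for collector, entry in sorted(summary.items()):
    print(collector, entry)
```

    In this made-up data both collectors carry the same number of instances, so a rebalance based on instance count would see nothing to fix, yet one collector holds three times the devices and nine times the batchscripts.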

    Ideally, the ABCG would sort devices by the number of instances reporting to each, then take each next device and hand it to the next collector in the group in round-robin fashion to get a true balance of the load. This is the basis of the rebalancer I'm working on. ABCG has taken up the majority of my time over the past few weeks, trying to prevent resource exhaustion on the collectors. We're a very heavy monitoring shop and we're finding the limits of LM.
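    The sort-then-round-robin idea can be sketched in a few lines. This is only an illustration of the approach described above, not the actual rebalancer and not how ABCG works internally; the function name, device names, and instance counts are hypothetical.

```python
def round_robin_assign(instance_counts, collectors):
    """Sort devices by instance count (largest first), then deal them
    out to the collectors one at a time in round-robin order."""
    ordered = sorted(instance_counts.items(), key=lambda kv: kv[1], reverse=True)
    assignment = {collector: [] for collector in collectors}
    for index, (device, _count) in enumerate(ordered):
        target = collectors[index % len(collectors)]
        assignment[target].append(device)
    return assignment


# Hypothetical instance counts per device
instance_counts = {
    "hv-1": 400, "hv-2": 380,
    "vm-1": 90, "vm-2": 85, "vm-3": 80, "vm-4": 75,
}
collectors = ["collector-a", "collector-b"]

assignment = round_robin_assign(instance_counts, collectors)
loads = {
    collector: sum(instance_counts[device] for device in assigned)
    for collector, assigned in assignment.items()
}
print(assignment)
print(loads)
```

    With these sample numbers each collector ends up with one Hyper-V host and two VMs, so both the device counts and the instance totals stay close, instead of all hosts landing on one collector and all VMs on the other.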

    I have another thread I posted on that has a calculator for getting your threshold to actually balance your instances.  I'll have to dig that up.

  • I've also set up a dashboard graph widget to monitor the balance of each of my groups: