Forum Discussion

eandrewes
2 years ago

Question: Autobalanced Collectors and Trap/Syslog reception

Hi everyone, can someone please advise on the correct architecture/approach to use when LogicMonitor collectors are configured in an AutoBalancedCollectorGroup (ABCG) and are expected to receive pushed data (SNMP traps/syslog) from managed network elements (NEs) in addition to SNMP polling? If the NEs are configured to send traps to all collectors (for redundancy), how is de-duplication handled when the same trap arrives at multiple collectors? Does each collector process its own instance of the trap, does trap reception need to be excluded from auto-balancing by setting a preferred collector for traps, or is LM able to de-duplicate the traps itself?


5 Replies

  • For traps and traditional syslog monitoring, incoming messages are dropped by the collector if the source device isn't currently assigned to that collector.

  • What we ended up doing for syslog with an ABCG was running keepalived on all the nodes (one MASTER, the rest BACKUP) plus samplicator on each node, configured to forward to every collector. The only node that actually receives traffic is the one currently holding the VRRP MASTER role (and therefore the virtual IP). This way your devices have a single endpoint, and samplicator shuffles the data out to the collectors as needed.

  • In my opinion, that is a terrible limitation, as it works against the concept of using ABCGs. The only solution is to spam traps to all of the individual collectors in a group, hoping you hit the one that's polling your device. If you ever have to extend your ABCG, you then have to touch all of your infrastructure to add a new trap destination as well.

    It shouldn't be difficult to have all Collectors ingest traps and assign them to the proper resource/collector via some back-end mechanism. When I've brought this up with folks at LM, they've recommended not using traps in the first place and polling for everything instead. That's a nice idea, but not exactly feasible on day 1 when you're moving your whole organization onto this product (if it's even truly possible to get away from traps completely).

  • 5 hours ago, Tisch said:

    In my opinion, that is a terrible limitation as it works against the concept of using ABCGs.

    Agreed.

    5 hours ago, Tisch said:

    have all Collectors ingest traps and assign them to the proper resource/collector via some back-end mechanism

    This is how syslog works with LM Logs. It can be done, but LM is pretty anti-traps, so they might not do it for traps any time soon (even though they likely could). Have you thought about samplicator? It could be a single point that your infrastructure can send to, and you can configure it to forward to one/all/specific collectors.

    That said, I'll echo LM: what are you getting from traps that you aren't getting from polling? Seriously. If the answer is non-zero, in most cases you can identify the values bound into the traps and poll them, eliminating the need for a trap. Not to mention that EventSources don't have a concept of one trap opening the alert and a corresponding trap closing it.

  • On 7/16/2022 at 10:03 AM, Stuart Weenig said:

    That said, I'll echo LM: what are you getting from traps that you aren't getting from polling? Seriously.

    Thanks Michael. Unfortunately, the issue is that we've hit a hard limit on how much we can poll from the end devices. We're polling Huawei optical kit that aggregates residential connections, and in addition to interface stats we are attempting to use the equipment's supported dying-gasp traps, as these represent a customer-impacting issue. However, attempting to poll the 800,000+ dying-gasp states per device has proven completely unable to scale (both in collector tasks and in the load introduced on the managed element).
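To make the first reply's point concrete (and to answer the original de-duplication question), here is a minimal Python model that assumes only the behaviour described in that reply: a collector keeps a trap only if the sending device is currently assigned to it. The collector names, IP addresses, and the `deliver_to_all` helper are hypothetical; this is an illustration, not LogicMonitor code.

```python
# Hypothetical model (not LogicMonitor code): each collector keeps only traps
# from devices currently assigned to it and silently drops the rest.
from dataclasses import dataclass


@dataclass(frozen=True)
class Trap:
    source_ip: str
    oid: str


def accepted_by(collector: str, trap: Trap, assignments: dict[str, str]) -> bool:
    """True if `collector` is the one currently polling the trap's source device."""
    return assignments.get(trap.source_ip) == collector


def deliver_to_all(trap: Trap, collectors: list[str],
                   assignments: dict[str, str]) -> list[str]:
    """Send one trap to every collector in the group; return who keeps it."""
    return [c for c in collectors if accepted_by(c, trap, assignments)]


# Three collectors in a hypothetical auto-balanced group.
collectors = ["col-a", "col-b", "col-c"]
assignments = {"10.0.0.1": "col-b", "10.0.0.2": "col-c"}
link_down = "1.3.6.1.6.3.1.1.5.3"  # standard linkDown notification OID

print(deliver_to_all(Trap("10.0.0.1", link_down), collectors, assignments))
# prints ['col-b']

# After a rebalance moves 10.0.0.1 to col-a, the same fan-out still yields one copy.
assignments["10.0.0.1"] = "col-a"
print(deliver_to_all(Trap("10.0.0.1", link_down), collectors, assignments))
# prints ['col-a']

# An unassigned source address (for example, a relay that rewrites the
# source IP) means every collector drops the trap.
print(deliver_to_all(Trap("10.0.0.9", link_down), collectors, assignments))
# prints []
```

Under that assumption, sending every trap to every collector in the group does not produce duplicates, because at most one collector keeps each trap; the cost is the extra traffic and the maintenance burden described in the third reply.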
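The keepalived approach in the second reply relies on VRRP: only the node that wins the election holds the virtual IP that devices send syslog to, and samplicator on that node fans the messages out. The sketch below is a simplified Python model of the election rule from RFC 5798 (among live routers the highest priority wins, and a tie goes to the higher primary IP address). It ignores advertisement timers, preemption settings, and everything else keepalived actually does; the node names, addresses, and priorities are made up.

```python
# Simplified model of VRRP master election (RFC 5798), the mechanism keepalived
# uses: among nodes that are up, the highest priority wins, and a tie is broken
# by the higher primary IP address. All names, IPs and priorities are hypothetical.
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass
class Node:
    name: str
    ip: IPv4Address
    priority: int   # 1-254 for routers that do not own the virtual IP
    up: bool = True


def elect_master(nodes: list[Node]) -> Optional[Node]:
    """Return the node that would hold the virtual IP, or None if all are down."""
    alive = [n for n in nodes if n.up]
    if not alive:
        return None
    # Tuple comparison: priority first, then primary IP as the tie-breaker.
    return max(alive, key=lambda n: (n.priority, n.ip))


nodes = [
    Node("col-a", IPv4Address("10.0.0.11"), 150),  # preferred MASTER
    Node("col-b", IPv4Address("10.0.0.12"), 100),
    Node("col-c", IPv4Address("10.0.0.13"), 100),
]

print(elect_master(nodes).name)  # col-a receives all syslog traffic

nodes[0].up = False              # col-a fails
print(elect_master(nodes).name)  # tie at 100, higher IP wins: col-c
```

One thing worth confirming in a setup like this: if the collectors identify devices by source IP (as the first reply describes for traps and traditional syslog), the relay has to preserve the original sender's address when forwarding. samplicator has a source-spoofing option (`-S`) for this purpose; without it, every collector would see the relay as the sender.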
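The scaling problem in the last reply shows up clearly in a back-of-the-envelope estimate. The 800,000 states per device comes from the thread. Everything else is an assumption: that those states form a single SNMP table column walked with GETBULK, a max-repetitions of 50, and a 5-minute polling interval. Real agents may return fewer rows per response when the PDU fills up, so treat the result as a lower bound.

```python
import math


def getbulk_requests(rows: int, max_repetitions: int) -> int:
    """Approximate GETBULK round trips to walk one table column of `rows` entries."""
    # Each response carries at most `max_repetitions` rows for a single-column
    # walk; the final end-of-table check is ignored in this estimate.
    return math.ceil(rows / max_repetitions)


def requests_per_second(rows: int, max_repetitions: int, interval_s: int) -> float:
    """Sustained request rate one device must answer to keep up with polling."""
    return getbulk_requests(rows, max_repetitions) / interval_s


ROWS = 800_000        # dying-gasp states per device (from the post)
MAX_REPETITIONS = 50  # assumed GETBULK setting
INTERVAL_S = 300      # assumed 5-minute polling interval

print(getbulk_requests(ROWS, MAX_REPETITIONS))                            # 16000
print(round(requests_per_second(ROWS, MAX_REPETITIONS, INTERVAL_S), 1))   # 53.3
```

Under these assumptions that is about 16,000 round trips per device per cycle, or roughly 53 requests per second sustained against each device, before any interface statistics are polled. A trap, by contrast, is only sent when a dying gasp actually occurs.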