Forum Discussion

Matt_Whitney
2 months ago

SNMP Traps - Baseline for alerting

Currently with SNMP Traps, each trap-based alert condition needs to be manually defined, either via an EventSource or via LM Logs alert conditions.

It would make traps much easier to work with if LogicMonitor gave us a baseline of which traps should be alerted on, based on vendor best practices. LM already does this for DataSources, and trap-based alerting should be handled the same way.

Traps are a fundamental part of network monitoring, and I would love to see LogicMonitor put more focus on fundamental areas like this to improve basic usability.

  • I can't find it now, but there used to be a blog post I would link to all the time where LogicMonitor basically said, "traps are bad, don't use them." I imagine that is why there isn't a baseline. SNMP has been "on a decline" for years and should eventually be replaced (and slowly is being replaced by API-based monitoring), but traps in particular do need to die. The closest thing I found was an LM person posting on SpiceWorks.

    Traps are generally one-time fires with no check on whether the condition has resolved. If the trap for some reason doesn't make it to the collector, you do not have an alert.

    Also, the statement that "traps are a fundamental part of network monitoring" is a falsehood. If you are keeping up with refresh cycles and firmware upgrades, almost all enterprise-level network gear these days can be covered fully by either SNMP-based polling or API-based polling. I am not saying a trap isn't still needed here or there. We encounter this from time to time, but only on a very limited basis, and we can generally build something better, like a stateful SSH monitor that logs in, runs a command, and returns output that we can use for a datapoint.

    • Matt_Whitney (Advisor)

      Interesting, our network team uses traps extensively in their legacy monitoring tool.

      So would best practice generally be to use either SNMP or API-based polling where possible, and to use either a custom SSH-based datasource or traps/syslog if there are any conditions that can't be monitored via polling?

      Another question: do you feel that syslog is in the same category as traps, something we should move away from? Or should we lean towards using syslog instead of traps in situations where polling-based monitoring won't work?

      • Joe_Williams (Professor)

        Best practice is 100% dependent on your network, of course.
        But yes, my professional opinion, which I openly give to our clients, is that trap-based alerting is not best practice.

        But I would not suggest replacing all trap-based monitoring with stateful SSH-based datasources either. That would require a lot of customization in the platform, and SSH-based datasources can weigh heavily on a collector.

        What should most likely happen in your environment is a full review of your datasources to make sure they are up to date, followed by a Venn diagram of what the network team wants to alert on, what the platform covers by default, and what could be used to cover the rest. Then, if needed, use traps.

        Here is another perspective from the network side: trap-based alerting doesn't allow for metrics, or for easy customization of an alert. If they want to trigger on, say, a high-temperature alert in a router, they have to adjust the setting across all of their routers to set the high-state alert. If you have proper DevOps, this isn't too hard of a change, but it's a change in the operational environment, versus making a threshold change in LogicMonitor itself. A threshold change in LogicMonitor has no chance of impacting the operation of the device.

        Personally, I also treat syslog like traps. For the purposes of network monitoring, I avoid it and highly suggest everyone does, for exactly the same reasons as traps. Syslog still has its place in SIEM/SOC-related items, voice, and some others. But for pure network monitoring, you can generally get everything you want via polling.
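
To make the polling argument in this thread concrete, here is a minimal Python sketch of the two ideas raised above: turning command output into a datapoint, and evaluating a threshold on the monitoring side so that an alert both raises and clears. Everything in it is an assumption for illustration: the sample command output, the sensor names, and the `parse_temperatures` and `ThresholdAlert` names are hypothetical, and none of this is LogicMonitor's API or a real vendor's output format. The SSH login step is left out; the sketch starts from the text a command would have returned.

```python
import re
from dataclasses import dataclass

# Hypothetical output of an environment/temperature command.
# Real formats vary by vendor; this layout is an assumption.
SAMPLE_OUTPUT = """\
Sensor        Temp(C)
inlet         31
cpu           78
"""


def parse_temperatures(output: str) -> dict[str, float]:
    """Turn command output into {sensor_name: temperature} datapoints."""
    readings = {}
    for line in output.splitlines():
        # Match "<name> <number>"; header lines do not match and are skipped.
        match = re.match(r"^\s*(\w+)\s+(-?\d+(?:\.\d+)?)\s*$", line)
        if match:
            readings[match.group(1)] = float(match.group(2))
    return readings


@dataclass
class ThresholdAlert:
    """Tracks alert state across polls, so a condition can also clear."""

    threshold: float
    active: bool = False

    def evaluate(self, value: float) -> str:
        """Return 'raised', 'cleared', or 'unchanged' for one poll."""
        was_active = self.active
        self.active = value > self.threshold
        if self.active and not was_active:
            return "raised"
        if was_active and not self.active:
            return "cleared"
        return "unchanged"


if __name__ == "__main__":
    datapoints = parse_temperatures(SAMPLE_OUTPUT)
    alert = ThresholdAlert(threshold=75.0)  # changed here, not on the device
    print(datapoints)
    print(alert.evaluate(datapoints["cpu"]))
```

The point of the design is that the threshold lives in the monitoring tool rather than on the router, and that every poll re-evaluates the state. A missed poll is corrected by the next one, whereas a missed trap is simply lost.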