Forum Discussion

Kirby_Timm
2 years ago

VPN Tunnel Monitoring

We have several Cisco IPSec Aggregate Tunnels that we are monitoring on our ASA. The problem is that many of them have a 30-minute idle timeout. I don't really need (or want) an alert if a VPN tunnel drops because it's idle. Ideally, I want an alert only if there WAS data going through the tunnel and then it dropped. I've played with a few different alert settings, but I haven't had much luck getting good alerting. I saw an old post where someone wrote a script to ping the other side of the VPN tunnel, but I really don't want to artificially inflate my VPN uptime with traffic. I'm wondering what others have done. As I said, ideally I'd love to have some logic in my alerts along the lines of: if the vpn_tunnel outbound or inbound throughput has been greater than 0 in the past 5 minutes and the VPN drops, then alert me. Anyway, what have you done for alerting, and has it worked well for you?
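
The logic described above can be sketched in a few lines. This is only an illustration of the rule "alert if the tunnel dropped and there was traffic in the preceding 5 minutes"; the `Sample` type, its field names, and `should_alert` are hypothetical and do not correspond to any LogicMonitor datapoint or API. It assumes samples are sorted oldest to newest.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One polling result for a tunnel (hypothetical structure)."""
    timestamp: float  # seconds since some epoch
    tunnel_up: bool
    in_bps: float     # inbound throughput at this poll
    out_bps: float    # outbound throughput at this poll


def should_alert(samples: List[Sample], window_seconds: float = 300.0) -> bool:
    """Return True only if the newest sample shows the tunnel down AND
    some earlier sample inside the window shows it up and carrying traffic."""
    if not samples or samples[-1].tunnel_up:
        return False
    latest = samples[-1]
    # Walk backwards from the sample just before the newest one.
    for s in reversed(samples[:-1]):
        if latest.timestamp - s.timestamp > window_seconds:
            break  # everything older is outside the window
        if s.tunnel_up and (s.in_bps > 0 or s.out_bps > 0):
            return True
    return False


# Tunnel was passing traffic, then dropped: alert.
busy = [Sample(0, True, 1200.0, 800.0),
        Sample(60, True, 900.0, 0.0),
        Sample(120, False, 0.0, 0.0)]

# Tunnel was up but idle, then dropped: no alert.
idle = [Sample(0, True, 0.0, 0.0),
        Sample(60, True, 0.0, 0.0),
        Sample(120, False, 0.0, 0.0)]
```

One consequence of this rule is that the alert condition stops being true once the tunnel has been down for longer than the window, so it behaves more like an event than a persistent state.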

10 Replies

  • So, you're using this to monitor or at least tune alerts that are on a secondary device?

  • We have a workaround for this. Let me look it up. Delayed since my PTO has already started. 

  • I'd love to see what you've got Stuart!  Thanks in advance.

    We had another one today. A VPN tunnel was closed by the idle timeout and stayed down for about 14 hours. I sent a ping through the VPN tunnel and it came right up, so there wasn't any "problem" with the tunnel. I can see how monitoring VPN tunnels can be super tricky: how do you determine whether a tunnel is down because of an issue or because of an idle timeout?

  • On 12/20/2022 at 6:17 PM, Stuart Weenig said:

    We have a workaround for this. Let me look it up. Delayed since my PTO has already started. 

    Hey Stuart,

    I'm back in the office now myself.  If you've not forgotten about me, once you get caught up on everything that needs to be taken care of after being gone on PTO, I'd really like to see the workaround you came up with!

    Thanks,

    Kirby

  • On 12/21/2022 at 8:14 AM, Kirby Timm said:

    I'd love to see what you've got Stuart!  Thanks in advance.

    We had another one today. A VPN tunnel was closed by the idle timeout and stayed down for about 14 hours. I sent a ping through the VPN tunnel and it came right up, so there wasn't any "problem" with the tunnel. I can see how monitoring VPN tunnels can be super tricky: how do you determine whether a tunnel is down because of an issue or because of an idle timeout?

    We have resorted to using logs for this -- the reason for the tunnel being down is only presented in the log entries, and not in any OIDs, unfortunately. 

    Very curious to see what LogicMonitor has used as a solution for this as well.

  • I'm back. Let me look at what we did so I can explain it coherently. For clarity, this is not an LM solution. I'm not with LM (anymore). 

  • Ok, several changes to get this to work. First to the collection script:

    We added the following lines to the collection script. Ours on the left, repo version on the right. It looks in 1.3.6.1.4.1.9.9.147.1.2.1.1.1.2.7 to see if this is a secondary unit. In our case, this is why the tunnels are "down", because they are on a secondary unit. If this isn't the case for you, you'll have to find a different way of differentiating between them.

    Then we added a datapoint to contain that isStandby output:

    Then we modified the TunnelActiveTime_Seconds datapoint (ours on left, repo version on right):

    The end result is that 1000000000 is added to TunnelActiveTime_Seconds if the unit is a standby unit, so the tunnel's uptime looks like roughly 31 years. We know the tunnel hasn't been up for 31 years, but we only pay attention to the standby unit when it alerts, and it no longer does because the uptime is nice and high.
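
For readers who want to see the arithmetic, here is a minimal sketch of the offset trick described above. It is not the actual collection script or datapoint expression from this reply; the function names and the `is_standby` flag are hypothetical stand-ins for whatever the script derives from the OID lookup.

```python
STANDBY_OFFSET_SECONDS = 1_000_000_000  # the constant mentioned in the reply


def adjusted_active_time(active_seconds: int, is_standby: bool) -> int:
    """Inflate the reported active time on a standby unit so that a
    'tunnel uptime too low' threshold never fires there."""
    if is_standby:
        return active_seconds + STANDBY_OFFSET_SECONDS
    return active_seconds


def seconds_to_years(seconds: float) -> float:
    """Convert seconds to years, using 365.25-day years."""
    return seconds / (365.25 * 24 * 3600)


# A standby unit reporting 0 seconds of tunnel activity ends up
# looking like it has been up for a little under 32 years.
standby_value = adjusted_active_time(0, is_standby=True)
standby_years = seconds_to_years(standby_value)

# An active unit is reported unchanged.
active_value = adjusted_active_time(120, is_standby=False)
```

The trade-off is that the standby unit's uptime graph is no longer meaningful, which the reply above accepts because that unit is only looked at when it alerts.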

  • @Stuart Weenig In our case (and likely in Kirby's case, as evidenced by his statement that he sends a ping along the tunnel), our tunnels are going idle because no traffic is traversing them. There is a way to configure keepalives to keep the tunnels active, but that takes some configuration, and I'm not 100% sure it's always supported by the remote endpoints.

    It's always possible there is an OID that I've simply never been able to find that reports this 'Down reason,' but I'm going to guess there's a decent chance that your circumstances (a backup device) might be different (though, still helpful!).

  • 20 minutes ago, Austin Culbertson said:

    @Stuart Weenig In our case (and likely in Kirby's case, as evidenced by his statement that he sends a ping along the tunnel), our tunnels are going idle because no traffic is traversing them. There is a way to configure keepalives to keep the tunnels active, but that takes some configuration, and I'm not 100% sure it's always supported by the remote endpoints.

    It's always possible there is an OID that I've simply never been able to find that reports this 'Down reason,' but I'm going to guess there's a decent chance that your circumstances (a backup device) might be different (though, still helpful!).

    Yes, you're correct.  The tunnel is going "down" because of an idle timeout, which in my opinion shouldn't warrant an alarm in LM.  I could change the timeouts on the tunnels in the ASA, but I don't really see a good reason to.  IMHO if there is no traffic going through the tunnel, then it should shut down until it's needed again.  I just don't need an alarm telling me the tunnel shut down because of an idle timeout.  I don't think there is any OID that gives LM that info, though, and I'm not sure how one could do it programmatically either.

  • 27 minutes ago, Kirby Timm said:

    Yes, you're correct.  The tunnel is going "down" because of an idle timeout, which in my opinion shouldn't warrant an alarm in LM.  I could change the timeouts on the tunnels in the ASA, but I don't really see a good reason to.  IMHO if there is no traffic going through the tunnel, then it should shut down until it's needed again.  I just don't need an alarm telling me the tunnel shut down because of an idle timeout.  I don't think there is any OID that gives LM that info, though, and I'm not sure how one could do it programmatically either.

    Yep -- In my investigation long, long ago, I came to the same conclusion -- which is why we resorted to utilizing the logs for the device, as the 'Tunnel Down' reason is not available via any OIDs, as best I could tell.
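
As a rough illustration of the log-based approach mentioned in this thread, the sketch below pulls a disconnect reason out of a syslog line and ignores idle timeouts. The sample lines are illustrative only, loosely modeled on the ASA "Session disconnected ... Reason: ..." message; the exact message text, the set of reason strings, and all function names here are assumptions, so check them against the logs your own devices actually emit.

```python
import re
from typing import Optional

# Matches "... Session disconnected. <fields>, Reason: <text>" at end of line.
REASON_PATTERN = re.compile(
    r"Session disconnected\..*?Reason:\s*(?P<reason>[^,]+?)\s*$"
)

# Reasons we do NOT want to alert on (assumed spelling, compared lowercase).
IGNORED_REASONS = {"idle timeout"}


def disconnect_reason(line: str) -> Optional[str]:
    """Return the reason text from a disconnect log line, or None."""
    match = REASON_PATTERN.search(line)
    return match.group("reason") if match else None


def is_alertable(line: str) -> bool:
    """True if the line is a disconnect whose reason is not ignored."""
    reason = disconnect_reason(line)
    if reason is None:
        return False
    return reason.lower() not in IGNORED_REASONS


# Illustrative sample lines (not captured from a real device).
idle_line = (
    "%ASA-4-113019: Group = 203.0.113.10, Username = 203.0.113.10, "
    "IP = 203.0.113.10, Session disconnected. Session Type: LAN-to-LAN, "
    "Duration: 0h:30m:12s, Bytes xmt: 1024, Bytes rcv: 2048, "
    "Reason: Idle Timeout"
)
lost_line = idle_line.replace("Reason: Idle Timeout", "Reason: Lost Service")
```

With a filter like this in front of the alerting pipeline, an idle timeout produces no alert while any other disconnect reason does.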