AAA, NAC, Guest Access & BYOD



IAP-VPN failover best practices config tweaks 

Mar 31, 2023 10:57 AM

Problem:

Failover outages and pitfalls on IAP VPN deployments

Problem Statement & Scenario

When the primary broadband link fails, the SD-Branch tunnels go down and come back up on the backup link, and routing mostly recovers quickly. The IAP-VPN tunnel, however, stays up on the primary VPNC for 5 to 6 minutes, alongside the new tunnel on the backup VPNC. While the tunnel still shows as up on the primary, that VPNC continues to advertise a route to the L3 networks. These routes do not work, so we see an outage of 5 to 6 minutes.



Diagnostics:

•The IAP-VPN tunnel stays up on the primary VPNC (alongside the new tunnel on the backup) for 5 to 6 minutes. While it is still up on the primary, that VPNC continues to advertise a route to the L3 networks. These routes may not work, so an outage of 5 to 6 minutes is expected.

•failover-internet-pkt-lost-cnt 10 → the number of ICMP packets that may be lost before the AP must switch to a different uplink connection.

•failover-internet-pkt-send-freq 30 → ICMP packets are sent once every 30 seconds.

•With these values the IAP needs 30 x 10 = 300 seconds to detect that the primary uplink is down and initiate the switch to the backup uplink. The primary uplink tunnel therefore shows as up on the VPNC for about 5 minutes after the primary uplink went down. Because the IAP is the tunnel initiator, the VPNC does not monitor the tunnel status and continues to send route advertisements during this period.

•The IAP then forms the IPsec tunnel to the VPNC again and registers the branch, after which the OSPF/BGP route advertisements point to the new tunnel. This explains the outage of 5 to 6 minutes.
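The arithmetic behind the 300 seconds can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function name uplink_detection_time is made up for this example and is not part of any Aruba tool.

```python
def uplink_detection_time(pkt_lost_cnt: int, pkt_send_freq_s: int) -> int:
    """Approximate seconds before the IAP declares the uplink down.

    Assumes the simple model from the post: one ICMP probe every
    pkt_send_freq_s seconds, failover after pkt_lost_cnt lost probes.
    """
    if pkt_lost_cnt <= 0 or pkt_send_freq_s <= 0:
        raise ValueError("both values must be positive")
    return pkt_lost_cnt * pkt_send_freq_s


# Values quoted in the diagnostics above.
default_detection_s = uplink_detection_time(pkt_lost_cnt=10, pkt_send_freq_s=30)
print(default_detection_s)  # 300
```

So with the values quoted here, the IAP alone needs roughly 5 minutes before it even starts moving to the backup uplink.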



Solution

Tips & Tricks for IAP failover best practices config tweaks


Figure: Uplink & VLAN default config UI

Root cause, using a VRRP failover as the example

–On the Gateway, the default timeout before a down IAP IPsec tunnel is removed is 5 minutes. After this timeout, the tunnel is removed from the IPsec crypto table, auth notifies the IAP manager, and the IAP manager removes the datapath routes.

–VPNC1 is disconnected from the WAN link (it is still connected to the core switch) and VRRP fails over to VPNC2.

–The IAP sets up a tunnel to VPNC2 and registers with it around 20-30 seconds after the VRRP failover, but VPNC1 may take 5 minutes to remove the routes and stop advertising them upstream.

–VPNC2 should be advertising routes for the IAP during this time, but the upstream routers may not update their route tables until VPNC1 stops advertising the routes.
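To get a feel for the timeline, here is a rough Python sketch of how long stale routes could linger. It assumes a deliberately simplified model that is not stated in Aruba documentation: VPNC1 keeps advertising until its tunnel timeout expires, and the window that matters starts once the IAP has registered with VPNC2. The function name stale_route_window is hypothetical.

```python
def stale_route_window(idle_timeout_s: int, new_tunnel_ready_s: int) -> int:
    """Seconds during which VPNC1 may still advertise routes after the
    IAP has already registered with VPNC2 (simplified, assumed model)."""
    if idle_timeout_s < 0 or new_tunnel_ready_s < 0:
        raise ValueError("times must be non-negative")
    return max(0, idle_timeout_s - new_tunnel_ready_s)


# 5 minute tunnel timeout on VPNC1, new tunnel up after about 30 seconds.
print(stale_route_window(idle_timeout_s=300, new_tunnel_ready_s=30))  # 270

# A much shorter timeout of 30 seconds closes the window in this model.
print(stale_route_window(idle_timeout_s=30, new_tunnel_ready_s=30))   # 0
```

Real convergence also depends on how fast the upstream routers process the OSPF/BGP updates, which this sketch ignores.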

 

The relevant setting is "user-idle-timeout" in the "default-iap" VPN authentication profile.

*[mynode] (config) #aaa authentication vpn default-iap

 

*[mynode] (VPN Authentication Profile "default-iap") #

user-idle-timeout       User idle timeout value. Valid range is 30-15300 seconds in multiples of 30 seconds

Recommended config tweaks

 

IAP side

Default

–failover-internet-pkt-lost-cnt=10

–failover-internet-pkt-send-freq=30

–failover-internet-check-timeout=10 → ICMP packet timeout in seconds

–Make sure preemption is enabled for failover.

Optimized settings

–failover-internet-pkt-lost-cnt=6  

–failover-internet-pkt-send-freq=5

–failover-internet-check-timeout=4 → ICMP packet timeout in seconds (default is 10 seconds)
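For comparison, the sketch below puts the default and optimized IAP values side by side and computes the approximate detection time for each (lost-packet count multiplied by send frequency). The FailoverProfile class is purely illustrative, and the calculation leaves failover-internet-check-timeout out because the post does not say how it enters the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverProfile:
    """Hypothetical container for the two IAP uplink failover values."""
    pkt_lost_cnt: int      # failover-internet-pkt-lost-cnt
    pkt_send_freq_s: int   # failover-internet-pkt-send-freq (seconds)

    def detection_time_s(self) -> int:
        # Probes are sent every pkt_send_freq_s seconds; failover is
        # triggered after pkt_lost_cnt of them are lost.
        return self.pkt_lost_cnt * self.pkt_send_freq_s


DEFAULT = FailoverProfile(pkt_lost_cnt=10, pkt_send_freq_s=30)
OPTIMIZED = FailoverProfile(pkt_lost_cnt=6, pkt_send_freq_s=5)

for name, profile in (("default", DEFAULT), ("optimized", OPTIMIZED)):
    print(f"{name}: about {profile.detection_time_s()} s to detect uplink loss")
# default: about 300 s to detect uplink loss
# optimized: about 30 s to detect uplink loss
```

Under this simple model the optimized values cut uplink-loss detection from about 5 minutes to about 30 seconds.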

Controller side

*[mynode] (config) #aaa authentication vpn default-iap

*[mynode] (VPN Authentication Profile "default-iap") #

user-idle-timeout       User idle timeout value. Valid range is 30-15300 seconds in multiples of 30 seconds

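Set user-idle-timeout to 30 seconds so that the VPNC removes the stale tunnel and stops advertising its routes sooner.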
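If you generate controller configs from a script, a small sanity check on the value can save a rejected command. The Python sketch below only encodes the range quoted in the CLI help above (30-15300 seconds, in multiples of 30); the function name validate_user_idle_timeout is made up for this example.

```python
def validate_user_idle_timeout(seconds: int) -> int:
    """Return seconds unchanged if it fits the range from the CLI help
    (30-15300 seconds, multiples of 30); raise ValueError otherwise."""
    if not 30 <= seconds <= 15300:
        raise ValueError(f"{seconds} is outside the 30-15300 second range")
    if seconds % 30 != 0:
        raise ValueError(f"{seconds} is not a multiple of 30")
    return seconds


print(validate_user_idle_timeout(30))  # 30
```

30 is the smallest value that range allows, which is why it is the one recommended here.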

 

From 10.x onward there is a better way of handling failover: it is hitless when a cluster is built and configured on the headend VPNC side.
