Wireless Access


Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

RAP Failover time

  • 1.  RAP Failover time

    Posted Sep 11, 2017 10:22 AM

    Hi,

     

    I'm looking for the best (fastest) controller redundancy for RAPs using AOS 6.5.x. I'm currently using a controller pair in a master/master-standby setup, where the RAP connects to the VRRP address shared between the controllers. Redundancy works: the RAP connects to the second controller when I reboot the primary. However, I'm a little disappointed with the failover time. When I tested a similar setup with campus APs connecting to the VRRP address, I remember a client disruption time of about 10 seconds. In my current setup terminating RAPs, I see a client disruption time of almost 2 minutes. My initial thought was that I had misconfigured something, but when searching for RAP failover times I haven't found any numbers indicating what to expect. I did find, however, that the minimum RAP bootstrap threshold is 30. I believe this means the RAP will not reconnect to a new controller until 30 seconds have passed? Would that make RAP failover much slower than campus AP failover? Does anybody have experience with this?


    #6.5


  • 2.  RE: RAP Failover time

    EMPLOYEE
    Posted Sep 12, 2017 04:09 AM

    You should not use a Master/Master-Standby setup for this, because the standby master will not terminate any APs until it becomes active. What is probably happening is that during a failover it takes some time for the standby master to become active, and only after that will it terminate APs. This can take a while.

     

    Terminating APs on a master is not recommended unless you have an all-master deployment. A standby master will not terminate APs.

     

    If you have locals, terminate the RAPs on the locals. If you only have two masters, it may make sense to make both of them active masters and use AirWave to sync the configuration, as in the all-masters model.

     

    With ArubaOS 8.1, you can run remote APs with redundant tunnels to dual controllers for very fast failover times.



  • 3.  RE: RAP Failover time

    Posted Sep 26, 2017 05:04 AM

    I have now tried failover with a master/local setup. Unfortunately, I don't see any significant difference in failover time: the client disruption of about 2 minutes is the same. However, digging deeper, I see that the RAP is ready again after about 1 minute; the remaining minute is due to the clients being slow to recover. Again, I suspect the 30-second bootstrap threshold, probably combined with a longer connection time for a RAP than for a CAP, is what makes the RAP failover take about 1 minute.

     

    I think I will look into AOS 8.1 and see what numbers I get with a cluster setup.



  • 4.  RE: RAP Failover time

    Posted Oct 18, 2017 05:51 AM

    It seems RAP failover times are not the hottest topic :) Anyway, I have now tested AOS 8.1 with MCs in a cluster setup, and I can confirm that failover works much better in this version. The RAP forms two tunnels, and if the primary fails it moves over to the secondary tunnel instantly. I measured a client disruption time of about 3 seconds.

     

    Again, if anyone has timed a controller failover with RAPs on AOS 6.x, I would be interested in those numbers.



  • 5.  RE: RAP Failover time

    EMPLOYEE
    Posted Mar 14, 2018 06:33 PM

    Well, this is a longer topic than you would expect. You need to understand a few things about this failover scenario, the main one being:
    It takes longer for a RAP to detect a failure of the primary LMS than it would for a CAP.

    A CAP needs to miss 8 heartbeats (8 seconds).
    A RAP needs to miss 30.

    This is intentional: WAN connections are not as stable as LAN connections, so we need to allow a longer threshold.
    So it takes at least 30 seconds to discover that your controller is down.
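    To make the threshold arithmetic concrete, here is a minimal sketch that converts a missed-heartbeat threshold into detection time. It assumes a 1-second heartbeat interval, which is consistent with "8 heartbeats (8 seconds)" above; the function and constant names are hypothetical and not part of any ArubaOS API.

```python
# Minimal sketch: time for an AP to declare its controller (LMS) down.
# Assumption: heartbeats are sent every 1 s, matching "8 heartbeats (8 seconds)".
HEARTBEAT_INTERVAL_S = 1.0

CAP_THRESHOLD = 8   # missed heartbeats before a campus AP gives up (from the thread)
RAP_THRESHOLD = 30  # missed heartbeats before a remote AP gives up (from the thread)


def detection_time(missed_heartbeats: int, interval_s: float = HEARTBEAT_INTERVAL_S) -> float:
    """Seconds of silence before the AP treats its controller as failed."""
    return missed_heartbeats * interval_s


if __name__ == "__main__":
    print(f"CAP detection: {detection_time(CAP_THRESHOLD):.0f} s")
    print(f"RAP detection: {detection_time(RAP_THRESHOLD):.0f} s")
```

    With these assumptions a RAP waits 22 seconds longer than a CAP before it even starts to fail over, before any tunnel setup or client recovery.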

    After that, the RAP has to:
    1. Shut down its radios.
    2. Re-establish IPsec to the backup LMS.
    3. Re-IP the client! This is also important, as it takes time.

    So we are looking at 30 to 90 seconds of downtime for the client, including a full reconnect and re-authentication.
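    One way to reason about the 30 to 90 second range is as a sum of sequential phases. The sketch below models that; only the 30-second detection floor comes from this thread, and every other duration is a hypothetical placeholder to replace with your own measurements. It also assumes the phases run strictly one after another with no overlap.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One step of a RAP failover with an assumed duration in seconds."""
    name: str
    seconds: float


# Only the 30 s detection floor is from the thread; the rest are hypothetical.
RAP_FAILOVER_PHASES = [
    Phase("detect LMS failure (30 missed heartbeats)", 30.0),
    Phase("shut down radios", 1.0),                      # hypothetical
    Phase("re-establish IPsec to backup LMS", 10.0),     # hypothetical
    Phase("client reconnect, re-auth and re-IP", 20.0),  # hypothetical
]


def total_downtime(phases: list[Phase]) -> float:
    """Client downtime when the phases run back to back."""
    return sum(p.seconds for p in phases)


if __name__ == "__main__":
    elapsed = 0.0
    for phase in RAP_FAILOVER_PHASES:
        elapsed += phase.seconds
        print(f"{phase.name:<45} +{phase.seconds:5.1f} s  (t={elapsed:.1f} s)")
    print(f"total: {total_downtime(RAP_FAILOVER_PHASES):.1f} s")
```

    Because detection alone is fixed at about 30 seconds, shaving time off the later phases cannot bring a 6.x RAP failover anywhere near the few seconds seen with the dual-tunnel 8.x setup.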
    Hope I've shed some light on this issue for you.
    Also, as mentioned earlier in this thread, moving to 8.x makes failover, if not a "thing of the past", then a much neater, cleaner, and much faster process.