Contributor I

APs not connecting to primary LMS after enabling HA Fast Failover

We have a master/local environment and recently enabled HA Fast Failover on the two locals, which are in separate data centers. The feature seems to work fairly well, with failover times of around 20 seconds before service is fully restored. One thing we've noticed, though, is that some APs appear to be "stuck" on their standby controller instead of connecting back to their primary controller. In one example, half the APs on one floor are on their primary and half are on their standby. I know we can move the APs manually, but we have a couple of thousand APs, and likely around 50-100 of them would need to be moved. Is this expected behaviour, or are we encountering some kind of bug? We also upgraded to 6.5.4.16 as part of enabling Fast Failover.

 

Name         Group       AP Type  IP Address      Status          Flags  Switch IP     Standby IP
ET-02F-AP01  <ap-group>  325      10.200.220.234  Up 11h:35m:45s  2      10.204.65.27  10.204.1.27
ET-02F-AP02  <ap-group>  325      10.200.220.89   Up 11h:40m:5s   2      10.204.65.27  10.204.1.27
ET-02F-AP03  <ap-group>  325      10.200.220.66   Up 11h:42m:15s  2S     10.204.1.27   10.204.65.27
ET-02F-AP04  <ap-group>  325      10.200.220.210  Up 11h:43m:18s  2S     10.204.1.27   10.204.65.27
ET-02F-AP05  <ap-group>  325      10.200.220.247  Up 11h:42m:13s  2S     10.204.1.27   10.204.65.27
ET-02F-AP06  <ap-group>  325      10.200.221.210  Up 11h:42m:57s  2S     10.204.1.27   10.204.65.27
ET-02F-AP07  <ap-group>  325      10.200.221.50   Up 11h:36m:43s  2      10.204.65.27  10.204.1.27
ET-02F-AP08  <ap-group>  325      10.200.221.215  Up 11h:42m:13s  2S     10.204.1.27   10.204.65.27
ET-02F-AP09  <ap-group>  325      10.200.221.140  Up 11h:40m:22s  2      10.204.65.27  10.204.1.27
ET-02F-AP10  <ap-group>  325      10.200.221.75   Up 11h:36m:34s  2      10.204.65.27  10.204.1.27
ET-02F-AP11  <ap-group>  325      10.200.221.240  Up 11h:35m:3s   2      10.204.65.27  10.204.1.27

 

In the example above, all of these APs should be on the 10.204.65.27 controller as their primary, yet five of them (AP03 through AP06 and AP08) are using the standby, 10.204.1.27, as their primary.
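With a couple of thousand APs, picking out the misplaced ones by eye gets tedious. Below is a minimal sketch of how the listing above could be filtered with a script. It assumes each row is whitespace-separated and that the last two fields are the controller the AP is currently on and its standby, in that order (which is how the output above is being read in this post). The function name and the sample rows are only for illustration and are not part of any Aruba tooling.

```python
# Sketch: find APs whose current controller is not the expected primary.
# Assumption: each row ends with "<current controller IP> <standby IP>".

EXPECTED_PRIMARY = "10.204.65.27"

SAMPLE = """\
ET-02F-AP01 <ap-group> 325 10.200.220.234 Up 11h:35m:45s 2 10.204.65.27 10.204.1.27
ET-02F-AP02 <ap-group> 325 10.200.220.89 Up 11h:40m:5s 2 10.204.65.27 10.204.1.27
ET-02F-AP03 <ap-group> 325 10.200.220.66 Up 11h:42m:15s 2S 10.204.1.27 10.204.65.27
ET-02F-AP04 <ap-group> 325 10.200.220.210 Up 11h:43m:18s 2S 10.204.1.27 10.204.65.27
ET-02F-AP07 <ap-group> 325 10.200.221.50 Up 11h:36m:43s 2 10.204.65.27 10.204.1.27
ET-02F-AP08 <ap-group> 325 10.200.221.215 Up 11h:42m:13s 2S 10.204.1.27 10.204.65.27
"""


def misplaced_aps(listing, expected_primary):
    """Return (name, current, standby) for APs not on the expected primary."""
    misplaced = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and anything that is not an "Up" AP row.
        if len(fields) < 9 or fields[4] != "Up":
            continue
        name = fields[0]
        current, standby = fields[-2], fields[-1]
        if current != expected_primary:
            misplaced.append((name, current, standby))
    return misplaced


for name, current, standby in misplaced_aps(SAMPLE, EXPECTED_PRIMARY):
    print(f"{name}: on {current}, expected {EXPECTED_PRIMARY}")
```

Rows with an AP group name containing spaces would break the field count check, so the parsing would need adjusting for those.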

 

Hoping someone has an idea on how to resolve this.

 

Thanks.

 

MVP Expert

Re: APs not connecting to primary LMS after enabling HA Fast Failover

This is not expected behavior. Do you have other APs in that same location that do not behave this way?

What type of APs?
What’s the return traffic between the standby controller and that location?


Thank you

Victor Fabian

Pardon typos sent from Mobile
Lead Mobility Architect @WEI
AMFX | ACMX | ACDX | ACCX | CWAP | CWDP | CWNA
Contributor I

Re: APs not connecting to primary LMS after enabling HA Fast Failover

Yes, as per the output above, about half are pointing to their correct/primary LMS while the others are pointing to their standby. 

 

These are all AP-325s, and the two controllers (primary/standby) are each in their own data center, with about 15ms of latency between them. The controllers are 7240XMs.

Contributor I

Re: APs not connecting to primary LMS after enabling HA Fast Failover

I've opened a TAC case to see if they could identify anything. So far they've come back to say that having "ha-on-bkup-lms" enabled in the HA profile configuration could be the reason the APs are not swinging back to their primary LMS. Based on the description of the command (quoted below), that seems plausible. I'm going to test this in my prod environment and report back.

 

HA on Backup-LMS
Starting from AOS-W 6.4.4.15, a new parameter, ha-on-bkup-lms, is added in the HA profile to enable or disable the HA on Backup-LMS.  When this parameter is enabled, an AP can set up a standby tunnel after the AP rebootstraps to Backup-LMS. However, in this case, LMS preemption will be ignored. When this parameter is disabled, the AP cannot set up a standby tunnel after the AP rebootstraps to Backup-LMS; the AP will rebootstrap to LMS if LMS is back and LMS preemption is enabled.
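To make the interaction between the two settings easier to follow, here is a small sketch of the decision as I read it from the description above. It only models the wording of that paragraph, not the actual controller or AP firmware, and the function and parameter names are made up for illustration.

```python
# Sketch of the documented behaviour only; not actual AOS logic.
# Names here are hypothetical and chosen for readability.

def ap_returns_to_primary(ha_on_bkup_lms, lms_preemption, primary_lms_up):
    """Would an AP sitting on the Backup-LMS rebootstrap to the primary LMS?"""
    if not primary_lms_up:
        # Nothing to return to while the primary LMS is still down.
        return False
    if ha_on_bkup_lms:
        # Per the description: with HA on Backup-LMS enabled,
        # LMS preemption is ignored, so the AP stays where it is.
        return False
    # With HA on Backup-LMS disabled, the AP goes back only if
    # LMS preemption is enabled.
    return lms_preemption


# Scenario suspected in this thread: ha-on-bkup-lms enabled, primary back up.
print(ap_returns_to_primary(True, True, True))   # False: AP stays on backup
print(ap_returns_to_primary(False, True, True))  # True: AP preempts back
```

If this reading is right, it would match what we are seeing: APs that rebootstrapped to the backup stay there even though the primary is back.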
