Wireless Access


APs failed over to backup LMS

The majority of our campus APs failed over to our backup LMS IP (the master controller), and I'm having a difficult time understanding how this happened. It caused a large wireless outage, which is why I'm trying to find the root cause. I see the following message logged for the APs that failed over:

 

Rebootstrap Information
-----------------------
Date       Time     Reason (Latest 10)
--------------------------------------
2013-03-08 10:22:47 Switching to LMS 10.X.X.9. Send failed in function sapd_check_hbt.  Last Ctrl message: BW_REPORT len=150 dest=10.X.X.10 tries=1 seq=14549
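
When comparing rebootstrap entries across many APs, it can help to pull the fields out of lines like the one above. The following is a minimal Python sketch, assuming every entry follows exactly the format of this single example line (other firmware releases may word it differently). The regex, `RebootstrapEvent` and `parse_rebootstrap` are hypothetical helpers, not part of any Aruba tool, and reading `dest=` as the address the failed message was sent to is an assumption.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical pattern, modeled on the one example entry in this thread.
REBOOTSTRAP_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Switching to LMS (?P<lms>\S+)\.\s+"
    r"Send failed in function (?P<func>\w+)\.\s+"
    r"Last Ctrl message: (?P<msg>\S+) len=\d+ "
    r"dest=(?P<dest>\S+) tries=(?P<tries>\d+) seq=(?P<seq>\d+)"
)


class RebootstrapEvent(NamedTuple):
    timestamp: datetime
    new_lms: str          # controller the AP switched to
    failed_function: str  # e.g. sapd_check_hbt
    last_message: str     # last control message type, e.g. BW_REPORT
    dest: str             # assumed: address the failed message was sent to
    tries: int
    seq: int


def parse_rebootstrap(line: str) -> Optional[RebootstrapEvent]:
    """Return the parsed entry, or None if the line does not match the pattern."""
    m = REBOOTSTRAP_RE.match(line.strip())
    if m is None:
        return None
    return RebootstrapEvent(
        timestamp=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
        new_lms=m["lms"],
        failed_function=m["func"],
        last_message=m["msg"],
        dest=m["dest"],
        tries=int(m["tries"]),
        seq=int(m["seq"]),
    )
```

Lines that don't match return None rather than raising, so a whole pasted log can be filtered without pre-cleaning.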

 

TAC said this message indicates that the AP's heartbeats to the controller were missed, resulting in a failover. I can't find any indication that we had network problems, whether in our core infrastructure, on the controller, or from flapping links. All systems have stayed up, with no topology changes and no interface errors. Having confirmed all this, I don't see how the heartbeats could have been missed.
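
One way to reason about TAC's explanation: if failover is triggered by a run of consecutive missed heartbeats rather than by any single loss, then scattered drops may never trip it while a short burst can, even on a link that looks healthy in interface counters. The sketch below models that assumption only. `failover_index` and `threshold` are hypothetical names; the real threshold, heartbeat interval and exact counting rules are controller settings that should be checked in the documentation for your ArubaOS release.

```python
from typing import Iterable, Optional


def failover_index(heartbeat_acked: Iterable[bool], threshold: int) -> Optional[int]:
    """Index of the heartbeat at which `threshold` consecutive heartbeats have
    gone unanswered (the assumed failover point), or None if it never happens."""
    if threshold < 1:
        raise ValueError("threshold must be >= 1")
    consecutive_misses = 0
    for i, acked in enumerate(heartbeat_acked):
        if acked:
            consecutive_misses = 0  # any answered heartbeat resets the run
        else:
            consecutive_misses += 1
            if consecutive_misses >= threshold:
                return i
    return None
```

For example, the pattern answered, missed, missed, answered, missed, missed, missed fails over at index 6 with a threshold of 3 but never fails over with a threshold of 4.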

 

Anyone have thoughts on this?

=======================================
If a reply adequately addresses your issue, please click on the "Accept as Solution" and "Give Kudos" button so this information can benefit other users.

Re: APs failed over to backup LMS

What do you see in the controller logs? Generally there are missed-heartbeat messages in there too. Do you have a heartbeat DSCP configured on the AP? With one configured, AP heartbeats can be prioritized if the link becomes saturated.

ACMA, ACMP
If my post addresses your query, give kudos:)

Re: APs failed over to backup LMS

Is there a specific log file or log command that you would use to look for heartbeat issues?

No, I haven't configured any heartbeat DSCP. I know it's configurable.
Guru Elite

Re: APs failed over to backup LMS


thecompnerd wrote:
Is there a specific log file or log command that you would use to look for heartbeat issues?

No, I haven't configured any heartbeat DSCP. I know it's configurable.

Very few people use Heartbeat DSCP. Unless your wired utilization stays above a certain percentage for a sustained period, it does not come into play.

 

If you run show ap debug counters, it will tell you which APs have more bootstraps than others. Those are the ones whose connectivity you should check.
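
With a few hundred APs, it can be easier to rank them than to eyeball the table. This is a minimal sketch, assuming you have already copied each AP's bootstrap count out of `show ap debug counters` into a dict keyed by AP name (the parsing step is left out because the exact column layout can vary). `rank_by_bootstraps`, `bootstrap_outliers` and the "more than twice the median" rule are arbitrary, hypothetical choices.

```python
import statistics
from typing import Dict, List, Tuple


def rank_by_bootstraps(counts: Dict[str, int], top: int = 10) -> List[Tuple[str, int]]:
    """APs sorted by bootstrap count, highest first (ties broken by name)."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top]


def bootstrap_outliers(counts: Dict[str, int], factor: float = 2.0) -> List[str]:
    """APs whose bootstrap count exceeds `factor` times the median across all APs."""
    if not counts:
        return []
    median = statistics.median(counts.values())
    return sorted(ap for ap, n in counts.items() if n > factor * median)
```

Comparing against the median rather than the mean keeps a handful of badly behaving APs from hiding each other.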


Colin Joseph
Aruba Customer Engineering

Looking for an Answer? Search the Community Knowledge Base Here: Community Knowledge Base

Occasional Contributor I

Re: APs failed over to backup LMS

We've had the same thing happen recently. There was no change in the network topology and no apparent network issue, yet hundreds of APs bootstrapped, regardless of how saturated their links were. We opened a case, and the end result was to change the heartbeat DSCP value. We haven't had any issues in the two weeks since.

 

One thing I'm curious about is why access points that were not missing their heartbeats also bootstrapped along with all the others. Did you see similar behavior?

Occasional Contributor I

Re: APs failed over to backup LMS

After some additional checking of the AP debug logs: the access points that didn't appear to be missing heartbeats did in fact bootstrap because of missed heartbeats. They simply hadn't missed any since their last reboot, hence my misplaced confusion.

 

Changing the DSCP value seems to be the way to go.

Frequent Contributor II

Re: APs failed over to backup LMS

What did you change the DSCP value to?

Occasional Contributor I

Re: APs failed over to backup LMS

We changed the value to 46, per the support engineer's guidance.
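
For reference, DSCP 46 is Expedited Forwarding (EF, RFC 3246), the class commonly used for voice. On the wire, DSCP occupies the upper six bits of the IPv4 TOS byte (IPv6 Traffic Class), so 46 shows up as 0xB8 (184) in a packet capture. The sketch below illustrates only that arithmetic and how a generic application would ask its OS to mark packets; it is not how the AP marks heartbeats internally, and `dscp_to_tos` and `mark_udp_socket` are hypothetical helper names. The marking only helps if the switches and routers along the path trust and honor it.

```python
import socket

EF = 46  # Expedited Forwarding, the value used in this thread


def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """DSCP fills the upper 6 bits of the TOS byte; ECN fills the lower 2."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in 0..3")
    return (dscp << 2) | ecn


def mark_udp_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the OS to stamp outgoing IPv4 packets from `sock` with `dscp`.
    Support for IP_TOS varies by platform and may be ignored on some."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
```

When verifying the change, heartbeat packets carrying EF should show a TOS byte of 0xb8 in the capture.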

Re: APs failed over to backup LMS

Does this happen at different times of the day, or around the same time?
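
A quick way to answer the timing question is to bucket the rebootstrap timestamps from all affected APs into fixed windows: if most land in the same minute, that points at a shared event (controller, core, or a common uplink) rather than individual AP cabling. This sketch assumes the timestamps use the `YYYY-MM-DD HH:MM:SS` format shown in the rebootstrap log and that AP clocks are roughly in sync. `failovers_per_window`, `window_start` and the 60-second default are hypothetical choices.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

_EPOCH = datetime(1970, 1, 1)


def window_start(t: datetime, window_seconds: int) -> datetime:
    """Round a naive timestamp down to the start of its fixed-size window."""
    offset = int((t - _EPOCH).total_seconds()) % window_seconds
    return t - timedelta(seconds=offset)


def failovers_per_window(timestamps: Iterable[str],
                         window_seconds: int = 60) -> List[Tuple[datetime, int]]:
    """Count 'YYYY-MM-DD HH:MM:SS' timestamps per window, in time order."""
    buckets: Counter = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        buckets[window_start(t, window_seconds)] += 1
    return sorted(buckets.items())
```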

 

Also check whether other APs are experiencing this issue. You can see whether APs are bootstrapping or rebooting by running show ap debug counters:

 

(controller) #show ap debug counters

AP Counters
-----------
Name  Group  IP Address  Configs Sent  Configs Acked  AP Boots Sent  AP Boots Acked  Bootstraps (Total)  Reboots  Crash

If it is only certain APs, check for Layer 1 issues: the cable to the AP, the switch port, or the trunk to that switch.
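
To narrow down where a Layer 1 problem might be, it can help to group the APs that show bootstraps by their upstream switch: several on one switch suggests that switch or its trunk, while scattered single APs suggest individual cables or ports. This sketch assumes you maintain your own AP-to-switch mapping (for example from cabling records or CDP/LLDP neighbor data); `bootstraps_by_switch` and the switch names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def bootstraps_by_switch(bootstraps: Dict[str, int],
                         ap_to_switch: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each upstream switch to the sorted APs on it that bootstrapped at
    least once. APs missing from `ap_to_switch` are grouped under 'unknown'."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ap, count in bootstraps.items():
        if count > 0:
            grouped[ap_to_switch.get(ap, "unknown")].append(ap)
    return {switch: sorted(aps) for switch, aps in sorted(grouped.items())}
```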

Thank you

Victor Fabian
Lead Mobility Engineer @ Integration Partners
AMFX | ACMX | ACDX | ACCX | CWAP | CWDP | CWNA

Re: APs failed over to backup LMS

Very good info!  Thanks for the reply.
