Wireless Access


Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

  • 1.  AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 09:25 AM

    Hello,

    I have a Master/Local 7210 controller setup with 45 access points, currently a mix of AP 65 and AP 105 devices. They will be working normally and then, out of the blue, fail over from the Local to the Master, go into a D flag state, and sit that way for 5 minutes or so. Once they are all up on the Master, they fail back to the Local, sit in the D flag state again, and take about 5 minutes to clear up, so the wireless is knocked out for at least 10 minutes. Sometimes it is fine for a few days; other times it goes back and forth. Previously the APs were managed by a single 2400 without issue. Both controllers are currently running 6.2.1.2. I am not certain whether something on the network is impacting the controllers and causing a loss of communication, or whether something in the controllers' code or configuration is recycling a service and triggering the failover. I opened a support case, but I never know when it will happen, and sometimes I only catch it on the tail end, so support has been unable to see the issue. They stated the configuration is set up correctly.

    Thanks,

    Evan Cardanha


    #7210


  • 2.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 09:33 AM
    How is your redundancy configured: LMS or VRRP?

    Make sure your master controller has the same VLANs as your local.
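    If you want to check this quickly, here is a minimal Python sketch that compares the two VLAN lists. It assumes you have already copied the VLAN IDs from each controller (for example from show vlan) into plain lists; the function name and the sample IDs are hypothetical.

```python
def vlan_differences(master_vlans, local_vlans):
    """Compare VLAN IDs configured on the master and the local controller.

    Returns (missing_on_master, missing_on_local) as sorted lists.
    """
    master = set(master_vlans)
    local = set(local_vlans)
    # VLANs the local serves that the master lacks: APs that fail over
    # to the master may have nowhere to put users on these VLANs.
    missing_on_master = sorted(local - master)
    # VLANs that exist only on the master.
    missing_on_local = sorted(master - local)
    return missing_on_master, missing_on_local


if __name__ == "__main__":
    # Hypothetical VLAN IDs, for illustration only.
    master_vlans = [1, 10, 20]
    local_vlans = [1, 10, 20, 30]
    on_master, on_local = vlan_differences(master_vlans, local_vlans)
    print("Missing on master:", on_master)  # Missing on master: [30]
    print("Missing on local:", on_local)    # Missing on local: []
```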

    You may be having some network issues on your local controller:
    - check your connection back to the uplink (maybe layer 1 issues: cable, GBIC, etc.)
    - are you using port channels or trunks?
    - run show port stats and look for errors
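    Port error counters are typically cumulative since the last clear, so a single look can mislead; it is more telling to take two snapshots a few minutes apart and see which counters are still climbing. A minimal Python sketch, assuming you have typed the counters into dictionaries by hand; the counter names below are placeholders, not the exact ArubaOS field names.

```python
def increasing_counters(before, after):
    """Return {counter: delta} for counters that grew between two snapshots.

    A counter missing from the first snapshot is treated as starting at 0.
    A counter that went down (e.g. the stats were cleared in between)
    is skipped rather than reported as a negative delta.
    """
    deltas = {}
    for name, new_value in after.items():
        old_value = before.get(name, 0)
        if new_value > old_value:
            deltas[name] = new_value - old_value
    return deltas


if __name__ == "__main__":
    # Placeholder counter names and values, for illustration only.
    first = {"crc_errors": 12, "runts": 0, "giants": 3}
    second = {"crc_errors": 57, "runts": 0, "giants": 3}
    print(increasing_counters(first, second))  # {'crc_errors': 45}
```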


    #7210


  • 3.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 10:18 AM

    Currently it is set up with LMS. We had a single controller before, so I reused the port it had been using without issue and mirrored its configuration to the port the local controller is connected to. It is a trunk port. It certainly looks more like a controller connectivity issue: when it happens, all APs drop off the Local, go to the Master, sit in a D flag state, and eventually clear up. Not long after that, they all fail back to the Local, go into a D flag state again, slowly clear up, and then run normally for days. We are currently using just the 1 Gig copper connections, so none of the GBIC slots are populated. The Master and Local are patched into the same Cisco 6509 switch, but on different blades.


    #7210


  • 4.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 09:35 AM
    Does the D flag disappear after a while on the master controller? In other words, are the APs able to come up normally on the master?
    #7210


  • 5.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 09:51 AM

     

    Also run the following commands; they may give some useful information:

    show log system all

    show log error-log all

    show log network all

    show ap debug system-status ap-name <apname>
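    The output of these commands can be long. If you save the session to a text file, a small Python sketch like the one below can pull out the lines that matter for this kind of problem. The marker strings are taken from log messages posted in this thread (the stm crash, the AP rebootstrap, the VRRP transition and the stm length mismatch); the category labels and the function name are hypothetical, and other releases may word these messages differently.

```python
# Marker substrings copied from log messages seen in this thread.
MARKERS = {
    "process crash": "died: got signal",
    "AP rebootstrap": "rebootstrapping",
    "VRRP transition": "VRRP state transitioned",
    "stm length mismatch": "Length mismatch",
}


def triage(lines):
    """Group log lines by which marker substring they contain."""
    hits = {label: [] for label in MARKERS}
    for line in lines:
        for label, marker in MARKERS.items():
            if marker in line:
                hits[label].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # A few lines copied from the logs in this thread.
    sample = [
        "Aug 6 08:55:26 :303073:  <ERRS> |nanny|  Process /mswitch/bin/stm [pid 13935] died: got signal SIGSEGV",
        "Aug 6 08:56:13 :311004:  <WARN> |AP RIETH_AP105.2@10.230.40.4 sapd|  Missed 25 heartbeats; rebootstrapping",
        'Jul 2 09:22:37 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from BACKUP to MASTER',
    ]
    for label, found in triage(sample).items():
        print(f"{label}: {len(found)} line(s)")
```

    To run it on a saved capture, pass an open file object to triage() instead of the sample list.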

     


    #7210


  • 6.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 09:55 AM

    I am curious to see what the resolution to this problem is. I have the same problem with my 3400 controller, which backs up two other 3400 controllers in N+1 VRRP. My controllers run AOS 6.1.3.2, and I have already scheduled an upgrade to 7220s running AOS 6.2.

    The log shows that once or twice a day the backup transitions itself to MASTER (active), but the active controllers then announce their presence with a higher priority, so the backup backs out. The APs move back and forth and drop clients. All controllers are on the same VLAN.
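    For context on why the backup promotes itself: in generic VRRP (RFC 3768), a backup declares the master dead after Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, where Skew_Time = (256 - Priority) / 256 seconds. That lines up with the "Missed 3 Hello Advertisements" messages in the log, so losing roughly three consecutive advertisements on the VLAN is enough to trigger a takeover. A minimal Python sketch of the arithmetic; the priority of 100 and the 1-second advertisement interval in the example are assumptions, not values taken from these controllers, and ArubaOS may differ in detail from the RFC.

```python
def skew_time(priority):
    """RFC 3768 Skew_Time in seconds: (256 - Priority) / 256."""
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be between 1 and 255")
    return (256 - priority) / 256


def master_down_interval(priority, advertisement_interval=1.0):
    """RFC 3768 Master_Down_Interval: 3 * Advertisement_Interval + Skew_Time."""
    return 3 * advertisement_interval + skew_time(priority)


if __name__ == "__main__":
    # Assumed values: priority 100 and a 1-second advertisement interval.
    print(master_down_interval(100))  # 3.609375
```

    A higher-priority backup has a smaller skew, so it times out slightly sooner than a lower-priority one.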

     

    (BACKUP) #show log system 10
    
    Jul 2 09:16:51 :313328:  <WARN> |fpapps|  vrrp: vrid "35" - VRRP state transitioned from MASTER to BACKUP
    Jul 2 09:16:51 :313332:  <WARN> |fpapps|  VRRP: vrid "35"(Master) -  Received VRRP Advertisement with HIGHER PRIORITY (150) from x.x.x.x
    Jul 2 09:22:37 :313331:  <WARN> |fpapps|  VRRP: vrid "25" - Missed 3 Hello Advertisements from VRRP Master 172.17.254.22
    Jul 2 09:22:37 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from BACKUP to MASTER
    Jul 2 09:22:37 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from MASTER to BACKUP
    Jul 2 09:22:37 :313332:  <WARN> |fpapps|  VRRP: vrid "25"(Master) -  Received VRRP Advertisement with HIGHER PRIORITY (150) from x.x.x.x
    Jul 2 09:26:28 :313331:  <WARN> |fpapps|  VRRP: vrid "25" - Missed 3 Hello Advertisements from VRRP Master 172.17.254.22
    Jul 2 09:26:28 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from BACKUP to MASTER
    Jul 2 09:26:28 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from MASTER to BACKUP
    Jul 2 09:26:28 :313332:  <WARN> |fpapps|  VRRP: vrid "25"(Master) -  Received VRRP Advertisement with HIGHER PRIORITY (150) from x.x.x.x
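    To see how often each VRID actually flaps over a longer period, a sketch like this counts BACKUP-to-MASTER transitions per VRID from saved show log system output. The regular expression is written against the message format shown above; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches e.g.: Jul 2 09:22:37 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from BACKUP to MASTER
TRANSITION = re.compile(
    r'^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*?'
    r'vrid "(?P<vrid>\d+)" - VRRP state transitioned from (?P<old>\w+) to (?P<new>\w+)'
)


def vrrp_transitions(lines):
    """Return (timestamp, vrid, old_state, new_state) tuples in log order."""
    events = []
    for line in lines:
        match = TRANSITION.search(line.strip())
        if match:
            events.append((match["stamp"], match["vrid"], match["old"], match["new"]))
    return events


def takeovers_per_vrid(events):
    """Count how many times this controller went from BACKUP to MASTER, per VRID."""
    return Counter(vrid for _, vrid, old, new in events if (old, new) == ("BACKUP", "MASTER"))


if __name__ == "__main__":
    sample = [
        'Jul 2 09:16:51 :313328:  <WARN> |fpapps|  vrrp: vrid "35" - VRRP state transitioned from MASTER to BACKUP',
        'Jul 2 09:22:37 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from BACKUP to MASTER',
        'Jul 2 09:22:37 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from MASTER to BACKUP',
        'Jul 2 09:26:28 :313328:  <WARN> |fpapps|  vrrp: vrid "25" - VRRP state transitioned from BACKUP to MASTER',
    ]
    print(takeovers_per_vrid(vrrp_transitions(sample)))  # Counter({'25': 2})
```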

     


    #7220
    #7210


  • 7.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Aug 06, 2013 10:58 AM

    I just had a moment to run show log system all and saw the following in the logs:

    Aug 6 08:55:19 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at arm_update, 323, Invalid length AP 00:24:6c:1a:6f:e0 got 23423 expect 1388
    Aug 6 08:55:26 :303073:  <ERRS> |nanny|  Process /mswitch/bin/stm [pid 13935] died: got signal SIGSEGV
    Aug 6 08:55:33 :303029:  <ERRS> |nanny|  Process /mswitch/bin/stm [pid 13935]: crash data saved in dir /flash/crash/process/8-6-2013@08-55-26/stm
    Aug 6 08:55:38 :303079:  <ERRS> |nanny|  Restarted process /mswitch/bin/stm, new pid 27675
    Aug 6 08:55:38 :303025:  <ERRS> |nanny|  Found core file /tmp/core.13935.stm.A72xx_38532, 65339392 bytes, compressing...
    Aug  6 08:55:42  KERNEL:   0:<7>UDP: short packet: From 255.255.255.255:8211 1621/1517 to 129.2.139.140:8419
    Aug 6 08:55:49 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac d8:c7:c8:c6:96:d3, and phy_type is 1
    Aug 6 08:55:53 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:96:32, and phy_type is 1
    Aug 6 08:55:54 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac d8:c7:c8:c6:96:bb, and phy_type is 1
    Aug 6 08:55:55 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:a7:08, and phy_type is 1
    Aug 6 08:55:59 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:a6:e4, and phy_type is 1
    Aug 6 08:55:59 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:1a:1e:c7:c0:4e, and phy_type is 1
    Aug 6 08:56:04 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:96:34, and phy_type is 1
    Aug 6 08:56:04 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:a6:f2, and phy_type is 1
    Aug 6 08:56:07 :303080:  <ERRS> |nanny|  Please tar and email the file crash.tar to support@arubanetworks.com
    Aug 6 08:56:07 :303081:  <ERRS> |nanny| To tar type the following commands at the Command Line Interface: (1) tar crash (2) copy flash: crash.tar tftp: [serverip] [destn filename]
    Aug 6 08:56:08 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:1a:1e:c7:c2:28, and phy_type is 1
    Aug 6 08:56:08 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:96:20, and phy_type is 1
    Aug 6 08:56:10 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:a7:56, and phy_type is 1
    Aug 6 08:56:13 :311004:  <WARN> |AP RIEOC_AP105.10@10.200.200.10 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:13 :311004:  <WARN> |AP RIHPHC_AP105.1@10.203.89.2 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:13 :311004:  <WARN> |AP RIETH_AP105.2@10.230.40.4 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:13 :311004:  <WARN> |AP RIDOTMT_AP105.1@10.203.36.11 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:13 :311004:  <WARN> |AP RIETH_AP105.1@10.230.40.3 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RIEOC_AP105.8@10.200.200.12 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RIEOC_AP105.5@10.200.200.9 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RIEOC_AP105.3@10.200.200.8 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RISH_AP65.7@10.230.4.3 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RISH_AP65.3@10.230.4.7 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RISH_AP65.1@10.230.4.6 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RIDOA_AP105.8@158.123.114.202 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RISH_AP65.2@10.230.4.5 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RIEOC_AP105.12@10.200.200.6 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RIEOC_AP105.7@10.200.200.2 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RIDOA_AP65.1@158.123.114.207 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:14 :311004:  <WARN> |AP RISH_AP65.8@10.230.4.4 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:15 :311004:  <WARN> |AP RIDOA_AP65.5@158.123.114.151 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:15 :311004:  <WARN> |AP RISH_AP65.9@10.230.4.10 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:15 :311004:  <WARN> |AP RISH_AP65.6@10.230.4.2 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:15 :311004:  <WARN> |AP RIDOA_AP65.6@158.123.114.150 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:15 :311004:  <WARN> |AP RIPUC_AP65.1@10.203.1.3 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:15 :311004:  <WARN> |AP RIDOA_AP65.12@158.123.114.160 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 08:56:16 :311004:  <WARN> |AP RIDOA_AP65.11@158.123.114.148 sapd|  Missed 25 heartbeats; rebootstrapping
    Aug 6 09:06:28 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:a6:f6, and phy_type is 1
    Aug 6 09:06:46 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 00:24:6c:c9:a6:fe, and phy_type is 1
    Aug 6 09:06:50 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac d8:c7:c8:c6:96:d3, and phy_type is 1
    Aug  6 09:06:50  KERNEL:   0:<7>UDP: short packet: From 255.255.255.255:8211 1621/1517 to 129.2.139.140:8419
    Aug 6 09:06:50 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac 6c:f3:7f:c5:bd:ec, and phy_type is 1
    Aug  6 09:07:10  KERNEL:   0:<7>UDP: short packet: From 255.255.255.255:8211 1621/1517 to 193.0.12.160:8419
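    In this capture the stm process crashed with SIGSEGV at 08:55:26 and was restarted at 08:55:38, and the "Missed 25 heartbeats; rebootstrapping" messages from the APs start at 08:56:13. That suggests the failovers line up with the stm crash, although it does not by itself rule out a network problem. To check whether the same pattern repeats across several captures, here is a minimal Python sketch that extracts the stm crash times, the APs that rebootstrapped, and the AP MACs reporting the handle_ap_statistics length mismatch. The regular expressions are written against the messages above; the function and key names are hypothetical.

```python
import re
from collections import Counter

# Patterns written against the log lines posted above.
STM_CRASH = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*Process /mswitch/bin/stm \[pid \d+\] died")
HEARTBEAT = re.compile(r"\|AP (?P<name>[^@|]+)@(?P<ip>[\d.]+) sapd\|\s+Missed (?P<count>\d+) heartbeats; rebootstrapping")
MISMATCH = re.compile(r"Length mismatch expected (\d+) received (\d+) from\s+AP with eth_mac ([0-9a-f:]{17})")


def summarize(lines):
    """Collect stm crash times, rebootstrapped AP names and mismatch counts per AP MAC."""
    crashes, rebootstrapped, mismatches = [], [], Counter()
    for line in lines:
        line = line.strip()
        crash = STM_CRASH.search(line)
        if crash:
            crashes.append(crash.group(1))
        beat = HEARTBEAT.search(line)
        if beat:
            rebootstrapped.append(beat["name"])
        mismatch = MISMATCH.search(line)
        if mismatch:
            mismatches[mismatch.group(3)] += 1
    return {"stm_crashes": crashes, "rebootstrapped_aps": rebootstrapped, "mismatch_by_mac": mismatches}


if __name__ == "__main__":
    sample = [
        "Aug 6 08:55:26 :303073:  <ERRS> |nanny|  Process /mswitch/bin/stm [pid 13935] died: got signal SIGSEGV",
        "Aug 6 08:55:49 :304001:  <ERRS> |stm|  Unexpected stm (Station management) runtime error at handle_ap_statistics, 1019, Length mismatch expected 1527 received 1387 from             AP with eth_mac d8:c7:c8:c6:96:d3, and phy_type is 1",
        "Aug 6 08:56:13 :311004:  <WARN> |AP RIEOC_AP105.10@10.200.200.10 sapd|  Missed 25 heartbeats; rebootstrapping",
    ]
    print(summarize(sample))
```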


    #7210


  • 8.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 10:24 AM
    Do you experience the same issues if you use an LMS primary/backup setup?

    Have you tried disabling preemption ?

    Are you sharing the VRRP segment/VLAN with anything else in your network?
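    To illustrate why preemption matters with an LMS primary/backup pair, here is a toy Python model of one AP with a primary LMS (the local) and a backup LMS (the master). It is a simplified sketch, not ArubaOS's actual LMS logic: each list entry says whether the primary was reachable during one time step, and hold_down is a hypothetical stand-in for a hold-down period, counted in steps. I believe the ArubaOS AP system profile exposes LMS preemption and an LMS hold-down period, but check the documentation for your release. In this thread each move seems to cost a trip through the D flag state, so fewer moves means less downtime.

```python
def simulate(primary_up, preemption, hold_down=0):
    """Return which controller ("primary" or "backup") the AP uses at each step.

    primary_up: list of booleans, True when the primary LMS is reachable.
    preemption: if True, the AP moves back to the primary once it recovers.
    hold_down:  steps the primary must stay up before the AP moves back.
    """
    attached = "primary"
    up_streak = 0  # consecutive steps the primary has been reachable
    history = []
    for up in primary_up:
        up_streak = up_streak + 1 if up else 0
        if attached == "primary" and not up:
            attached = "backup"  # primary lost: fail over
        elif attached == "backup" and preemption and up_streak > hold_down:
            attached = "primary"  # primary stable long enough: fail back
        history.append(attached)
    return history


def moves(history):
    """Count controller changes between consecutive steps."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


if __name__ == "__main__":
    # The primary drops twice in quick succession, then stays up.
    pattern = [True, False, True, False, True, True, True, True, True]
    print(moves(simulate(pattern, preemption=True)))               # 4
    print(moves(simulate(pattern, preemption=True, hold_down=3)))  # 2
    print(moves(simulate(pattern, preemption=False)))              # 1
```

    In this model, preemption without a hold-down doubles every outage into a failover plus a failback, which resembles the back-and-forth described in the first post; a hold-down absorbs short recoveries.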


    #7210


  • 9.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 10:29 AM

    Are you using Aruba-supported GBICs?

    Have you looked at one of the APs through the console while this is occurring?
    #7210


  • 10.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Jul 12, 2013 11:20 AM

    Currently the AP configuration has an LMS IP, which is the local controller, and a backup LMS IP, which is the master. I haven't tried disabling preemption. Our controllers are on the production VLAN. I have looked at one AP via the console, though not recently. All I saw was that it loses communication and then reconnects, bouncing back and forth between the two controllers, and its uptime never resets; if I manually pull an AP offline, that counter clears right away. None of the GBIC slots are populated; we are using just one of the 1 Gig interfaces on the controller, 0/0/0.


    #7210


  • 11.  RE: AP 105s and 65s keep failing back and forth when both 7210 Master/Local are up

    Posted Aug 06, 2013 11:00 AM

    Via the console, when they go into this state I just see the device recycle a few times and finally come back online. The device never shuts down, and the controller keeps counting its uptime and never shows it as officially down until I pull the network connection/PoE, at which point the counter resets.


    #7210