Wireless Access

Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

340 series AP's with 8.3.0.3 taking forever to come up on controller

  • 1.  340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 12, 2018 07:22 PM

    We have a fairly new install of approximately 190 APs, all 340 series. We are having an issue that started when we upgraded from 8.3.0.2 to 8.3.0.3, where APs that are power cycled or taken offline for whatever reason sometimes reboot many, many times, generating hundreds of Airwave alerts.

     

    It appears that they are trying to TFTP an image from the controllers, which fails part way through, and they proceed to do this over and over. For example, today we rebooted a switch that had 9 APs attached to it.  A few of them came up immediately, while several of them rebooted well over 100 times.  It took over 3 hours for all the APs to come back up.  We upgraded to 8.3.0.4 over the weekend and it has the same issue.

     

    Has anyone seen this issue or have any ideas what would cause it?  We've had a support case open for a couple of months now, which I've already asked to be escalated, but it never seems to go anywhere.  They simply ask for the same logs over and over.



  • 2.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 12, 2018 07:38 PM
    Do you have FTP blocked between your access points and the controller? Access points should only try TFTP if they are new or if FTP is blocked.


  • 3.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 12, 2018 07:45 PM

    Nothing is blocked.  There's no firewall between the AP's and controller.  At one point we rolled back to 8.3.0.2 and the issue went away.  When we upgraded again to 8.3.0.3, it came back.

     

    What's weird is it doesn't happen with every AP that reboots, and once it finally comes up, performance is great - seeing 200+mbps wireless throughput.

     

    Also, the issue does not occur when the controllers are upgraded or downgraded.  All the APs come up quickly in this scenario.  We first experienced it with new APs but later discovered it was also happening when existing ones reboot.

     

    Here's an example we captured from the console of an AP with this issue.  This repeats over and over.  Sometimes for hours, but eventually the AP will come up:

     

    DNS request 1 for aruba-master.jfbc.org to 10.100.107.31
    Using eth0 device
    TFTP from server 10.100.5.5; our IP address is 10.100.105.68; sending through gateway 10.100.105.1
    Filename 'arm64.ari'.
    Load address: 0x8000000
    Loading: ##############################T T T T T T T T T T
    Retry count exceeded; starting again
    eth1 link is down
    eth0: link up, speed 1000 Mb/s, full duplex
    DHCP broadcast 1
    DHCP IP address: 10.100.105.68
    DHCP subnet mask: 255.255.255.0
    DHCP def gateway: 10.100.105.1
    DHCP DNS server: 10.100.107.31
    DHCP DNS domain: jfbc.org
    ADP broadcast 3
    ADP multicast 4
    ADP broadcast 4
    ADP multicast 5
    ADP broadcast 5

    Retry count exceeded



  • 4.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 12, 2018 07:56 PM
    Are you using clustering? APs in a cluster should use the nodelist and not aruba-master to find the controller.


  • 5.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 12, 2018 07:59 PM

    Yes, we have a cluster of 3 controllers.  10.100.5.5 is the cluster virtual IP.  The log I just posted was from a test AP that was set to DHCP.  The actual ones in production are set to static IP addresses, with the static controller address set to the cluster virtual IP.  Is that not the correct way to do it?



  • 6.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 12, 2018 08:09 PM

    I think what is happening is it fails when the AP reboots and connects to a different cluster member than it was on before the reboot.  The original log I posted was from an AP that had been factory reset.  Here's one from an AP that was already in service that I just rebooted.  In this one it's saying the CPSEC certificate was rejected, and it looks like as a result it could not bring up its GRE tunnel:

     

    Starting DHCP
    net.ipv4.conf.all.arp_notify = 1
    Got al[   32.538035] device eth0 entered promiscuous mode
    l network params from APboot env. Skipping DHCP
    net.ipv4.conf.all.arp_notify = 0
    net.ipv4.conf.br0.arp_notify = 1
    Static IP detected. Checking if LACP needs to be configured
    LACP setting not required. AP can ping default gateway
    10.100.4.91 255.255.255.0 10.100.4.1
    Running ADP...Done. Master is 10.100.5.5
    [   35.001158] wl 0000:01:00.0: enabling device (0000 -> 0002)
    [   35.102783] wifi0: AP type AP-345, radio 0, max_bssids 16
    [   35.299617] wl 0001:01:00.0: enabling device (0000 -> 0002)
    [   35.403573] wifi1: AP type AP-345, radio 1, max_bssids 16
    AP rebooted Mon Nov 12 19:57:18 EST 2018; Unable to set up IPSec tunnel to saved lms, Error:RC_ERROR_CPSEC_CERT_REJECTED
    shutting down watchdog process (nanny will restart it)...
    
            <<<<<       Welcome to the Access Point     >>>>>
    
    password: [   37.776931] apType 107 hw_opmode 5
    [   37.810431] radio 0: band 1 ant 0 max_ssid 16
    [   37.862600] radio 1: band 0 ant 0 max_ssid 16
    [   85.102594] random: nonblocking pool is initialized
    [  249.743627] asap_gre_ipv4_err: Received ICMP (DEST_UNREACH, PROT_UNREACH) from 10.100.5.8 (sby/aac=0/1) for heartbeat tunnel.
    [  249.872034] asap_gre_ipv4_err: Received ICMP (DEST_UNREACH, PROT_UNREACH) from 10.100.5.8 (sby/aac=0/1) for heartbeat tunnel.
    [  250.007641] asap_gre_ipv4_err: Received ICMP (DEST_UNREACH, PROT_UNREACH) from 10.100.5.8 (sby/aac=0/1) for heartbeat tunnel.
    [  309.743112] asap_gre_ipv4_err: Received ICMP (DEST_UNREACH, PROT_UNREACH) from 10.100.5.8 (sby/aac=0/1) for heartbeat tunnel.
    [  309.871476] asap_gre_ipv4_err: Received ICMP (DEST_UNREACH, PROT_UNREACH) from 10.100.5.8 (sby/aac=0/1) for heartbeat tunnel.
    The system is going down NOW !!
    Sending SIGTERM to all processes.
    Sending SIGKILL[  373.922639] UBIFS (ubi1:4): un-mount UBI device 1
    Unmounting UBIFS completed.4): background thread "ubifs_bgt1_4" st
    Please stand by while rebooting the system.
    [  376.143060] reboot: Restarting system
    HELO
    5.0202-1.0.38-161.122
    CPU0
    L1CD
    MMUI
    MMU8
    CODE
    ZBBS
    MAIN
    NVRAM memcfg 0x41427
    MCB chksum 0xa259e84b, config 0x41427
    DDR3-1600 CL11 total 512MB 2 16bits part[s] %1 SSC
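
    When an AP loops like this for hours, it can help to tally what the console capture actually says on each pass. The following is a minimal sketch, assuming the capture is saved as plain text and that the two line formats match the log above; the function and variable names are illustrative and not part of any Aruba tooling.

```python
import re
from collections import Counter

# Matches the reboot line printed on the AP console, e.g.
# "AP rebooted Mon Nov 12 19:57:18 EST 2018; Unable to set up IPSec tunnel ..."
REBOOT_RE = re.compile(r"AP rebooted (?P<when>[^;]+); (?P<reason>.+)")
# Matches the kernel message for a rejected heartbeat tunnel and captures the sender.
UNREACH_RE = re.compile(r"PROT_UNREACH\) from (?P<ip>\d+\.\d+\.\d+\.\d+)")


def summarize_console_log(text):
    """Tally reboot reasons and PROT_UNREACH senders in a console capture."""
    reasons = Counter()
    unreach_sources = Counter()
    for line in text.splitlines():
        match = REBOOT_RE.search(line)
        if match:
            reasons[match.group("reason").strip()] += 1
        match = UNREACH_RE.search(line)
        if match:
            unreach_sources[match.group("ip")] += 1
    return reasons, unreach_sources


# Sample lines copied from the capture above.
SAMPLE = """\
AP rebooted Mon Nov 12 19:57:18 EST 2018; Unable to set up IPSec tunnel to saved lms, Error:RC_ERROR_CPSEC_CERT_REJECTED
[  249.743627] asap_gre_ipv4_err: Received ICMP (DEST_UNREACH, PROT_UNREACH) from 10.100.5.8 (sby/aac=0/1) for heartbeat tunnel.
[  249.872034] asap_gre_ipv4_err: Received ICMP (DEST_UNREACH, PROT_UNREACH) from 10.100.5.8 (sby/aac=0/1) for heartbeat tunnel.
"""

if __name__ == "__main__":
    reasons, sources = summarize_console_log(SAMPLE)
    for reason, count in reasons.most_common():
        print(count, reason)
    for ip, count in sources.most_common():
        print(count, ip)
```

    Run over a long capture, a summary like this shows whether every reboot carries the same reason and whether the PROT_UNREACH replies always come from the same cluster member.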


  • 7.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 12, 2018 08:56 PM
    Yes.

    It looks like you could have a connectivity issue between your access points and the controller. The first device started the TFTP transfer and then did not finish due to some errors. We would need to know everything between your access points and the controller to understand what could be happening. It could be something like an MTU issue.


  • 8.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 12, 2018 09:06 PM

    We have researched that and there's no connectivity issue that we can find.  There are only 2 switches between the controller and most APs.  Controllers are connected to the core.  Most APs are connected to an access switch that's directly connected to the core.  No other devices are having any issues.

     

    Also I don't believe a connectivity issue would cause the CPsec Certificate rejection.  I'm not ruling that out, but there is no evidence I can find of a connectivity failure.

     

    Can you explain what I'm doing wrong with the AP discovery? I've read through the AOS 8 Fundamentals guide here https://community.arubanetworks.com/t5/Controller-Based-WLANs/ArubaOS-8-Fundamentals-Guide/ta-p/428914

    And it states that whether using DNS, DHCP, or static discovery, it should be pointed at the cluster VIP.  What am I missing?

     

     



  • 9.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 12, 2018 09:41 PM
    I don't think you are doing anything wrong. The reason why I said it could be an MTU issue is that frequently certificate exchanges could be larger than 1500 bytes and if your layer 3 switch is not fragmenting them, that could be your issue. It could be the same issue with the TFTP transfer, as well. What is your layer 3 switch vendor?


  • 10.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 12, 2018 09:54 PM
    Layer 3 switch is a Cisco 6500.

    Controllers are virtual, hosted on vsphere 6.5. Forged Transmits and Promiscuous Mode are enabled on the vswitches. I believe everything else is at the default settings.


  • 11.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 12, 2018 10:03 PM
    You should attempt a ping with regular packet sizes from the AP subnet to the controller. Then you should do a ping from the AP subnet to the controller with progressively larger sizes to see the largest packet you can pass. If you can pass a ping packet much larger than 1500, you should consider getting a packet capture between devices that consistently do not work so they can be analyzed.
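
    The "progressively larger sizes" test can be done as a binary search rather than by hand. The following is a minimal sketch, assuming a probe function that returns True when a ping of a given size gets a reply; `largest_passing_size` and `fake_probe` are hypothetical names, and the stand-in probe simply pretends the path limit is 1250 bytes. A real probe would wrap a ping with the Don't Fragment bit set, because without it a path that fragments will pass every size.

```python
def largest_passing_size(probe, low, high):
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic: every size up to the path limit passes and
    every larger size fails. Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def fake_probe(size, limit=1250):
    """Stand-in for a real DF ping; pretends the path passes up to `limit` bytes."""
    return size <= limit


if __name__ == "__main__":
    print(largest_passing_size(fake_probe, 64, 2000))
```

    Searching this way needs about a dozen pings to cover the 64 to 2000 byte range, which makes it practical to repeat from each controller to several APs.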


  • 12.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 12, 2018 11:32 PM

    I am able to send pings of 2000 bytes from the controllers to several AP's with no issue. I tried this from all 3 controllers to several different AP's. 2000 is the largest size the controller allows me to send:

    (ecc-mc01) #ping 10.100.4.62 packet-size 2000 count 10
    
    Press 'q' to abort.
    Sending 10, 2000-byte ICMP Echos to 10.100.4.62, timeout is 2 seconds:
    !!!!!!!!!!
    Success rate is 100 percent (10/10), round-trip min/avg/max = 0.737/1.1358/1.563 ms

    I'll try a packet capture and see what I can see there.  The thing is, there is no AP that does this consistently, and it's not isolated to one switch or network segment.  Also, once they come up, they are completely stable.  It's only on a reboot that this happens, and only with AOS 8.3.0.3 and higher.  Based on the fact that it started in a specific firmware version, I have a strong suspicion it's a bug, but I really have no way to prove that and unfortunately support hasn't been much help.

     

    Is it possible there's something wrong with my CPSec config or certificates?  It seems like all I can really do is turn it on or off.



  • 13.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 12, 2018 11:56 PM
    I don't think there is anything wrong with CPSEC, because there is nothing to configure. Also, your tftp load does not involve CPSEC.


  • 14.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 13, 2018 12:22 AM
    It may be tedious, but if you find an access point in that state, you should mirror the switch interface to a different port on that switch and do a packet capture with a laptop. That would allow us to see the back-and-forth traffic on that access point and, more importantly, where it fails.


  • 15.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 13, 2018 10:53 AM
    What size packet should I be able to pass from the controller to the AP? I just tried several different sizes with the Don't Fragment bit set and the largest that passed is 1250. That seems pretty low and leads me to believe there may be excessive fragmentation.

    However, I’m able to pass a 1500 byte packet from the controller to its gateway address on the core switch. I can do 1450 from the controller to the AP subnet. It only drops to 1250 to an actual AP.

    This leads me to believe whatever is happening is at the AP level. How would I find out what the MTU of the AP is? Did something change in 8.3.0.3 that would cause the APs to use a smaller MTU?


  • 16.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 13, 2018 04:42 PM

    I don't know, but you should provide that information to TAC, if you have not already.  Again, I am just guessing based on the information that you gave me.



  • 17.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Nov 13, 2018 11:14 PM

    Is the VMware server running the virtual controllers dual-homed, with two or more NICs on the controllers' vSwitches, and if so, are you using NIC teaming? If you are, please check the virtual controller installation guide to ensure the setup is correct (you would need LACP on the uplink switch and some settings on the vSphere server).



  • 18.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 13, 2018 11:41 PM
    The controllers are using a dedicated virtual switch on each host that only has a single uplink.


  • 19.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Nov 15, 2018 09:20 AM

    Hey guys, 

     

    I just migrated from 8.3.0.3 to 8.3.0.4. There are a handful of bugs in the 8.3.0.3 code. Since I have done that, things have been more stable. I have a virtual Mobility Master and 2 7280 controllers and am using the 345 APs. I am still having the same issue you are reporting. I actually tried a traffic capture and got absolutely no helpful information. This seems to be random and out of the blue.

     

    A command that may help in getting information, provided the AP is reachable, is the following.

     

    show ap debug system-status ap-name XXXX

     

    The physical interface Eth0 is at MTU 1500 and the tunnel MTU is at 1300. I think the data payload from the wireless perspective may be MTU 1200. I'm not 100% on the 1200.  I am scratching my head here as well. Do you have a primary and secondary LMS configured?



  • 20.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Dec 13, 2018 12:49 PM

    Could you provide AP uplink capture for analysis?



  • 21.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Dec 17, 2018 06:37 PM

    <deleted>



  • 22.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Dec 17, 2018 10:12 PM

    @GO-RILLA wrote:

    The physical interface Eth0 is at MTU 1500 and the tunnel MTU is at 1300. I think the data payload from the wireless perspective may be MTU 1200. I'm not 100% on the 1200.  I am scratching my head here as well. Do you have a primary and secondary LMS configured?


    We have only the primary LMS configured and it is set to the Cluster Virtual IP.  The MTU field was blank in my AP System Profile.  I tried setting it to 1500 but that didn't make any difference.



  • 23.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Dec 17, 2018 06:35 PM

    Try "ping <ip> packet-size 1472 df-flag"

     

    If you're running CPSEC without jumbo frames this will probably fail. 
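
    The 1472 figure follows from header arithmetic: a 1472-byte ICMP payload plus an 8-byte ICMP header and a 20-byte IPv4 header fills a 1500-byte packet exactly. The following is a minimal sketch of that arithmetic, assuming the ping packet-size value counts ICMP payload only (the 1472 figure suggests this, but it is not confirmed in the thread); the function names are illustrative.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ip_packet_size(icmp_payload):
    """Total IPv4 packet size produced by a ping with the given payload."""
    return icmp_payload + ICMP_HEADER + IPV4_HEADER


def max_icmp_payload(mtu):
    """Largest ping payload that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    print(ip_packet_size(1472))    # 1500
    print(max_icmp_payload(1500))  # 1472
    print(max_icmp_payload(1300))  # 1272
```

    Under the same assumption, the 1250-byte result reported earlier in the thread corresponds to a 1278-byte IP packet on the wire.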



  • 24.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Dec 17, 2018 10:18 PM

    @farrands wrote:

    Try "ping <ip> packet-size 1472 df-flag"

     

    If you're running CPSEC without jumbo frames this will probably fail. 


    I've actually been wondering if enabling jumbo frames end to end would solve the issue.  However, I do not believe jumbo frames are currently supported on the VMC.  The checkbox is there, but the documentation states it's only supported on the 7000 and 7200 series.  I did see in the 8.4 beta release notes that it supports jumbo frames in the VMC, so I think we will have to wait until 8.4 is released to be able to do jumbo frames.

     

    I'm not really convinced however that MTU is the issue considering the issue does not occur in 8.3.0.2.  It seems like a bug was introduced in 8.3.0.3.  Possibly with the path MTU discovery mechanism?



  • 25.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Dec 18, 2018 10:38 AM

    I'm uncertain if the VMC will support jumbo frames.  I'm running with 7200 controllers.

     

    I've been running 8.2.2.1, and am planning on jumping up to 8.3.0.4 just after the first of the year.... I'd been hoping for an 8.3.1.x release to go with, but I have some 340 series AP's I want to start testing.



  • 26.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    EMPLOYEE
    Posted Dec 13, 2018 12:52 PM

    Derek,

     

    I would like to know if moving to 8.3.0.3/4 was suggested by Aruba or for any fix/feature.

     

    Do you happen to have logs from controller when AP was taking ~3 hours to come up on controller?



  • 27.  RE: 340 series AP's with 8.3.0.3 taking forever to come up on controller

    Posted Dec 17, 2018 10:14 PM

    @kmookkandi wrote:

    Derek,

     

    I would like to know if moving to 8.3.0.3/4 was suggested by Aruba or for any fix/feature.

     

    Do you happen to have logs from controller when AP was taking ~3 hours to come up on controller?


    We upgraded because 8.3.0.2 has a huge number of bugs.  Quite a few that affected us are resolved in later versions.  The controller logs have been submitted to TAC multiple times and should be attached to the ticket.