Wireless Access


AP fails to rejoin controller after AP reboot

  • 1.  AP fails to rejoin controller after AP reboot

    Posted Jul 05, 2017 06:21 PM

    I came across an issue today at one of our remote sites that I've been struggling to resolve.

     

    The site in question has 2 x local controllers, which manage 25 Access Points. We also have a main site with 2 x controllers that act as the Master devices. If I reboot a previously connected AP at the remote site, it connects to the switch and picks up a DHCP address, but never connects to a controller. I connected to an AP via console today and could see that it picked up the correct DHCP information (IP, subnet and gateway) along with the IP address configured as the master controller. From the console output, the AP then stays connected to the network (a ping to its IP address is successful) but does not connect to a controller.

     

    If I reset the AP and run through the conversion process again, entering the master IP address, I can see the AP join the master controller, and I'm then able to provision it on to the local controllers; it will reboot and connect. Any subsequent reboot then causes it to disconnect from the controllers.

     

    I rebooted another AP at the same site before leaving this evening and it is showing the same behaviour, so I'm fairly convinced it's a site issue.

     

    2 x Master Controllers - 150 APs at Main site - APs reside in 'MainSite' AP Group

    2 x Local Controllers - 25 APs at local site - APs reside in 'localgroup' AP Group

     

    APs are 325, and we're running 6.4.4.11 on the controllers.

     

    The DHCP scope for the Access Points is configured with Option 60 set to ArubaAP and Option 43 holding the IP address of the master controller.

    Both sites are using different IP ranges for the AP connections and also the user subnets.

     

    The controllers also reside in their own separate Layer 3 networks. Routing between all is working as expected.

     

    Is anyone able to give any pointers on any particular troubleshooting steps which may help pinpoint the cause of the issue, or any settings I need to check within the controller config that tend to cause issues like those described above?

     

    TIA

    Dan



  • 2.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 05, 2017 09:34 PM
    During the boot process, do you see the AP discover the controller? This happens right after it obtains an IP address.

    Can you ping the controller from the AP console?

    Try running "show log system all | include <AP MAC address or name>"
    and see if anything shows up on the controller side.

    Are you experiencing any network delays between the remote site and where the controller is located?





  • 3.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 06, 2017 03:50 AM

    Thanks for the reply.

     

    Regarding connectivity between the 2 sites, I'm happy that it is running without issues. We have services running between the two and there are no other problems. I'm able to ping from the SVI that hosts the APs to the controller IP address with no drops and relatively low latency (15 ms; the sites are about 150 miles apart).

     

    I've run the show log system all | include AP command from the master controller, and I can see logs from the AP that is trying to connect, but there appear to be timeout messages within those logs.

     

    Does the Hello Timeout indicate that the AP is not communicating correctly with the controller, or that the controller is not able to reach the AP on its IP address? Also, the logs mention a packet length of 1504; could this be caused by an MTU-related issue?
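
    On the len=1504 question, a rough way to reason about it is to count IPv4 fragments. This is only a sketch: the log does not say whether len is the IP packet size or just the control-message payload, so the code assumes it is the IPv4 total length with a 20-byte header and no options. The function name is made up for illustration.

```python
import math

IPV4_HEADER_LEN = 20  # bytes; assumes an IPv4 header without options


def ipv4_fragment_count(total_len: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry a packet of total_len bytes
    across a link with the given MTU (assumption: plain IPv4, no options)."""
    if total_len <= mtu:
        return 1
    payload = total_len - IPV4_HEADER_LEN
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER_LEN) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    # 1504 is the len value from the HELLO log lines; the MTUs are examples.
    for mtu in (1500, 1504):
        print(f"MTU {mtu}: {ipv4_fragment_count(1504, mtu)} packet(s)")
```

    Under those assumptions a 1504-byte packet does not fit a 1500-byte link in one piece, so it would depend on fragments getting through every hop.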




  • 4.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 06, 2017 04:41 AM

    Attached is a copy of the console output when the AP boots. I can see that about 20 seconds into the boot the AP picks up its IP address and that of the Master controller, but then there doesn't appear to be any further connectivity.

    Attachment: AP016 Boot.txt (7 KB)


  • 5.  RE: AP fails to rejoin controller after AP reboot

    EMPLOYEE
    Posted Jul 06, 2017 05:35 AM

    I see that you have two ethernet ports connected.

     

    Try to reboot the AP, and when the console reaches

    Hit <Enter> to stop autoboot

    press Enter, then type "printenv" to see what variables are configured and paste the output into your reply.



  • 6.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 06, 2017 05:50 AM

    printenv information as requested

     

    Hit <Enter> to stop autoboot: 2  0
    apboot> printenv
    bootdelay=2
    baudrate=9600
    autoload=n
    boardname=Octomore
    bootcmd=boot ap
    autostart=yes
    bootfile=ipq806x.ari
    mtdids=nand0=nand0
    ethaddr=a8:bd:27:ca:72:7a
    os_partition=0
    NEW_SBL2=1
    backup_vap_init_master=10.100.100.101
    backup_vap_password=6745C6236998734069D9DAD9AFE6BD8691CF40D94CEC6BB90292A296277BCBE9
    num_ipsec_retry=85
    previous_lms=0
    backup_vap_opmode=0
    backup_vap_band=2
    name=LDNAP016
    group=GROUP2
    syslocation=
    master=10.100.100.100
    ip6prefix=64
    serverip=10.100.100.100
    a_antenna=0
    g_antenna=0
    usb_type=0
    uplink_vlan=0
    auto_prov_id=0
    is_rmp_enable=0
    priority_ethernet=0
    priority_cellular=0
    cellular_nw_preference=1
    usb_power_mode=0
    ap_power_mode=0
    cert_cap=0
    mesh_role=0
    installation=1
    mesh_sae=0
    start_type=warm_start
    stdin=serial
    stdout=serial
    stderr=serial
    machid=1260
    mtdparts=mtdparts=nand0:0x2000000@0x0(aos0),0x2000000@0x2000000(aos1),0x4000000@0x4000000(ubifs)
    partition=nand0,0
    mtddevnum=0
    mtddevname=aos0
    ethact=eth0

    Environment size: 991/65532 bytes
    apboot>

     

    .100 is the VRRP address for the master controllers; .101 is the primary device within the VRRP pair.
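
    For anyone comparing printenv dumps across several APs, a small parser can pull out the discovery-related variables. This is a minimal sketch, assuming the dump is plain key=value lines as pasted above; the function names and the shortened sample are illustrative only.

```python
def parse_apboot_env(text: str) -> dict[str, str]:
    """Turn pasted 'printenv' output into a dict of variable -> value."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        key, sep, value = line.partition("=")  # split on the first '=' only
        # Skip prompts, the "Environment size" footer and anything
        # that is not a bare key=value pair.
        if not sep or not key or " " in key:
            continue
        env[key] = value
    return env


def static_discovery_settings(env: dict[str, str]) -> dict[str, str]:
    """Return the statically set controller addresses, if any."""
    return {k: env[k] for k in ("master", "serverip") if env.get(k)}


# Shortened copy of the dump posted above.
SAMPLE = """\
apboot> printenv
name=LDNAP016
group=GROUP2
syslocation=
master=10.100.100.100
serverip=10.100.100.100
backup_vap_init_master=10.100.100.101
mtdparts=mtdparts=nand0:0x2000000@0x0(aos0)
Environment size: 991/65532 bytes
"""

if __name__ == "__main__":
    env = parse_apboot_env(SAMPLE)
    print(env["name"], env["group"], static_discovery_settings(env))
```

    Splitting on the first "=" only keeps values such as mtdparts, which contain "=" themselves, intact.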



  • 7.  RE: AP fails to rejoin controller after AP reboot

    EMPLOYEE
    Posted Jul 06, 2017 06:08 AM

    Question:

     

    Why do you have the master and serverip statically configured instead of using DNS or DHCP discovery? (Doing this could mask an issue with discovery.)

     

    Does the group "GROUP2" exist?

     

    You should boot the AP and type "show datapath session table <ip address of ap>" repeatedly on the master controller to see if the AP is sending any traffic.

     

    Did this AP ever work?

     

    What was the last change you made before the AP stopped working?

     

    After you boot the ap, you should type "show ap database" on the master controller to see if that AP has any flags that could explain your problem.

     

    You should also try a single ethernet port at a time to eliminate any configuration issues with bonding.



  • 8.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 06, 2017 06:29 AM

    Thanks for the reply. Regarding the manually configured master and serverip addresses, I assume the AP has them in place from when we ran the conversion process and initially registered it to the master controller? Is it normal for the master and serverip addresses to be the same? I'm confident the APs have been rebooted in the past without issue, and there have been no significant changes to the site setup of late. I am trying a few different things in an attempt to resolve this.

     

    Below is a copy of the show datapath session table <AP-IP> output; only the bottom line appears to increment, and only in small volumes.

     

    (MASTER) #show datapath session table <AP-IP-ADDRESS>

    Datapath Session Table Entries
    ------------------------------

    Flags: F - fast age, S - src NAT, N - dest NAT
    D - deny, R - redirect, Y - no syn
    H - high prio, P - set prio, T - set ToS
    C - client, M - mirror, V - VOIP
    Q - Real-Time Quality analysis
    I - Deep inspect, U - Locally destined
    E - Media Deep Inspect, G - media signal
    r - Route Nexthop
    A - Application Firewall Inspect

    Source IP       Destination IP  Prot SPort DPort Cntr     Prio ToS Age Destination TAge Packets   Bytes     Flags
    --------------- --------------- ---- ----- ----- -------- ---- --- --- ----------- ---- --------- --------- ---------------
    MASTER-IP       AP-IP-ADDRESS   17   8211  8211  0/0      0    0   2   0/0/2       28   0         0         FYI
    MASTER-IP       AP-IP-ADDRESS   17   8222  8211  0/0      0    0   0   0/0/2       2    0         0         FYI
    AP-IP-ADDRESS   MASTER-IP       17   8211  8222  0/0      0    0   0   0/0/2       2    0         0         FYCI
    AP-IP-ADDRESS   MASTER-IP       17   8211  8211  0/0      0    0   0   0/0/2       28   12        9128      FCI
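
    When this command is run repeatedly, it can help to reduce each capture to the flows that actually carry packets. The following is a rough sketch that assumes the 14-column layout shown in the header above and whitespace-separated fields; the Session type and function names are invented for the example, and MASTER-IP / AP-IP-ADDRESS are the placeholders from the paste.

```python
from typing import NamedTuple


class Session(NamedTuple):
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    packets: int
    byte_count: int
    flags: str


def parse_session_rows(text: str) -> list[Session]:
    """Parse data rows of a pasted session table (assumed 14 columns)."""
    sessions = []
    for line in text.splitlines():
        f = line.split()
        # Data rows have exactly 14 fields and a numeric protocol column;
        # this skips the header, the dashed separator and the flag legend.
        if len(f) != 14 or not f[2].isdigit():
            continue
        sessions.append(Session(f[0], f[1], int(f[2]), int(f[3]), int(f[4]),
                                int(f[11]), int(f[12]), f[13]))
    return sessions


def flows_with_traffic(sessions: list[Session]) -> list[Session]:
    """Keep only the flows whose packet counter is non-zero."""
    return [s for s in sessions if s.packets > 0]


SAMPLE = """\
MASTER-IP AP-IP-ADDRESS 17 8211 8211 0/0 0 0 2 0/0/2 28 0 0 FYI
MASTER-IP AP-IP-ADDRESS 17 8222 8211 0/0 0 0 0 0/0/2 2 0 0 FYI
AP-IP-ADDRESS MASTER-IP 17 8211 8222 0/0 0 0 0 0/0/2 2 0 0 FYCI
AP-IP-ADDRESS MASTER-IP 17 8211 8211 0/0 0 0 0 0/0/2 28 12 9128 FCI
"""

if __name__ == "__main__":
    for s in flows_with_traffic(parse_session_rows(SAMPLE)):
        print(f"{s.src} -> {s.dst} proto {s.proto} "
              f"{s.sport}->{s.dport}: {s.packets} packets, {s.byte_count} bytes")
```

    On the rows pasted above, only the AP-to-master flow on protocol 17 (UDP), port 8211 to 8211, has a non-zero packet count; both master-to-AP flows show zero.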

     

    "show ap database" just lists the AP as down; no flags are shown.



  • 9.  RE: AP fails to rejoin controller after AP reboot

    EMPLOYEE
    Posted Jul 06, 2017 06:38 AM

    How many access points are in this situation?  Is it only one?

    I would type "show log system 50" on the master controller to see if there is an issue.



  • 10.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 06, 2017 06:43 AM

    We have 24 APs at this site; I've rebooted 2 and both have shown the same issue. I've left one offline for now in order to troubleshoot. If I perform an AP reset and point it at the master, it will convert and connect to the controller and allow me to assign it to the local site group. It's only on subsequent reboots that they don't come back online, which makes me think there's an issue with the AP talking to the Master or local controllers.

     

    Below is the sh log output:

     

    Jul 6 06:11:55 :303086: <ERRS> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Process Manager (nanny) shutting down - AP will reboot!
    Jul 6 06:13:14 :303022: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Reboot Reason: AP rebooted Wed Dec 31 16:44:45 PST 1969; SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 06:57:11 :311002: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> sapd| Rebooting: SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 06:57:11 :303086: <ERRS> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Process Manager (nanny) shutting down - AP will reboot!
    Jul 6 06:58:29 :303022: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Reboot Reason: AP rebooted Wed Dec 31 16:44:45 PST 1969; SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 07:42:26 :311002: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> sapd| Rebooting: SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 07:42:26 :303086: <ERRS> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Process Manager (nanny) shutting down - AP will reboot!
    Jul 6 07:43:44 :303022: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Reboot Reason: AP rebooted Wed Dec 31 16:44:45 PST 1969; SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 08:27:41 :311002: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> sapd| Rebooting: SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 08:27:41 :303086: <ERRS> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Process Manager (nanny) shutting down - AP will reboot!
    Jul 6 08:29:00 :303022: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Reboot Reason: AP rebooted Wed Dec 31 16:44:45 PST 1969; SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 08:32:43 :303022: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Reboot Reason: AP rebooted caused by cold HW reset(power loss)
    Jul 6 09:16:39 :311002: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> sapd| Rebooting: SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
    Jul 6 09:16:39 :303086: <ERRS> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Process Manager (nanny) shutting down - AP will reboot!
    Jul 6 09:17:58 :303022: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Reboot Reason: AP rebooted Wed Dec 31 16:44:44 PST 1969; SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
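
    With this many near-identical lines, a short script can reduce the log to the fields that matter (AP name, HELLO length, destination, retries). This is a sketch only: the regular expressions are inferred from the lines pasted above rather than from any documented log format, and the function names are hypothetical.

```python
import re

# Pattern inferred from the pasted lines, e.g.
# "Jul 6 06:57:11 :311002: <WARN> |AP NAME@IP sapd| message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d) :(?P<msg_id>\d+): <(?P<level>\w+)> "
    r"\|AP (?P<ap>[^@]+)@(?P<ip>\S+) (?P<proc>\w+)\| (?P<text>.*)$"
)
HELLO_RE = re.compile(
    r"Last Ctrl msg: HELLO len=(?P<len>\d+) dest=(?P<dest>\S+) tries=(?P<tries>\d+)"
)


def parse_ap_log(text: str) -> list[dict]:
    """Return one dict per log line that matches the assumed layout."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(m.groupdict())
    return entries


def hello_timeouts(entries: list[dict]) -> list[dict]:
    """Extract len/dest/tries from entries that report a HELLO-TIMEOUT."""
    found = []
    for e in entries:
        m = HELLO_RE.search(e["text"])
        if m and "HELLO-TIMEOUT" in e["text"]:
            found.append({"ap": e["ap"], "len": int(m["len"]),
                          "dest": m["dest"], "tries": int(m["tries"])})
    return found


# Three of the lines from the output above.
SAMPLE_LOG = """\
Jul 6 06:57:11 :311002: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> sapd| Rebooting: SAPD: Unable to contact switch: HELLO-TIMEOUT. Last rebootstrap reason: HELLO-TIMEOUT, 228 sec before: Last Ctrl msg: HELLO len=1504 dest=10.100.100.100 tries=10 seq=0
Jul 6 06:57:11 :303086: <ERRS> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Process Manager (nanny) shutting down - AP will reboot!
Jul 6 08:32:43 :303022: <WARN> |AP LDNAP016@<AP-IP-ADDRESS> nanny| Reboot Reason: AP rebooted caused by cold HW reset(power loss)
"""

if __name__ == "__main__":
    for t in hello_timeouts(parse_ap_log(SAMPLE_LOG)):
        print(t)
```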

     

     



  • 11.  RE: AP fails to rejoin controller after AP reboot

    EMPLOYEE
    Posted Jul 06, 2017 06:56 AM

    Is there a firewall between that site and the controllers? Is there a WAN link or a site-to-site VPN?



  • 12.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 06, 2017 07:02 AM

    There is a WAN connection linking the 2 sites utilising MPLS. There are no Firewalls between any of the APs or controllers.



  • 13.  RE: AP fails to rejoin controller after AP reboot

    EMPLOYEE
    Posted Jul 06, 2017 07:23 AM

    In the AP system profile for that AP-Group, I would set the MTU to 1400 and try again.



  • 14.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 06, 2017 07:28 AM

    I'm assuming that's the SAP MTU value? It's currently blank, but I'm assuming the default is 1500?



  • 15.  RE: AP fails to rejoin controller after AP reboot

    Posted Jul 18, 2017 10:54 AM

    Just wanted to close this thread out and advise that we now have a resolution. It seems the issue was caused by traffic at the remote site passing between 2 routers that were connected using dot1q sub-interfaces. I suspect there was some sort of MTU/fragmentation issue occurring; once the traffic was moved to use a single router, we were able to get the APs registered successfully again.
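
    For anyone chasing a similar suspected MTU problem, one way to test the path is to send don't-fragment pings of varying size and search for the largest one that gets through. The sketch below is not what was run here; it assumes a Linux host with iputils ping (the -M do, -s, -c and -W options) sitting in the AP subnet, and the function names are made up for illustration.

```python
import subprocess
from typing import Callable

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest IP packet size in [low, high] that probe()
    accepts. Assumes that if a size passes, every smaller size passes too."""
    if not probe(low):
        raise ValueError(f"even {low}-byte packets do not get through")
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid fits, look for something larger
        else:
            high = mid - 1  # mid is too big
    return low


def ping_probe(host: str) -> Callable[[int], bool]:
    """Build a probe that sends one ping with the don't-fragment bit set.
    Assumption: Linux iputils ping is installed and on PATH."""
    def probe(ip_packet_size: int) -> bool:
        payload = ip_packet_size - ICMP_OVERHEAD
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe
```

    A call such as largest_passing_size(ping_probe("10.100.100.100"), 576, 1500) would then report the largest unfragmented packet that reaches the master VRRP address from that host. A single lost ping reads as "too big", so repeating the run is sensible on a lossy link.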



  • 16.  RE: AP fails to rejoin controller after AP reboot

    Posted Mar 10, 2021 04:27 PM
    Hi all,
    In my case the problem was the Microsoft DHCP Server. When DHCP was configured as an Active-Active Microsoft cluster, I had this issue. I reconfigured the DHCP cluster as Active-Passive and the APs now work perfectly.

    ------------------------------
    Ermanno Furlan
    ------------------------------