07-16-2012 01:38 PM
My company experienced a power outage at three of our remote locations, and three of my 135 APs went down. Since then, they have all stayed down. I have had this issue before with one or two APs, but they would usually find the controller and come back online. This time they are just sitting there and have been in this state for more than 2 days. DHCP is running fine at this location: other devices are getting addresses from it and there are more than enough available IP addresses. This is not happening on one subnet but on three different subnets. The PoE injectors being used have been tested on other devices and work fine.
I know that physically resetting the AP settings should work, but then I would have to re-provision the AP.
Any reason why this would be happening?
07-16-2012 02:27 PM
How are these APs provisioned to find the master controller? Static, DHCP, DNS?
Can you find the AP's IP address on the DHCP server and try to ping it?
Also check on the controller with "show datapath session table | include <ip of AP>" to find out whether the AP is trying to contact the controller. You should see sessions on UDP 8211 (PAPI) and also protocol 47, which is GRE.
The AP's console will also give you more details, but I am not sure whether that is possible in your case.
07-16-2012 05:12 PM
Alap, thanks for the response.
All of my APs are provisioned using DHCP.
I have found the ip address from the DHCP server and pinged it. Apparently another device has pulled that address already.
Here is the output of the SHOW command you sent:
#show datapath session table | include 10.0.10.54
10.0.10.54 10.0.17.18 17 8211 8211 0/0 0 0 1 local d FY
10.0.10.54 10.0.17.20 17 8211 8211 0/0 0 0 1 1/0 d FC
10.0.10.54 10.0.17.20 17 21739 2 0/0 0 0 1 1/0 3 FC
10.0.17.20 10.0.10.54 47 0 0 0/0 0 0 0 1/0 c9c2 F
10.0.17.20 10.0.10.54 17 8211 8211 0/0 0 0 1 1/0 d FY
10.0.17.18 10.0.10.54 17 8211 8211 0/0 0 0 1 local d FC
10.0.10.54 10.0.17.20 47 0 0 0/0 0 0 0 1/0
10.0.17.20 would be the controller and 10.0.10.54 the AP.
That location is remote, and consoling into the AP would involve some travel, but it is possible and will have to be done if I cannot resolve this remotely within the next 24 hours.
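For anyone who wants to check output like this without eyeballing it, here is a rough Python sketch. It assumes the first five columns are source IP, destination IP, protocol, source port and destination port. That matches the rows pasted above, but it is my reading of the output, not something taken from documentation. The name `summarize_sessions` is made up for illustration.

```python
# Rows copied from the session table output pasted in this thread.
SAMPLE = """\
10.0.10.54 10.0.17.18 17 8211 8211 0/0 0 0 1 local d FY
10.0.10.54 10.0.17.20 17 8211 8211 0/0 0 0 1 1/0 d FC
10.0.10.54 10.0.17.20 17 21739 2 0/0 0 0 1 1/0 3 FC
10.0.17.20 10.0.10.54 47 0 0 0/0 0 0 0 1/0 c9c2 F
10.0.17.20 10.0.10.54 17 8211 8211 0/0 0 0 1 1/0 d FY
10.0.17.18 10.0.10.54 17 8211 8211 0/0 0 0 1 local d FC
10.0.10.54 10.0.17.20 47 0 0 0/0 0 0 0 1/0
"""


def summarize_sessions(text, ap_ip, controller_ip):
    """Report which session types show up between one AP and one controller.

    Assumed column order: source IP, destination IP, protocol,
    source port, destination port (the rest is ignored).
    """
    found = {
        "papi_to_controller": False,    # UDP 8211, AP -> controller
        "papi_from_controller": False,  # UDP 8211, controller -> AP
        "gre": False,                   # IP protocol 47, either direction
    }
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or unrelated line
        src, dst, proto, _sport, dport = fields[:5]
        if {src, dst} != {ap_ip, controller_ip}:
            continue  # session involves some other pair of hosts
        if proto == "17" and dport == "8211":
            if src == ap_ip:
                found["papi_to_controller"] = True
            else:
                found["papi_from_controller"] = True
        elif proto == "47":
            found["gre"] = True
    return found


print(summarize_sessions(SAMPLE, "10.0.10.54", "10.0.17.20"))
```

PAPI in both directions plus GRE would suggest the two addresses are at least talking to each other. It does not tell you which device is actually sitting on the AP's address.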
07-17-2012 06:36 PM
I would suggest setting up DHCP reservations for those APs and then doing a PoE reboot.
Try doing a packet capture on the AP's Ethernet link. If using Linux: tcpdump -i <interface> -n ether host <mac>
Look there for the BOOTP traffic and for the AP trying to connect to the controller, and check whether there are any rejects from the controller, or at least whether there are any replies from it.
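If the capture is too noisy, the filter can be narrowed to just the traffic mentioned in this thread. The snippet below is only a sketch that builds such a tcpdump command line. The port choices (UDP 67/68 for BOOTP/DHCP, UDP 8211 for PAPI, IP protocol 47 for GRE) come from the posts above, and the interface name and MAC address are placeholders, not real values.

```python
import shlex


def build_capture_command(interface, mac):
    """Build a tcpdump argument list limited to one AP's DHCP, PAPI and GRE traffic."""
    # Assumption: these are the only protocols of interest for this problem.
    capture_filter = (
        f"ether host {mac} and "
        "(udp port 67 or udp port 68 or udp port 8211 or ip proto 47)"
    )
    # -n: no name resolution, -e: print link-level (MAC) headers
    return ["tcpdump", "-i", interface, "-n", "-e", capture_filter]


# Placeholder interface and MAC; substitute the real ones.
print(shlex.join(build_capture_command("eth0", "00:11:22:33:44:55")))
```

The function only prints the command, so you can review it before running it (tcpdump normally needs root).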
07-18-2012 08:58 PM
Just a point (or two ;)) of clarification...
What is the IP address of the controller? I ask because I see .18 and .20 'talking' to .54, which makes me believe that .54 is the controller rather than .20.
In any event, there is some communication from AP to controller. The question, of course, is whether this capture is for a working or a non-working AP.
What is the output of show ap database for the APs that are missing? Are they all 'down' or in some other state?
07-20-2012 03:44 AM
Thanks for your response.
I went ahead and did a reservation for all of my APs in that location. Once I rebooted my core router and my DHCP server, the APs came back online. After that I got a call from that location that someone there was receiving an IP address conflict. It turned out that someone there had configured that same IP statically on their machine, which may have caused the issue in the first place.
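In hindsight, a simple comparison of the DHCP reservations against what is actually seen on the network would have pointed at the conflict sooner. Here is a minimal sketch of that idea. The addresses and MACs are invented for the example, and how you collect the observed IP-to-MAC pairs (ARP table, switch, capture) is left open.

```python
def find_conflicts(reservations, observed):
    """Return (ip, reserved_mac, seen_mac) for every IP answered by the wrong MAC.

    reservations: {ip: mac} as configured on the DHCP server
    observed:     {ip: mac} as actually seen on the network
    """
    conflicts = []
    for ip, reserved_mac in reservations.items():
        seen_mac = observed.get(ip)
        # Only flag IPs that are in use by a different MAC than the reserved one.
        if seen_mac is not None and seen_mac.lower() != reserved_mac.lower():
            conflicts.append((ip, reserved_mac, seen_mac))
    return sorted(conflicts)


# Hypothetical data: the second IP is answered by a statically configured PC.
reservations = {
    "192.0.2.10": "00:11:22:33:44:01",
    "192.0.2.11": "00:11:22:33:44:02",
}
observed = {
    "192.0.2.10": "00:11:22:33:44:01",
    "192.0.2.11": "AA:BB:CC:DD:EE:FF",
}
print(find_conflicts(reservations, observed))
```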
07-21-2012 08:03 PM - edited 07-22-2012 11:20 AM
To prevent this from happening, I would suggest one VLAN that handles DHCP for the devices (use it as a management VLAN) and another VLAN with its own DHCP scope for clients.
We had to do this at some of our locations because we found more and more Apple products ARP-ing all the IPs on the subnet and telling the server that the MAC for each IP is their own, which messes up the ARP entries (something like what you would do for a man-in-the-middle attack). To this day we have no explanation for why this happens.
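One way to spot a device doing this is to look for a single MAC that shows up against many IPs in the ARP data. The following is just a sketch of that check. The entries are made up, and the threshold of 3 is an arbitrary assumption you would tune for your network (routers and multi-homed hosts can legitimately own several IPs).

```python
from collections import defaultdict


def macs_claiming_many_ips(arp_entries, threshold=3):
    """Group (ip, mac) pairs by MAC and return MACs seen with >= threshold IPs."""
    ips_by_mac = defaultdict(set)
    for ip, mac in arp_entries:
        ips_by_mac[mac.lower()].add(ip)
    return {
        mac: sorted(ips)
        for mac, ips in ips_by_mac.items()
        if len(ips) >= threshold
    }


# Hypothetical ARP observations: one MAC answers for three different IPs.
arp_entries = [
    ("192.0.2.10", "00:11:22:33:44:01"),
    ("192.0.2.20", "aa:bb:cc:dd:ee:ff"),
    ("192.0.2.21", "AA:BB:CC:DD:EE:FF"),
    ("192.0.2.22", "aa:bb:cc:dd:ee:ff"),
]
print(macs_claiming_many_ips(arp_entries))
```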