Clients dropping while roaming and during normal usage
10-05-2017 09:24 AM - edited 10-05-2017 09:25 AM
We recently upgraded our 7220s to the latest 6.5 (220.127.116.11) code base. For the past week, we have seen an abnormal number of HelpDesk requests reporting wireless issues. At first, I tried to determine whether the issue was isolated to a location, AP, user role, subnet, VLAN, client device type, or RADIUS server. After several hours of debugging, the issue turned out to be more widespread: it happens to any device type, any role, any location, etc. I know we have RF coverage gaps in some locations, but not on this scale. For the most part, our average client SNR is good and our channels look very good. We have roughly 400 AP-205s spread across 7 schools. Our 7220 controllers are configured in a master/local HA setup. Our device usage is roughly:
500 Lenovo Twists (Windows 8.1)
5000 BYOD devices (mostly iOS)
Our average connected client count is around 3500.
We have two SSIDs on all VAPs:
*Chrome Network for our 1:1 Chrome model (MAC auth via ClearPass)
*Standard Network for everything else (WPA2-AES, Explicit Mode EAP-PEAP auth)
Both 2.4 GHz and 5 GHz are enabled on both networks (we could probably disable 2.4 GHz on the Chrome network).
For the majority of our RF profiles (we followed the Aruba best practices guide):
A radio EIRP is:
G radio EIRP is:
I know we desperately need an updated wireless survey as our density continues to grow. On average we have roughly 30 clients connected to each AP (most on 5 GHz).
The problem, which I was able to reproduce personally, mostly occurs when a user closes the lid, walks to a new classroom, opens the lid, and gets a limited connection. At this point, the client is in the user table with the correct user role, VLAN, and IP settings, and is associated to the nearest AP with good SNR, etc. However, during this period of limited connectivity, the client doesn't respond to ICMP echo requests from the controller, can't browse, can't ping its gateway, etc. Looking at the datapath session table, we don't see any blocked traffic, no SYNs, etc. The issue sometimes takes up to 20 minutes to correct itself. Kicking the client from the user table has no positive effect. In the AP client trail history you can see the roaming moves, and sometimes errors saying the encryption is not supported.
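To put numbers on the "up to 20 minutes" recovery window while reproducing, one option is to probe the default gateway from a test client at a fixed interval and record when replies stop and resume. This is only a rough sketch under my own assumptions, not anything from TAC or Aruba: it shells out to the OS `ping` command (assuming `-c 1` on Linux/macOS and `-n 1` on Windows), and the function names `ping_once`, `outage_intervals` and `monitor` are hypothetical.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=3):
    """Send one ICMP echo via the OS ping command; True if it exits 0.

    Assumption: "-c" count flag on POSIX systems, "-n" on Windows.
    The exit code is only a rough reachability signal.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # No answer within timeout_s: treat as unreachable.
        return False
    return result.returncode == 0


def outage_intervals(samples):
    """Return (start, end) pairs for runs of failed probes.

    samples: list of (timestamp, reachable) tuples sorted by time.
    An outage starts at the first failed sample and ends at the next
    successful one; an outage still open at the last sample ends at
    that sample's timestamp.
    """
    intervals = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, samples[-1][0]))
    return intervals


def monitor(host, interval_s=5, duration_s=1800, probe=ping_once):
    """Probe host every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((datetime.now(), probe(host)))
        time.sleep(interval_s)
    return outage_intervals(samples)
```

Running `monitor()` with the gateway IP on a test laptop, then closing the lid, walking to another classroom and reopening it, should return one pair per outage; the difference between the two timestamps approximates the outage length to within one probe interval. Time spent asleep is not counted as an outage because no probes run then. Treat the results as approximate: Windows `ping` can exit 0 on some "destination unreachable" replies.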
We have an active case with TAC at the highest level of support. They think it's possibly a bug in the code. I just wanted to see whether anyone else has had similar issues, and whether there's a tweak that could help mitigate these horrible issues. Most likely I will upgrade to 8.1 instead of waiting for a possible patch.
Please let me know if anyone would like specific logs; I was debugging while reproducing the issue, although TAC said everything looked normal with the config.
Thanks for reading/support.