For the past month I have been dealing with an issue where clients cannot connect to certain APs.
I am in a school with 136 APs; most of them are 225s, with 30 new 515s. We have a cluster of two 7210s managed by a pair of virtual Mobility Masters. All of our switches are Aruba, either S3500s or the newer 2930Ms. We also use ClearPass for authentication. So everything is Aruba/HPE.
The issue is random across the network. Users will connect without any problem in one room, then move to another room and be unable to connect. It does not affect everyone. In one room I will have 10 people connected and 2 that are not, or vice versa. I have walked around the building with 4 devices (an iPad, a Windows 10 computer, a MacBook Pro, and an Android phone). In some rooms they all connect. In others, only one, two, or three of them connect.
The logs show there is an issue with the AP and the eap-id from the client. What I see is that the user attempts to connect and can see the network, but then gets rejected. In the MM and on the controllers you can see the client attempt to connect. It gets the logon role and is then dropped. The client never gets to ClearPass. It is dropped right at the AP or controller.
It doesn't seem to matter which model AP we use, though it seems to be happening more with the 515s. I have noticed in rooms with multiple APs that one AP with the issue can make it harder for clients to connect in the space. Turning off that AP allows users to connect to the working APs. The same holds when there is only one AP in a room: I can disconnect it and clients will connect to nearby APs that are working. I have also left an AP disconnected for a period of days. After being reconnected, it takes some time, gets its configuration, and then works. In some cases it goes back to not working after a day or two.
I've had an escalated case with TAC for the past month, and they have been very helpful, but there is still no solution. I have reached out to my sales rep and engineers and was told to talk to TAC, which I have done. Yesterday I did another 2+ hour session with TAC where they took more logs for the engineers to look over.
A few weeks ago they sent me some new OS updates based on what they found, but that didn't solve the issue. Some APs that didn't work before now do, and others now fail, sometimes in places where we didn't have an issue before.
What's strange is that I can look at the MM and see over 800 clients attached. The number of users affected is quite small, yet there are enough that it has become a major concern, one that appears to have no solution. This is something that we can't live with.
I am wondering if anyone else has seen something like this and might have come across a solution.
Thanks