Wireless Access

Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

Connection Issue Without a Solution

  • 1.  Connection Issue Without a Solution

    Posted Sep 29, 2020 09:06 AM

    For the past month I have been dealing with an issue where clients cannot connect to certain APs. 

     

    I am in a school with 136 APs. Most of them are 225s, along with 30 new 515s. We have a cluster of two 7210s managed by a pair of virtual Mobility Masters. All of our switches are Aruba, either S3500s or newer 2930Ms. We also use ClearPass for authentication, so everything is Aruba/HPE.

     

    The issue is random across the network. Users will connect without any problem in one room then move to another room and will not connect. It does not affect everyone. In one room I will have 10 people connected and 2 that are not or vice versa. I have walked around the building with 4 devices (an iPad, Windows 10 computer, MacBook Pro, and Android phone). In some rooms they all connect. In some rooms I will have one or two or three connect. 

     

    The logs show there is an issue with the AP and eap-id from the client. What I see is that the user attempts to connect, they can see the network, but then they get rejected. In the MM and on the controllers you can see the client attempt to connect. It draws the logon role then gets dropped. The client never gets to ClearPass. It is dropped right here at the AP or controller.

     

    It doesn't seem to matter which model AP we use, though it seems to be happening more with the 515s. I have noticed in rooms with multiple APs that one AP with the issue can make it harder for clients to connect in the space. Turning off that AP allows users to connect to the working APs. The same goes for a room with a single AP: I can disconnect it and clients will connect to nearby APs that are working. I have also disconnected an AP for a period of days. After reconnecting, it will take some time, get the configuration, and then work. In some cases it will go back to not working after a day or two.

     

    I've had an escalated case with TAC for the past month. They have been very helpful, but there is still no solution. I have reached out to my sales rep and engineers, and I am told to talk to TAC, which I have done. Yesterday I did another 2+ hour session with TAC where they took more logs for the engineers to look over.

     

    A few weeks ago they sent me some new OS updates based on what they found, but that didn't solve the issue. Some APs will work that didn't before and others will fail, sometimes in places where we didn't have an issue before.

     

    What's strange is that I can look at the MM and see over 800 clients attached. The number of users affected is quite small, yet there are enough that it has become a major concern. One that appears to have no solution. This is something that we can't live with.

     

    I am wondering if anyone else has seen something like this and might have come across a solution.

     

    Thanks



  • 2.  RE: Connection Issue Without a Solution

    EMPLOYEE
    Posted Sep 29, 2020 02:51 PM

    Please PM me your case#



  • 3.  RE: Connection Issue Without a Solution

    Posted Sep 29, 2020 02:55 PM

    MMMM...

    Is your MD cluster L2 or L3?

     

    Please run show lc-cluster vlan probe (from the CLI on one of the controllers).

    It sounds like an L3 cluster.

    What are you using for auth? Make sure both MDs use their VRRP address as the NAS-IP.

     

     



  • 4.  RE: Connection Issue Without a Solution

    Posted Sep 29, 2020 02:56 PM
    Make sure your cluster profile excludes the VLANs on which your MDs can't see each other, and then make sure your cluster is L2, and your issue will be solved. (At least to me, it sounds like an L3 cluster issue.)


  • 5.  RE: Connection Issue Without a Solution

    Posted Sep 29, 2020 03:08 PM

    It says L2 Conn when I run the command.

    We use ClearPass for auth.

    As for this: 

    Make sure both MDs use their VRRP address as the NAS-IP.

    I'm not sure where I would check that.

     

    Thanks for responding.



  • 6.  RE: Connection Issue Without a Solution

    Posted Oct 05, 2020 12:43 PM

    I had the TAC engineer take a look at the solution posted on here and he said everything is set up properly with L2 and that isn't the issue.

     

    The problem persists and is growing. More locations on our network will not allow users to connect. The network is slowly becoming useless and I don't have any solution from anywhere. I really am unsure what I can do going forward. We have invested a lot of money into Aruba/HPE to get a working network and it feels like money flushed down the drain right now.



  • 7.  RE: Connection Issue Without a Solution

    Posted Oct 14, 2020 09:00 AM

    After another marathon session with TAC and a group of engineers, here is what they discovered as to what may be the problem:

     

    Root cause: DHCP takes 47 s, but the DHCP timeout in the Aruba code is 40 s. This causes the default IP (192.168.x.x) to be selected as the GRE tunnel source IP.

    A new debug image with an increased DHCP timeout was loaded onto one AP. The AP was able to build the GRE tunnel with its DHCP-assigned IP, and clients were able to connect.

    Next, the AP will be deployed in an area that has consistently reported the issue.

    Logs were collected and will be analyzed by the team.

    We need to know why DHCP was taking this long: is the delay in the intermediate devices, the relay agent, or the network?

    Also note that, apart from the controller image upgrade, there was no change in the network, so DEV will investigate the code as well.

     

    So it looks like we are closer to a solution but not quite there yet.
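
    To make the TAC finding concrete, here is a rough Python sketch of the behaviour as described in the notes above. It is only a model built on assumptions, not Aruba code: the function name, the example lease address 10.1.20.37, and the 60 s "debug image" timeout are all hypothetical, and the fallback address is left elided exactly as in the TAC notes.

```python
# Hypothetical model of the behaviour TAC described; this is NOT Aruba code.
DHCP_TIMEOUT_S = 40.0        # AP-side DHCP timeout, per the TAC notes
FALLBACK_IP = "192.168.x.x"  # default address from the TAC notes (elided in the source)


def select_gre_source_ip(dhcp_lease_ip, dhcp_elapsed_s, timeout_s=DHCP_TIMEOUT_S):
    """Return the address the AP would use as its GRE tunnel source.

    If the DHCP exchange finishes within the timeout, the leased address is
    used. Otherwise the AP is assumed to fall back to its default address.
    """
    if dhcp_elapsed_s <= timeout_s:
        return dhcp_lease_ip
    return FALLBACK_IP


if __name__ == "__main__":
    lease = "10.1.20.37"  # hypothetical DHCP-assigned address
    # Stock timeout: DHCP took 47 s, longer than 40 s, so the default is used.
    print(select_gre_source_ip(lease, 47.0))
    # Assumed longer timeout, standing in for the debug image: the lease is used.
    print(select_gre_source_ip(lease, 47.0, timeout_s=60.0))
```

    With a 47 s DHCP exchange, the stock 40 s timeout yields the default address, while any timeout above 47 s yields the leased one. That matches what TAC saw when the debug image let the AP build its tunnel from the DHCP-assigned IP.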



  • 8.  RE: Connection Issue Without a Solution

    Posted Oct 22, 2020 08:09 AM

    Still no solution yet.

    TAC suggested looking at the Windows DHCP server to see if there is a way to extend the delay for DHCP to the APs. No luck there. Windows doesn't give you many options to do that. The only one we found was to extend the delay by 1000 ms. That didn't do anything.