Wireless Access

Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

Stability Issues - Lengthy explanation.

  • 1.  Stability Issues - Lengthy explanation.

    Posted Jun 03, 2014 12:32 PM

    I am having a hard time explaining this in the subject so hopefully I can get the case across in the body. 

     

    Environment: Campus

    Controllers: 2 - 7210 Controllers running in Master/Local

    Version: 6.3.1.6

    APs: 165 - AP225 

    Clearpass: HW-5K running 6.2.6.62196

     

    SSID1: Corp users using CP as Radius Authenticating against AD

    SSID2: Guest Users using Guest Registration Portal

     

    Total Devices on any given day: 700-1000

     

    PROBLEM:

    About 90% of connection attempts succeed and about 10% fail. The failures are random: they are not tied to any particular user or device, and they happen on both SSIDs, whether the client authenticates through RADIUS or guest registration. Guest clients that fail complete the handshake with the SSID and report that they are connected, but end up with a 169.254.x.x link-local address. At that point they cannot be found on the controller when looked up by MAC address. Corp clients that fail do not even complete the handshake; they simply fail to connect to the SSID. Again, there is no trace of the MAC address on the controller.
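
    A "169" address on a client normally means a self-assigned IPv4 link-local address from 169.254.0.0/16, which a device falls back to when it gets no DHCP answer. As a minimal sketch for sorting collected client addresses into "got a lease" and "fell back to link-local", the Python standard library can do the check. The function name `is_apipa` and the sample addresses are made up for illustration.

```python
import ipaddress

def is_apipa(addr: str) -> bool:
    """Return True if addr is an IPv4 link-local (169.254.0.0/16) address."""
    ip = ipaddress.ip_address(addr)
    # is_link_local is also True for IPv6 fe80::/10, so restrict to IPv4 here.
    return ip.version == 4 and ip.is_link_local

# Hypothetical addresses reported by two clients.
print(is_apipa("169.254.12.7"))  # True: the client self-assigned an address
print(is_apipa("10.1.110.25"))   # False: the client has a routable lease
```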

     

     

    When attempting to connect to Guest, the users show up in ClearPass with this record:

     

    Guest MAC Authentication REJECT

     

    Failed to construct filter=SELECT FLOOR(EXTRACT(EPOCH FROM (NOW() - timestamp)))::integer AS seconds_since_auth, FLOOR((EXTRACT(EPOCH FROM (NOW() - timestamp)))/60)::integer AS minutes_since_auth, FLOOR((EXTRACT(EPOCH FROM (NOW() - timestamp)))/3600)::integer AS hours_since_auth, FLOOR((EXTRACT(EPOCH FROM (NOW() - timestamp)))/86400)::integer AS days_since_auth FROM auth WHERE auth.timestamp < NOW() AND auth.error_code = 0 AND auth.username = '%{Endpoint:Username}' AND auth.mac = '%{Connection:Client-Mac-Address-NoDelim}' ORDER BY timestamp DESC LIMIT 1.
    Failed to get value for attributes=[Days-Since-Auth, Hours-Since-Auth, Minutes-Since-Auth, Seconds-Since-Auth].
    Failed to construct filter=SELECT user_id as guest_device_user FROM tips_guest_users WHERE ((guest_type = 'USER') AND (user_id = '%{Endpoint:Username}') AND (app_name != 'Onboard') AND (enabled = 't') AND ((expire_time is null) OR (expire_time > CURRENT_TIMESTAMP))).
    Failed to get value for attributes=[UserName].
    Failed to construct filter=SELECT FLOOR(EXTRACT(EPOCH FROM (NOW() - timestamp)))::integer AS seconds_since_auth, FLOOR((EXTRACT(EPOCH FROM (NOW() - timestamp)))/60)::integer AS minutes_since_auth, FLOOR((EXTRACT(EPOCH FROM (NOW() - timestamp)))/3600)::integer AS hours_since_auth, FLOOR((EXTRACT(EPOCH FROM (NOW() - timestamp)))/86400)::integer AS days_since_auth FROM auth WHERE auth.timestamp < NOW() AND auth.error_code = 0 AND auth.username = '%{Endpoint:Username}' AND auth.mac = '%{Connection:Client-Mac-Address-NoDelim}' ORDER BY timestamp DESC LIMIT 1.
    Failed to get value for attributes=[Days-Since-Auth]
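
    The filters above key on `%{Connection:Client-Mac-Address-NoDelim}`, which by its name is the client MAC without delimiters. When searching for the same client across tools that print MACs differently, it can help to normalize them first. The sketch below is only an illustration: the helper names are hypothetical, and the choice of lowercase output is arbitrary rather than something the thread confirms about ClearPass.

```python
import re

def mac_no_delim(mac: str) -> str:
    """Strip common delimiters (: - . and whitespace) and return 12 hex digits."""
    stripped = re.sub(r"[:.\-\s]", "", mac)
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", stripped):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return stripped.lower()

def mac_colon(mac: str) -> str:
    """Return the MAC as colon-separated octets, e.g. aa:bb:cc:dd:ee:ff."""
    raw = mac_no_delim(mac)
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))

# Placeholder MAC addresses in three common notations.
print(mac_no_delim("AA:BB:CC:DD:EE:FF"))  # aabbccddeeff
print(mac_no_delim("aabb.ccdd.eeff"))     # aabbccddeeff
print(mac_colon("AA-BB-CC-DD-EE-FF"))     # aa:bb:cc:dd:ee:ff
```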

     

     

    When connecting to the Corp SSID, a client fails to connect and never shows up in ClearPass or on the controller. It is hard to troubleshoot when there is no trail left behind on either device.

     

     

     

    DETAILS:

    This is a new Aruba system installed about a month and a half ago. We have been having this issue and other strange issues since it was installed. We have a TAC case open, but they have been unable to determine the cause. We replaced the Master Controller with another new controller and the issue remained.

     

     

    We have had our config combed over numerous times and do not think the issue is with the configuration. We have been told that it probably has something to do with the firmware running on the controllers. Over the last month or so we have temporarily fixed the issue by rebooting ClearPass and the controllers, or by dropping one controller's priority back to local and promoting the other to master. It works great for a week; then, if we simulate a power outage or run an update and restart them, the problem comes back.

     

    Any help would be appreciated. Please let me know if there is any other information I can provide.

     

     

     


    #AP225
    #7210
    #3600


  • 2.  RE: Stability Issues - Lengthy explanation.

    Posted Jun 03, 2014 12:48 PM

    Do you only see this issue with the Guest SSID, or with both?

     

    Make sure that you are not running out of IP addresses in your DHCP scope. What do you use for DHCP?
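
    A quick way to sanity-check scope size against client count is to compute the usable addresses in the subnet and compare them with the peak number of devices. The sketch below uses made-up subnets (the thread does not give the real ones) and the upper end of the 700-1000 devices per day mentioned above as a worst case; `usable_hosts` and `headroom` are hypothetical helper names.

```python
import ipaddress

def usable_hosts(cidr: str, reserved: int = 0) -> int:
    """Usable IPv4 host addresses in cidr, minus any statically reserved ones."""
    net = ipaddress.ip_network(cidr, strict=True)
    # Ordinary subnets lose the network and broadcast addresses.
    total = net.num_addresses - 2 if net.prefixlen < 31 else net.num_addresses
    return max(total - reserved, 0)

def headroom(cidr: str, peak_clients: int, reserved: int = 0) -> int:
    """Addresses left over at peak; a negative value means the scope is too small."""
    return usable_hosts(cidr, reserved) - peak_clients

# Hypothetical guest subnets checked against a worst case of 1000 clients.
print(headroom("10.0.110.0/24", 1000))  # -746
print(headroom("10.0.108.0/22", 1000))  # 22
```

    Lease time matters as well: addresses held by devices that have already left still count against the pool until their leases expire.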

     

    If you use MAC caching, it is normal to see an initial reject in Access Tracker when the device hasn't been seen for the amount of time specified in the enforcement policy (the period during which a session stays valid without registering). The reject forces the device to re-register through the guest portal.



  • 3.  RE: Stability Issues - Lengthy explanation.

    Posted Jun 03, 2014 12:51 PM

    The DHCP pool is fine. We are working with our VAR, Carousel Industries, and they are scratching their heads over what this issue is. It has been happening for about a month now. We have also been working with Aruba TAC and they can't find anything.



  • 4.  RE: Stability Issues - Lengthy explanation.

    Posted Jun 03, 2014 01:26 PM



    Do you only see this issue with the Guest SSID, or with both?
    What do you use for DHCP?



  • 5.  RE: Stability Issues - Lengthy explanation.

    Posted Jun 03, 2014 01:35 PM

    We see it on both the Corp and Guest SSIDs. For Guest, the DHCP server is the 7210 controller; for Corp (802.1X auth), the DHCP server is our internal MS domain controller. Also, for mobile device (BYOD) users on the Corp SSID, the DHCP server for that VLAN is on the controller.

     

    Guest  - vlan 110 (dhcp server is controller)

    Corp - vlan 111; BYOD Guest for employee mobile devices (dhcp server is the controller)

    Corp - vlan pool 925/926; 802.1x internal corp access (dhcp server is our domain controller)

     


    #7210


  • 6.  RE: Stability Issues - Lengthy explanation.

    Posted Jun 03, 2014 07:05 PM

    I've had similar reports from one of my clients running AP-225s. I wonder if it's an 802.11ac / AP-225 specific issue.

     

    We're in the same boat: we can't fault anything specific yet and the TAC case is not going anywhere.

     

    I'll ping my client and see if they have made any progress as it sounds identical, clients getting 169 addresses.

     

    scott



  • 7.  RE: Stability Issues - Lengthy explanation.

    Posted Jun 03, 2014 11:06 PM

    You'll need to enable client debugging to see what's going on when the clients associate.  I know you said it's random, but if there's any chance the issue is reproducible with a known client, you can set up logging for that client:

     

    (controller)# conf t
    (controller)(config)# logging level debugging user-debug <mac>

     

    After a failed association, issue the following command to see their logs:

     

    (controller)# show logging user-debug all | inc <mac>

     

    You're saying you don't see the 802.1x auth in ClearPass, so it will be interesting to see if the EAPOL-Start from the client is there.  Issue the following to see their auth trail:

     

    (controller)# show auth-tracebuf <mac>

     

    Can you try that?

     

    Also, for what it's worth, I've worked a good amount with 7210's and AP-225's and have come across this.

     

     

    Last, are you doing MAC caching on your guest SSID?  If so, that's why you're seeing that error message, and it isn't anything to worry about.  ClearPass can't find the MAC with any of those attributes in the Insight Repository.  It's just a failed MAC auth.


    #7210


  • 8.  RE: Stability Issues - Lengthy explanation.

    EMPLOYEE
    Posted Jun 04, 2014 05:02 AM

    At any time, if you feel like you are not getting anywhere with a TAC case, you can ask to have it escalated.  If you are still having problems, that is what I suggest you do.

     

    Any case that is not resolved by TAC, especially intermittent issues, will be much more difficult to solve on this forum, because we do not have access to personal, but crucial information in your network that is needed to resolve your case.  Please escalate the case and report back to us here.