02-04-2013 09:29 AM - edited 02-04-2013 09:33 AM
From time to time I have a few clients that can't connect to my wireless network. Out of 150 clients per day, I would say that I hear of five or six people per week who have problems connecting. All have up-to-date drivers, and the types of users and laptops vary. If I could pinpoint anything, I'd say it tends to happen more when a user goes from being docked on the wired network to being on wireless as they move to a meeting room.
It is proving very difficult to nail down, but I believe it has something to do with how the Aruba controller is talking to my Windows RADIUS servers. On one particular occasion, I had a client which wouldn't connect, even after a reboot. So I switched the RADIUS server order around in my RADIUS server group, and the client connected right away.
I set up client debugging, and what I see is that the client associates, but then times out and deauthenticates itself. A show auth-tracebuf shows the EAP exchange start, but after the controller sends an EAP-REQ, it receives no response from the client.
I noticed the following RADIUS statistics on the controller: both servers show a high number of timeouts, and the controller reports them as having been up for only a short time.
One of the RADIUS servers is on the same LAN as the controller, the other is in the US accessed via a WAN link.
I've had the Windows guys log everything, but they can't seem to find anything wrong.
Here's some output. I would appreciate any help with trying to work this out. We are running AOS220.127.116.11
RADIUS Server Statistics
Statistics              ukradius0  usradius3
----------              ---------  ---------
Accounting Requests     12621      16170
Raw Requests            203134     233577
PAP Requests            0          0
CHAP Requests           0          0
MS-CHAP Requests        0          0
MS-CHAPv2 Requests      0          0
Mismatch Response       7          2
Bad Authenticator       0          0
Access-Accept           12837      13920
Access-Reject           798        811
Accounting-Response     12621      16170
Access-Challenge        188658     217231
Unknown Response code   0          0
Timeouts                3378       6505
AvgRespTime (ms)        11         115
Total Requests          215755     249747
Total Responses         214921     248134
Uptime (d:h:m)          0:0:5      0:1:39
SEQ Total/Free          255/255    255/255
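To put the counters above in proportion, here is a rough Python sketch that works out the timeout rate and the number of requests that never got a response for each server. The figures are copied from the statistics output; the dictionary layout and function names are only illustrative. How the controller increments the Timeouts counter (for example, whether retransmissions that were later answered are included) is not something I can confirm, so treat the percentages as an indication only.

```python
# Counter values copied from the "RADIUS Server Statistics" output above.
# The structure and helper names are illustrative, not part of any Aruba tool.
stats = {
    "ukradius0": {"timeouts": 3378, "total_requests": 215755, "total_responses": 214921},
    "usradius3": {"timeouts": 6505, "total_requests": 249747, "total_responses": 248134},
}


def timeout_rate(server):
    """Timeouts as a fraction of all requests sent to the server."""
    s = stats[server]
    return s["timeouts"] / s["total_requests"]


def unanswered(server):
    """Requests for which no response of any kind was counted."""
    s = stats[server]
    return s["total_requests"] - s["total_responses"]


for name in stats:
    print(f"{name}: {timeout_rate(name):.2%} timeouts, "
          f"{unanswered(name)} requests without a response")
```

On these numbers the server across the WAN times out noticeably more often than the local one, but the local server is not clean either, which fits with the problem not being purely a WAN issue.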
02-04-2013 11:21 AM
What does your aaa authentication dot1x profile look like? We successfully used Microsoft NPS before we moved to ClearPass Policy Manager as our RADIUS server.
Aruba has not chosen reasonable default values in the dot1x authentication profile.
Here is our current dot1x authentication profile.
aaa authentication dot1x <insert your name here>
machine-authentication machine-default-role "denyall-role"
machine-authentication user-default-role "denyall-role"
timer idrequest_period 5
timer quiet-period 3
server server-retry-period 5
server server-retry 3
timer wpa-key-period 2000
timer wpa2-key-delay 100
timer wpa-groupkey-delay 100
02-04-2013 02:18 PM - edited 02-05-2013 02:02 AM
Here is our current profile. It looks a bit different from yours. Are there any particular settings that you'd recommend we change?
802.1X Authentication Profile "global-802.1x-default-auth-profile"
Max authentication failures 0
Enforce Machine Authentication Disabled
Machine Authentication: Default Machine Role guest
Machine Authentication Cache Timeout 24 hr(s)
Blacklist on Machine Authentication Failure Disabled
Machine Authentication: Default User Role guest
Interval between Identity Requests 30 sec
Quiet Period after Failed Authentication 30 sec
Reauthentication Interval 86400 sec
Use Server provided Reauthentication Interval Disabled
Multicast Key Rotation Time Interval 1800 sec
Unicast Key Rotation Time Interval 900 sec
Authentication Server Retry Interval 30 sec
Authentication Server Retry Count 2
Number of times ID-Requests are retried 3
Maximum Number of Reauthentication Attempts 3
Maximum number of times Held State can be bypassed 0
Dynamic WEP Key Message Retry Count 1
Dynamic WEP Key Size 128 bits
Interval between WPA/WPA2 Key Messages 1000 msec
Delay between EAP-Success and WPA2 Unicast Key Exchange 0 msec
Delay between WPA/WPA2 Unicast Key and Group Key Exchange 0 msec
Time interval after which the PMKSA will be deleted 8 hr(s)
WPA/WPA2 Key Message Retry Count 3
Multicast Key Rotation Disabled
Unicast Key Rotation Disabled
Opportunistic Key Caching Enabled
Validate PMKID Disabled
Use Session Key Disabled
Use Static Key Disabled
Termination EAP-Type N/A
Termination Inner EAP-Type N/A
Token Caching Period 24 hr(s)
TLS Guest Access Disabled
TLS Guest Role guest
Ignore EAPOL-START after authentication Disabled
Ignore EAP ID during negotiation. Disabled
Disable rekey and reauthentication for clients on call Disabled
Check certificate common name against AAA server Enabled
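To make the differences easier to see, here is a small Python sketch that lines up the timers from the suggested profile against the values in our profile above. The pairing of each CLI keyword with a label from the show output is my assumption (for example, that `timer idrequest_period` corresponds to "Interval between Identity Requests"), so please check it against the CLI reference before relying on it.

```python
# (CLI keyword, suggested value, our current value, unit)
# The mapping of CLI keywords to "show" output labels is assumed, not confirmed.
timers = [
    ("timer idrequest_period",      5,    30,   "sec"),
    ("timer quiet-period",          3,    30,   "sec"),
    ("server server-retry-period",  5,    30,   "sec"),
    ("server server-retry",         3,    2,    "count"),
    ("timer wpa-key-period",        2000, 1000, "msec"),
    ("timer wpa2-key-delay",        100,  0,    "msec"),
    ("timer wpa-groupkey-delay",    100,  0,    "msec"),
]


def differences(rows):
    """Return only the settings whose suggested and current values differ."""
    return [row for row in rows if row[1] != row[2]]


for keyword, suggested, current, unit in differences(timers):
    print(f"{keyword:<28} suggested={suggested:<5} current={current:<5} {unit}")
```

Every one of the seven timers differs, with the identity request, quiet period and server retry intervals being the largest gaps (30 sec here against 5, 3 and 5 sec).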
02-06-2013 07:41 PM
I just wanted to add that my NPS servers show an inaccurate uptime as well. My servers have been up since the weekend, but my controller shows their uptime is around a few hours. I have some ClearPass servers, and an ACS server, and their uptime appears to be more accurate. Wondering if this is a bug.
02-11-2013 12:29 AM
I saw something similar. If there are many RADIUS timeouts, run a Wireshark capture between the controller and the WAN RADIUS server and look for "ICMP destination unreachable - fragmentation needed". That would be a symptom.
02-11-2013 02:57 PM
Since making the changes last week, my uptime numbers have improved, but I still show the RADIUS servers as being up for only a few hours before they get marked down again. It's not a fragmentation issue, as the controller and one of the RADIUS servers are on the same VLAN.
I haven't heard of any users being unable to connect since I made the change but it's not unusual for them not to report this.
Another point of note: we also have Cisco autonomous APs here, which use the same RADIUS servers (we have around 50 RADIUS servers globally). I checked a few of them today, and they also seem to show a short uptime of a few hours.
So it seems to me it's not just Aruba related, although I'm not aware of any users having issues connecting to our Cisco APs.
Has anyone got any ideas as to how to troubleshoot this further?