Re: Band Steering behavior

What version of OAC are you using? OAC 4.56 and earlier have a bug where the PMKID only accounts for the original BSSID, which means your roaming domain is only 1 AP. OAC 4.57 and newer resolve this issue. If you are seeing PMKID error messages in the OAC client, then, at least for us, that has been the issue. So with band steering, even going from 2.4GHz to 5GHz on the same SSID would force a completely new re-auth, as it would be a new BSSID. In 4.57 and newer, the PMKID is broader and is matched on the ESSID and a number of other things.
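
For reference, here is a minimal sketch of why a new BSSID means a new PMKID. It uses the standard 802.11i derivation (the first 128 bits of HMAC-SHA1 over the string "PMK Name", the AP's BSSID, and the client MAC, keyed with the PMK). The PMK and the client MAC below are made up for illustration, and treating the two BSSIDs as the two radios of one AP is an assumption. This is not OAC's or the controller's actual code.

```python
import hashlib
import hmac


def pmkid(pmk: bytes, bssid: bytes, client_mac: bytes) -> bytes:
    """First 128 bits of HMAC-SHA1(PMK, "PMK Name" || BSSID || client MAC)."""
    digest = hmac.new(pmk, b"PMK Name" + bssid + client_mac, hashlib.sha1).digest()
    return digest[:16]


# Hypothetical values, for illustration only.
pmk = bytes(range(32))                      # a real PMK comes out of 802.1X/EAP
client = bytes.fromhex("02aabbccddee")      # made-up client MAC
bssid_a = bytes.fromhex("000b86c8b640")     # e.g. one radio of an AP
bssid_b = bytes.fromhex("000b86c8b650")     # e.g. the other radio of that AP

id_a = pmkid(pmk, bssid_a, client)
id_b = pmkid(pmk, bssid_b, client)
print(id_a.hex())
print(id_b.hex())
print("same PMKID on both BSSIDs:", id_a == id_b)
```

The same PMK gives a different PMKID for each BSSID, so a supplicant that only remembers the PMKID for the BSSID it first authenticated to has nothing to match against after a band-steer or roam, and falls back to a full re-auth.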

EDIT: Saw you mention you were using 4.72, but as you may have found out, going to 5.x requires a new key for OAC. Go here (https://www.juniper.net/customers/support/products/aaa_802/oac_client_user.jsp) and you can grab 4.80, which should be a straight upgrade. 4.72 commercial (I believe) still had the bug (memory is foggy); I was thinking of our fed version (CC). 4.80 for sure has the PMKID fix, as does anything 5.x, so if that's the case (and based on your upgraded user working all day), that may be your issue.

If you need me to dig up any old emails between Juniper and ourselves for the JTAC case numbers, let me know and I can try to find them (they're about 1.5-2 years old by now, gonna have to dust off the file cabinet heh).

Jerrod Howard
Distinguished Technologist, TME
Aruba Employee

Re: Band Steering behavior

Hi Jerrod - Thanks for the reply. Yes, we were running 4.72, but I bumped that one user up to 5.1 (no issue getting the license key for that). We can definitely see that his client is wanting to roam, and the Juniper debugs show that he can keep connecting to the same BSSID or connect to a different one. Some sessions stay live because the PMKID was found in cache; some require a complete reauth because a matching one was not found. However, when I walk around my building, roaming from AP to AP, I never have the issue.

Only when I put an AP practically on top of the user was he able to stay connected all day. I didn't get any debug data from him yesterday, but I'm theorizing that his client didn't want to roam because the signal strength was so good, so we didn't hit the issue. So, I'm pretty much down to roaming triggering the problem, and most likely a PMKID caching issue. Who has the PMKID issue, Aruba or Juniper, is the question, and a bit of finger pointing is starting.

I'd really appreciate any info you could provide on the bug you mentioned. I'd love to go back to Juniper with that information.

Re: Band Steering behavior

Do you have only one controller terminating the APs or multiple locals? The PMKID is only resident on the local controller (so I was told), so if you have APs in roughly the same area on different controllers, the PMKID may not be valid when you roam from one local's AP to another's.

I will see what I can dig out from the old days on this issue. The true way to tell if it's OUR problem or theirs, unfortunately, is to enable the OAC debug and capture the PMKIDs on each roam. If the PMKID doesn't change from roam to roam to roam, but the log shows the OAC client rejecting a known good PMKID, then there is your answer. If the PMKID is changing on each roam (in the OAC log), then it's ours (unless you have multiple locals).
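
As a rough sketch of that rule of thumb: the function below takes a list of roams pulled out of the OAC debug by hand and applies the same logic. The tuple layout, the function name, and the result strings are all hypothetical, and tracking the PMKID per BSSID is an interpretation of "doesn't change from roam to roam", not something the OAC log does for you.

```python
from collections import defaultdict


def likely_culprit(roams, multiple_locals=False):
    """roams: iterable of (bssid, pmkid_hex, rejected_by_supplicant) tuples."""
    seen = defaultdict(set)          # BSSID -> every PMKID value logged for it
    any_rejected = False
    for bssid, pmkid_hex, rejected in roams:
        seen[bssid].add(pmkid_hex)
        any_rejected = any_rejected or rejected

    # Did any BSSID show more than one PMKID across the capture?
    changed = any(len(ids) > 1 for ids in seen.values())
    if changed:
        # With several local controllers a changing PMKID proves nothing.
        return "inconclusive (multiple locals)" if multiple_locals else "infrastructure"
    if any_rejected:
        # PMKID was stable, yet the supplicant threw it away.
        return "supplicant"
    return "no failure seen"


# Made-up capture: the PMKID for each AP is stable, but one roam was rejected.
capture = [
    ("000b86c8b640", "c9af", False),
    ("001a1e236400", "5f27", False),
    ("000b86c8b640", "c9af", True),
]
print(likely_culprit(capture))
```

It only encodes the two cases described above, so treat the answer as a pointer for where to dig in the logs, not a verdict.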

EDIT: Let me know if you have the registry file to enable the OAC debug. I can't remember if you even need it in 5.1 or if you can now enable it in the OAC GUI. In 4.72 you definitely need the registry files. If you need them, email me and I will send them your way. Later on today I can get to a wireless PC, get a sample PMKID exchange, and see if you can enable debug in OAC 5.1.

Jerrod Howard
Distinguished Technologist, TME
Aruba Employee

Re: Band Steering behavior

Shot in the dark, but you might also try enabling "validate pmk" in the configuration. Are you running Opp Key Caching? Regardless, try enabling validate pmk in the 802.1x authentication profile.
Aruba Employee

Re: Band Steering behavior

Brian - Not a shot in the dark at all. My SE, Austin, looked at some logs from the trouble user, and they show an EAPOL Start with Key1 was sent to the client after a roam with no response from the client. A minute later it tried again, no response, then a full reauth. I'll be turning "validate-pmkid" on tonight and testing tomorrow. The only thing is, if OAC truly does support OKC, I wouldn't expect any difference. I'm almost positive that OAC does support it, so I wonder if something else is screwy here.

Jerrod - Thanks for the info. No, there's only one controller per site, so no issues with PMKIDs being shared across controllers, but that really only applies for OKC. I've got debug levels up to 5 and capturing everything for JTAC. Thing is though, you can't see the actual keys in the debugs. I wonder if that's something you were able to see in older versions???

Re: Band Steering behavior

Hey Mike, attached are my logs. You're right, the PMKID isn't visible, but you can log the hash. See below. I fired up my PC, enabled logging and connected. Then I disabled and re-enabled my NIC to go back to the same AP (single-domain roam to the same BSSID). Then I walked to the other side of the house for a roam to a new AP (multi-domain PMKID). Each time you see a list of valid PMKID Candidates and the hash of the PMKID, and it never changes when I roam from AP to AP. Then I rebooted the PC and captured the entire new PMKID generation, and you can see I have a new PMKID hash but with the same PMKID candidate.

###########################
Bah, tried to post, but was too long...just open the 'to-post.txt' file to see what I'm looking for.

PMKID Candidate and hash looks like this:
re-assoc to 64:00
16:30:00.093 0 odClientService.exe OdysseySupplicantMgr.cpp:291 Setting PMKID Map, 4 items
16:30:00.093 4 odClientService.exe <>:0 0000 00 1a 1e 23 64 00 5f 27:08 52 cb 0e fa 20 83 c1 ...#d._'.R... ..
16:30:00.093 4 odClientService.exe <>:0 0010 86 f5 e5 42 9c e3 00 0b:86 26 ca d0 fb 61 b4 0a ...B.....&...a..
16:30:00.093 4 odClientService.exe <>:0 0020 12 c0 3f f6 bf 74 9e 95:0c 05 2d 18 00 0b 86 c8 ..?..t....-.....
16:30:00.093 4 odClientService.exe <>:0 0030 b6 40 c9 af e7 d8 39 7f:d9 3b 19 d1 79 02 71 10 .@....9..;..y.q.
16:30:00.093 4 odClientService.exe <>:0 0040 ec d9 00 0b 86 c8 b6 50:41 d1 f8 21 c4 3b 2c 57 .......PA..!.;,W
16:30:00.093 4 odClientService.exe <>:0 0050 82 1a 23 63 3c f7 00 93: ..#c<...


Roam to B6:40
16:34:45.453 0 odClientService.exe OdysseySupplicantMgr.cpp:291 PMKID Candidate: 000B86C8B640, flags = 0
16:34:45.453 0 odClientService.exe OdysseySupplicantMgr.cpp:291 PMKID Candidate: 001A1E236400, flags = 0
16:34:45.453 0 odClientService.exe OdysseySupplicantMgr.cpp:291 PMKID Candidate: 000B8626CAD0, flags = 0
16:34:45.453 0 odClientService.exe OdysseySupplicantMgr.cpp:291 PMKID Candidate: 000B86C8B650, flags = 0
16:34:45.453 0 odClientService.exe OdysseySupplicantMgr.cpp:291 Setting PMKID Map, 4 items
16:34:45.453 4 odClientService.exe <>:0 0000 00 0b 86 c8 b6 40 c9 af:e7 d8 39 7f d9 3b 19 d1 .....@....9..;..
16:34:45.453 4 odClientService.exe <>:0 0010 79 02 71 10 ec d9 00 1a:1e 23 64 00 5f 27 08 52 y.q......#d._'.R
16:34:45.453 4 odClientService.exe <>:0 0020 cb 0e fa 20 83 c1 86 f5:e5 42 9c e3 00 0b 86 26 ... .....B.....&
16:34:45.453 4 odClientService.exe <>:0 0030 ca d0 fb 61 b4 0a 12 c0:3f f6 bf 74 9e 95 0c 05 ...a....?..t....
16:34:45.453 4 odClientService.exe <>:0 0040 2d 18 00 0b 86 c8 b6 50:41 d1 f8 21 c4 3b 2c 57 -......PA..!.;,W
16:34:45.453 4 odClientService.exe <>:0 0050 82 1a 23 63 3c f7 00 93: ..#c<...

You will also see in the 'before-reboot' file that ALL the PMKID hashes are identical, even when they come from different PMKID candidates. Same for the 'after-reboot' file.

###########################

If OAC rejects a valid PMKID, you will see it referenced in the logs. If we send a different PMKID, you will see it in the hash. Even more interesting, you can see OAC flag the PMKID as opportunistic and assemble the PMKID hash. Look in the 'after-reboot' log file between lines 335-345 for the PMKSA, and between lines 495 and 502 for the eventual PMKID generation. From there on out, the PMKID is the same on every roam.

This is the kind of stuff I would look for when it fails. It's a PITA to tease out, though, as these logs are very verbose and cut off after 8000 lines.
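
To make the teasing-out a little less painful, here is a minimal sketch that pulls the "Setting PMKID Map" hex dumps apart and compares two of them. The record layout (a 6-byte BSSID followed by a 16-byte PMKID, 22 bytes per item) is an assumption inferred from the two dumps posted above, where 4 items take exactly 88 bytes. It is not a documented OAC format, and the function name is made up.

```python
import re

# Offset column, then up to 16 hex byte pairs separated by a space or ':'.
# Caveat: on a short final line, an ASCII column that happens to start with
# two hex characters could be misread as one more byte.
HEX_LINE = re.compile(r"<>:0 [0-9a-f]{4} ((?:[0-9a-f]{2}[ :]?){1,16})")
ENTRY_LEN = 6 + 16  # assumed layout: BSSID (6 bytes) + PMKID (16 bytes)


def parse_pmkid_map(log_text):
    """Return {bssid_hex: pmkid_hex} from the hex-dump lines of one map."""
    raw = bytearray()
    for line in log_text.splitlines():
        m = HEX_LINE.search(line)
        if m:
            raw += bytes.fromhex(m.group(1).replace(":", " ").strip())
    entries = {}
    for i in range(0, len(raw) - ENTRY_LEN + 1, ENTRY_LEN):
        entries[raw[i:i + 6].hex()] = raw[i + 6:i + ENTRY_LEN].hex()
    return entries


REASSOC = r"""
16:30:00.093 0 odClientService.exe OdysseySupplicantMgr.cpp:291 Setting PMKID Map, 4 items
16:30:00.093 4 odClientService.exe <>:0 0000 00 1a 1e 23 64 00 5f 27:08 52 cb 0e fa 20 83 c1 ...#d._'.R... ..
16:30:00.093 4 odClientService.exe <>:0 0010 86 f5 e5 42 9c e3 00 0b:86 26 ca d0 fb 61 b4 0a ...B.....&...a..
16:30:00.093 4 odClientService.exe <>:0 0020 12 c0 3f f6 bf 74 9e 95:0c 05 2d 18 00 0b 86 c8 ..?..t....-.....
16:30:00.093 4 odClientService.exe <>:0 0030 b6 40 c9 af e7 d8 39 7f:d9 3b 19 d1 79 02 71 10 .@....9..;..y.q.
16:30:00.093 4 odClientService.exe <>:0 0040 ec d9 00 0b 86 c8 b6 50:41 d1 f8 21 c4 3b 2c 57 .......PA..!.;,W
16:30:00.093 4 odClientService.exe <>:0 0050 82 1a 23 63 3c f7 00 93: ..#c<...
"""

ROAM = r"""
16:34:45.453 0 odClientService.exe OdysseySupplicantMgr.cpp:291 Setting PMKID Map, 4 items
16:34:45.453 4 odClientService.exe <>:0 0000 00 0b 86 c8 b6 40 c9 af:e7 d8 39 7f d9 3b 19 d1 .....@....9..;..
16:34:45.453 4 odClientService.exe <>:0 0010 79 02 71 10 ec d9 00 1a:1e 23 64 00 5f 27 08 52 y.q......#d._'.R
16:34:45.453 4 odClientService.exe <>:0 0020 cb 0e fa 20 83 c1 86 f5:e5 42 9c e3 00 0b 86 26 ... .....B.....&
16:34:45.453 4 odClientService.exe <>:0 0030 ca d0 fb 61 b4 0a 12 c0:3f f6 bf 74 9e 95 0c 05 ...a....?..t....
16:34:45.453 4 odClientService.exe <>:0 0040 2d 18 00 0b 86 c8 b6 50:41 d1 f8 21 c4 3b 2c 57 -......PA..!.;,W
16:34:45.453 4 odClientService.exe <>:0 0050 82 1a 23 63 3c f7 00 93: ..#c<...
"""

before = parse_pmkid_map(REASSOC)
after = parse_pmkid_map(ROAM)
for bssid, pmkid_hex in sorted(after.items()):
    print(bssid, pmkid_hex)
changed = sorted(b for b in before.keys() & after.keys() if before[b] != after[b])
print("BSSIDs whose value changed between the two maps:", changed)
```

Under that assumed layout, the two maps hold the same four BSSID/value pairs and only the order differs, which matches the "never changes" behaviour described above. In a failing capture, a BSSID turning up in `changed` would be the spot to start reading.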

Jerrod Howard
Distinguished Technologist, TME
Aruba Employee

Re: Band Steering behavior

Wow, thanks so much Jerrod. I see the PMKIDs and the patterns to look for. Now it's time to dig through a boat-load of logs!
Aruba Employee

Re: Band Steering behavior

Ok, so I followed some debugs and this is basically what I see (I'll try to get something together for everyone to look at tomorrow). I went backwards from a PMKID failure to see what hash it was rejecting, then went to the beginning to see which BSSID was associated with that key. I'll call this KEY-A. For a little bit, everything is good: the candidate list is consistent, same BSSIDs, same hashes.

Then OAC goes off into the weeds and says PMKID candidates are "0." After a bit a new PMKID list is made: same APs, new hashes. I'll call the new key from the BSSID in question KEY-B. Later on I catch up to the PMKID rejection, and it appears the BSSID in question sent KEY-A to the client, not KEY-B, hence the rejection.

Maybe I'm missing something, but that's what I see. Tomorrow I'm going to try WZC as the client to see if the client experiences anything different.

Re: Band Steering behavior

If you can, send me the logs from the OAC (jhoward at arubanetworks dot com) and I can see if I notice anything. I know what to look for on these kinds of things, but I need to read up better to determine the relationship of how the controllers store and keep the PMKID and also how the supplicant stores and interprets the PMKIDs between the two.

At least we're getting somewhere, but usually when some work and some don't, it's more supplicant related (in my experience, though there could be some kind of weird goofiness going on between the supplicant and the WLAN). WZC also supports OKC. There's this (http://support.microsoft.com/kb/328601), but I don't know if it gives you the same level of debug info to troubleshoot PMKID issues. Will have to test tomorrow.

Thanks Mike!

Jerrod Howard
Distinguished Technologist, TME
Contributor II

band steering

ok, back on topic

we're up to 3.3.2.21

I'm seeing two issues primarily
1 - band steering does not seem to be working during high usage times (lots of clients on)

2 - mac users seem to be connected (bars) but traffic is an issue (slow page loads, long delays on videos and such, like they are getting disconnected)

here are my questions,

1 - what version of the OS does aruba / alcatel recommend for the best possible arm2.0 and band steering performance (and mac compatibility)
2 - should we even be using band steering at this point, is it ready for production?
3 - anyone at aruba want to take a look at my config and see if something looks wrong?

We've got a lot of frustrated students and we're very eager for a solution,

Thanks!