Wireless Access


Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

Cannot Roam?!

  • 1.  Cannot Roam?!

    Posted Sep 12, 2018 09:03 PM

    Hi everyone!

     

    I could use a bit of help here ...

     

    Our users have been reporting frequent, seemingly random disconnections for a long time now.  Unfortunately we have never been able to find a reliable way to reproduce them.

     

    Today, amazingly, I found one and can reproduce their exact issue on demand every time.  I've tracked it down to a failure to roam and it looks like the failure may be coming from the Aruba side.   I've taken the two involved access points out of service and set them up to lab this scenario.  Unfortunately this creates a minor outage but thankfully the users are tolerating it in hopes of finally fixing this issue.

     

    Our controller is an Aruba7030 running 8.3.0.0.  All access points through the entire enterprise, including the two affected units, are 205's.

     

    The client test devices are an iPad and an iPhone.  (Other client devices may be affected as well; we just don't know.  These were chosen because they represent our user base and are the most modern available from a major vendor, Apple, so they simply must be made to work.)  The iPad is an iPad Pro and the iPhone is an iPhone X.  Both run the latest iOS version.

     

     

     

    To reproduce the failure, all I need to do is walk from one office to the adjacent office.  The offices are separated by a type of wall which the RF does not penetrate (I don't know why), so each area has its own AP.  This was determined during the initial install by the RF site survey.

     

    When walking from one office to the other I will see the wifi signal indicator on the iPad (iPhone) drop markedly and any network activity that may have been going on usually slows or stops.  Next I see the wifi indicator jump up to full strength (presumably indicating that the better AP has been seen and it has, or is, doing a roam).  After a few moments the wifi indicator will disappear and the device will raise a popup informing me that my cellular data is turned off and I should go into settings to enable it. (.. it is turned off specifically for this test).

     

    After this event, I return to my console to view the syslog output produced by a "logging level debugging user-debug <client-mac>" setting and find the following (below).  Notably, I see what appears to be my client attempting to associate with the better (the "roam to") AP, herein called "TestAP2", and then seemingly being rejected with a "deauth_reason 30" from the Aruba system.  The client device seemingly accepts this and makes no more attempts.

     

    Looking up deauth code 30, I see that it has something to do with the client not being permitted, or not being permitted to use some service.  This is all a bit nebulous.

     

    On separate tests, I can associate initially to this TestAP2 just fine, and then walk to the first office ("TestAP1") and the same pattern happens.. just in reverse.

     

    After spending several hours on this today I have made zero progress other than to isolate this item.  Could this be a bug?

     

    Can anyone suggest anything I can try?

     

    Thanks!

    -J

     

    -- log excerpt --

     

    Sep 12 17:41:43 2018 Aruba7030 authmgr[4509]: <522260> <5390> <DBUG> <Aruba7030 192.168.0.6>  "VDR - Cur VLAN updated 78:7b:8a:a4:a7:ac mob 0 inform 1 remote 1 wired 0 defvlan 1 exportedvlan 0 curvlan 1.

    Sep 12 17:41:43 2018 Aruba7030 authmgr[4509]: <522301> <5390> <DBUG> <Aruba7030 192.168.0.6>  Auth GSM : USER publish for uuid 000b86b4e4e70000000e0109 mac 78:7b:8a:a4:a7:ac name  role authenticated devtype  wired 0 authtype 0 subtype 0  encrypt-type 9 conn-port 0 fwd-mode 1 roam 0 repkey -1

    Sep 12 17:41:43 2018 Aruba7030 authmgr[4509]: <522287> <5390> <DBUG> <Aruba7030 192.168.0.6>  Auth GSM : MAC_USER publish for mac 78:7b:8a:a4:a7:ac bssid ac:a3:1e:a5:3e:11 vlan 1 type 1 data-ready 0 HA-IP n.a

    Sep 12 17:41:43 2018 Aruba7030 authmgr[4509]: <522096> <5390> <DBUG> <Aruba7030 192.168.0.6>  78:7b:8a:a4:a7:ac: Sending STM new Role ACL : 79, and Vlan info: 1, action : 10, AP IP: 192.168.0.136, flags : 0 idle-timeout: 300

    Sep 12 17:41:43 2018 Aruba7030 authmgr[4509]: <522308> <5390> <DBUG> <Aruba7030 192.168.0.6>  Device Type index derivation for 78:7b:8a:a4:a7:ac : dhcp (0,0,0) oui (0,0) ua (21,5,35) derived iPad(5):iOS

    Sep 12 17:41:43 2018 Aruba7030 authmgr[4509]: <522242> <5390> <DBUG> <Aruba7030 192.168.0.6>  MAC=78:7b:8a:a4:a7:ac Station Created Update MMS: BSSID=ac:a3:1e:a5:3e:11 ESSID=TestESSID VLAN=1 AP-name=TestAP2

    Sep 12 17:41:43 2018 192.168.0.134 stm[1257]:  <501000> <DBUG> |TestAP2@192.168.0.134 stm|  Station 78:7b:8a:a4:a7:ac: Clearing state

    Sep 12 17:41:52 2018 Aruba7030 authmgr[4509]: <522296> <5390> <DBUG> <Aruba7030 192.168.0.6>  Auth GSM : USER_STA delete event for user 78:7b:8a:a4:a7:ac age 0 deauth_reason 30

    Sep 12 17:41:52 2018 Aruba7030 stm[4526]: <501000> <4526> <DBUG> <Aruba7030 192.168.0.6>  Station 78:7b:8a:a4:a7:ac: Clearing state

    Sep 12 17:41:52 2018 Aruba7030 authmgr[4509]: <522152> <5390> <DBUG> <Aruba7030 192.168.0.6>  station free: bssid=ac:a3:1e:a5:3e:11, mac=78:7b:8a:a4:a7:ac.
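
    When repeating the test several times, it can help to pull only the delete events for the test client out of the debug syslog. Below is a minimal Python sketch, assuming the log lines look exactly like the excerpt above (a 20-character timestamp followed by the authmgr "USER_STA delete event" text). The function name `find_deauth_events` and the `sample` list are illustrative assumptions, not part of ArubaOS.

```python
import re

# Matches the authmgr "USER_STA delete event" line shown in the excerpt above.
DELETE_EVENT = re.compile(
    r"USER_STA delete event for user (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
    r" age (?P<age>\d+) deauth_reason (?P<reason>\d+)",
    re.IGNORECASE,
)


def find_deauth_events(lines, client_mac):
    """Return (timestamp, reason) pairs for delete events matching client_mac."""
    events = []
    for line in lines:
        match = DELETE_EVENT.search(line)
        if match and match.group("mac").lower() == client_mac.lower():
            # Assumption: the first 20 characters are the syslog timestamp,
            # e.g. "Sep 12 17:41:52 2018", as in the excerpt.
            timestamp = line.strip()[:20]
            events.append((timestamp, int(match.group("reason"))))
    return events


# Two lines copied from the excerpt above.
sample = [
    "Sep 12 17:41:43 2018 192.168.0.134 stm[1257]:  <501000> <DBUG> "
    "|TestAP2@192.168.0.134 stm|  Station 78:7b:8a:a4:a7:ac: Clearing state",
    "Sep 12 17:41:52 2018 Aruba7030 authmgr[4509]: <522296> <5390> <DBUG> "
    "<Aruba7030 192.168.0.6>  Auth GSM : USER_STA delete event for user "
    "78:7b:8a:a4:a7:ac age 0 deauth_reason 30",
]

print(find_deauth_events(sample, "78:7b:8a:a4:a7:ac"))
```

    Run against a full capture, this lists every controller-side delete for the client with its reason number, which makes it easier to check whether reason 30 appears on every failed roam or only on some.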



  • 2.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 12, 2018 10:03 PM

    It could be something simple, and the deauth reason could be masking it.  (Deauth reason 30 does not match anything specifically.)

     

    - What is the transmit power of the two access points in question?

    - How far apart are the access points?

    - How high are they mounted?

     



  • 3.  RE: Cannot Roam?!

    Posted Sep 13, 2018 12:09 AM

    The power level on 5 GHz is set to 12 to 18.  The 2.4 GHz radio is disabled to simplify this testing.

     

    Between them is what was once a demising wall, and each AP cannot reliably be RF-seen from the opposite side.  Technically they can be seen, but the signal is so weak that these iOS devices don't reliably pick it up, and when they do, they can't associate.  The reason for the second AP's existence is that the first cannot penetrate this wall successfully.  The distance, as the crow flies, is about 20 feet.

     

    Both units are about 25 feet above the finished floor.  This is a single storey building.

     

    -J



  • 4.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 13, 2018 06:10 AM

    If the signal is too weak, I don't know if roaming will be smooth, as a result.  The client really determines when to roam, and what we can control is the transmit power of the access point to influence when the client tries to roam.  If a client sees an access point as very strong and another one as weak, the roam will be poor, unless you try to make the signal seem more "even" in the middle of the roam.

     

    If the client sees the signal as poor, the roam will be poor, so you will need to increase the transmit power.

     

     



  • 5.  RE: Cannot Roam?!

    Posted Sep 13, 2018 12:31 PM

    Indeed.  What I see from the client's UI and the Aruba debug syslog is that the client has decided to roam, but the roam-to AP (the AP it wants to move to) rejects it with reason code 30.

     

    All of the syslog messages I posted in the earlier message are from the roam-to AP.  In those messages we can see that the test client (78:7b:8a:a4:a7:ac) has made contact with the roam-to AP (bssid ac:a3:1e:a5:3e:11).  It authenticates, the AP seems to correctly fingerprint the client as an iPad, and then... we get a deauth with a reason code of 30.  So clearly the client has decided to roam; it's not hanging on to the old AP with a low signal strength.

     

    Given that the client has clearly already decided to roam (which is why we see these log messages about the roam-to AP), what else could cause this?  Particularly, what could cause the deauth?  What are typical reasons for reason code 30?  I'd imagine that the iPad and iPhone, being among the most popular devices in the world, are well characterized.  Is there anything unique or telling about them, specifically, that might point in a direction?  And how might one troubleshoot the deauth, that is, determine who is at fault?  Is there a way to monitor the management traffic on the air to see who is requesting the deauth, perhaps, or something else?

     

    I can post more complete (longer) logs if that would be helpful; just didn't want to spam the forum! :)

     

    -J



  • 6.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 13, 2018 12:36 PM

    Simulate your experience and then on the controller commandline, type:

     

    show ap client trail-info <mac address of client>



  • 7.  RE: Cannot Roam?!

    Posted Sep 13, 2018 02:44 PM

    Nice functionality, "show ap client trail-info".  I did not know about this one!

     

    So, I just cleared the history, and repeated the exact fail case.  I associated in the first office (TestAP1) and then walked to the second office (TestAP2) and watched the fail.   This was captured in the trail-info, and it seems there may be some helpful info in there (about a flood).  Here is the output:

     

    Thanks

    -J

     

    (Aruba7030) [mynode] #show ap client trail-info 78:7b:8a:a4:a7:ac

     

    Client Trail Info
    -----------------
    MAC                BSSID              ESSID      AP-name  VLAN  Deauth Reason                       Alert
    ---                -----              -----      -------  ----  -------------                       -----
    78:7b:8a:a4:a7:ac  ac:a3:1e:a5:3e:11  TestESSID  TestAP2  1     Denied; Association Flood Detected  STA has roamed to another AP

    Deauth Reason
    -------------
    Reason                              Timestamp
    ------                              ---------
    Denied; Association Flood Detected  Sep 13 11:35:47
    Num Deauths:1

    Alerts
    ------
    Reason                        Timestamp
    ------                        ---------
    STA has roamed to another AP  Sep 13 11:35:38
    Num Alerts:1

    Mobility Trail
    --------------
    BSSID              ESSID      AP-name  VLAN  Timestamp
    -----              -----      -------  ----  ---------
    ac:a3:1e:a5:3e:11  TestESSID  TestAP2  1     Sep 13 11:35:47
    ac:a3:1e:a5:3e:11  TestESSID  TestAP2  1     Sep 13 11:35:38
    04:bd:88:98:37:f1  TestESSID  TestAP1  1     Sep 13 11:35:38
    ac:a3:1e:a5:3e:11  TestESSID  TestAP2  1     Sep 13 11:35:38
    Num Mobility Trails:4

    (Aruba7030) [mynode] #
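
    The Mobility Trail above shows three entries sharing the timestamp 11:35:38 (TestAP2, TestAP1, TestAP2), which may be what the association flood check reacted to. Below is a minimal Python sketch for spotting such bursts in saved trail-info output. It assumes the layout shown above and that ESSIDs and AP names contain no spaces. The helper names `mobility_entries` and `busy_seconds` are illustrative, and the threshold of two entries per second is an arbitrary assumption, not the value ArubaOS uses.

```python
from collections import Counter


def mobility_entries(text):
    """Extract (bssid, ap_name, timestamp) rows from the 'Mobility Trail' section."""
    rows = []
    in_trail = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Mobility Trail":
            in_trail = True
            continue
        if not in_trail or not line:
            continue
        if line.startswith("Num Mobility Trails"):
            break
        fields = line.split()
        # Data rows have 7 fields: BSSID ESSID AP-name VLAN Month Day HH:MM:SS.
        # Header and dashed separator rows have fewer and are skipped.
        if len(fields) == 7 and fields[0].count(":") == 5:
            rows.append((fields[0], fields[2], " ".join(fields[4:])))
    return rows


def busy_seconds(rows, minimum=2):
    """Return timestamps that carry at least `minimum` trail entries."""
    counts = Counter(timestamp for _, _, timestamp in rows)
    return {ts: n for ts, n in counts.items() if n >= minimum}


# Mobility Trail section copied from the output above.
sample_trail = """
Mobility Trail
--------------
BSSID              ESSID      AP-name  VLAN  Timestamp
-----              -----      -------  ----  ---------
ac:a3:1e:a5:3e:11  TestESSID  TestAP2  1     Sep 13 11:35:47
ac:a3:1e:a5:3e:11  TestESSID  TestAP2  1     Sep 13 11:35:38
04:bd:88:98:37:f1  TestESSID  TestAP1  1     Sep 13 11:35:38
ac:a3:1e:a5:3e:11  TestESSID  TestAP2  1     Sep 13 11:35:38
Num Mobility Trails:4
"""

print(busy_seconds(mobility_entries(sample_trail)))
```

    For the output above, the only timestamp flagged is Sep 13 11:35:38, with three entries, nine seconds before the "Association Flood Detected" denial.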



  • 8.  RE: Cannot Roam?!

    Posted Sep 14, 2018 07:52 PM

    So, this one got everyone stumped?!

     

    -J



  • 9.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 14, 2018 08:21 PM

    It looks like you have Association Flood configured too low.  I would open a TAC case to make sure.



  • 10.  RE: Cannot Roam?!

    Posted Sep 14, 2018 08:58 PM

    Indeed, that would make sense.  Is that a tunable I can adjust?  If so, where would I look for that in the CLI?  This may truly be a simple solution!

     

    Thanks

    -J



  • 11.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 15, 2018 07:06 AM

    Check to see if under IDS "Client Flood Attack" is enabled:

    Screenshot 2018-09-15 at 06.05.02.png

    Where that would be enabled would vary based on your version of ArubaOS.



  • 12.  RE: Cannot Roam?!

    Posted Sep 15, 2018 11:07 AM

    Hi cjoseph -

     

    Thanks for clarifying what/where the knob is for that.  Alas, it's turned off.  Here's some more detail:

     

    For your reference, here is my AOS version (latest):

     

    (Aruba7030) [mynode] #show version

    Aruba Operating System Software.
    ArubaOS (MODEL: Aruba7030-US), Version 8.3.0.0
    Website: http://www.arubanetworks.com
    (c) Copyright 2018 Hewlett Packard Enterprise Development LP.
    Compiled on 2018-04-28 at 03:31:39 UTC (build 64659) by p4build

    .. etc ..

     

    So, I've just got one dos-profile:

     

    (Aruba7030) [mynode] #show ids dos-profile

    IDS Denial Of Service Profile List
    ----------------------------------
    Name     References  Profile Status
    ----     ----------  --------------
    default  1

    Total:1

     

     

    Checking that profile, this particular IDS setting is not enabled.  Here's what I've got for that:

     

    (Aruba7030) [mynode] #
    (Aruba7030) [mynode] #show ids dos-profile default | include "Client Flood"
    Detect Client Flood Attack                         false
    Client Flood Detection Quiet Time                  900 sec
    Client Flood Increase Time                         3 sec
    Client Flood Threshold                             150
    (Aruba7030) [mynode] #

     

     

    Here is everything I've got in the IDS/DOS settings within the only profile I've got.  Most of it is disabled.  I was hoping to disable as much of IDS as possible for the purposes of this testing.  Perhaps there is a "master switch" somewhere to completely disable all IDS, firewalls, rate limits, etc.?  If I can get this solved, I can then work backwards through it, turning stuff back on until it breaks again...?

     

    (Aruba7030) [mynode] #show ids dos-profile default

    IDS Denial Of Service Profile "default"
    ---------------------------------------
    Parameter                                          Value
    ---------                                          -----
    Detect 802.11n 40MHz Intolerance Setting           false
    Client 40MHz Intolerance Detection Quiet Time      900
    Detect AP Flood Attack                             false
    AP Flood Detection Quiet Time                      900 sec
    AP Flood Increase Time                             3 sec
    AP Flood Threshold                                 50
    Detect Block ACK DoS                               false
    Block ACK DoS Quiet Time                           900 sec
    Detect ChopChop Attack                             false
    ChopChop Attack Detection Quiet Time               900 sec
    Detect Client Flood Attack                         false
    Client Flood Detection Quiet Time                  900 sec
    Client Flood Increase Time                         3 sec
    Client Flood Threshold                             150
    Detect CTS Rate Anomaly                            false
    CTS Rate Quiet Time                                900 sec
    CTS Rate Threshold                                 5000
    CTS Rate Time Interval                             5 sec
    Detect Disconnect Station Attack                   false
    Disconnect STA Assoc Response Threshold            5
    Disconnect STA Deauth and Disassoc Threshold       8
    Disconnect STA Detection Quiet Time                900 sec
    Detect EAP Rate Anomaly                            false
    EAP Rate Quiet Time                                900 sec
    EAP Rate Threshold                                 60
    EAP Rate Time Interval                             3 sec
    Detect FATA-Jack Attack                            false
    FATA-Jack Attack Detection Quiet Time              900 sec
    Detect Invalid Address Combination                 false
    Invalid Address Combination Detection Quiet Time   900 sec
    Detect Malformed Frame - Assoc Request             false
    Malformed Assoc Request Detection Quiet Time       900 sec
    Detect Malformed Frame - Auth                      false
    Malformed Auth Frame Detection Quiet Time          900 sec
    Detect Malformed Frame - HT IE                     false
    Malformed HT IE Detection Quiet Time               900 sec
    Detect Malformed Frame - Large Duration            false
    Malformed Large Duration Detection Quiet Time      900 sec
    Detect Omerta Attack                               false
    Omerta Attack Detection Quiet Time                 900 sec
    Omerta Detection Threshold                         10 %
    Detect Overflow EAPOL Key                          false
    Overflow EAPOL Key Detection Quiet Time            900 sec
    Detect Overflow IE                                 false
    Overflow IE Detection Quiet Time                   900 sec
    Detect Power Save DoS Attack                       false
    Power Save DoS Detection Quiet Time                900 sec
    Power Save DoS Detection Threshold                 90 %
    Power Save DoS Detection Minimum Frames            700
    Detect WPA FT Attack                               false
    WPA FT Attack Detection Quiet Time                 900 sec
    WPA FT Attack Detection Threshold                  45
    WPA FT Attack Detection Time Interval              60 sec
    Detect Rate Anomalies                              false
    Rate Thresholds for Assoc Frames                   default
    Rate Thresholds for Disassoc Frames                default
    Rate Thresholds for Deauth Frames                  default
    Rate Thresholds for Probe Request Frames           probe-request-response-thresholds
    Rate Thresholds for Probe Response Frames          probe-request-response-thresholds
    Rate Thresholds for Auth Frames                    default
    Detect RTS Rate Anomaly                            false
    RTS Rate Quiet Time                                900 sec
    RTS Rate Threshold                                 5000
    RTS Rate Time Interval                             5 sec
    Spoofed Deauth Blacklist                           Disabled
    Detect TKIP replay Attack                          false
    TKIP Replay Attack Detection Quiet Time            900 sec
    (Aruba7030) [mynode] #
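
    With this many parameters it is easy to miss one that is still on. Below is a minimal Python sketch that scans saved "show ids dos-profile" output and lists only the "Detect ..." rows whose value is true. It assumes each row ends with its value as the last whitespace-separated token, as in the output above. The function name `enabled_detections` is illustrative, and rows that do not begin with "Detect" (for example "Spoofed Deauth Blacklist") are deliberately skipped.

```python
def enabled_detections(profile_text):
    """Return the names of 'Detect ...' parameters whose value is 'true'."""
    enabled = []
    for raw in profile_text.splitlines():
        line = raw.strip()
        if not line.startswith("Detect "):
            continue
        # The value is the last token on the row; everything before it is the name.
        name, _, value = line.rpartition(" ")
        if value.lower() == "true":
            enabled.append(name.strip())
    return enabled


# A few rows copied from the profile output above.
sample_profile = """
Detect AP Flood Attack false
AP Flood Threshold 50
Detect Client Flood Attack false
Client Flood Threshold 150
Detect Rate Anomalies false
"""

print(enabled_detections(sample_profile))
```

    For the profile shown above the result is an empty list, which matches the observation that every detection in the "default" profile is disabled.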



  • 13.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 15, 2018 02:07 PM

    That might have been the issue at that specific time.  You should run roaming tests with other clients to find out what is the issue with them.

     

    If you have removed lower rates on SSIDs or enabled "Local Probe Request Threshold" on your SSID, that could also cause roaming issues.



  • 14.  RE: Cannot Roam?!

    Posted Sep 15, 2018 02:42 PM

    I’m not sure about specific times but we can reproduce this on demand 100% of the time, day or night, weekday or weekend.  I’m counting my lucky stars for that!

     

    Will check the local probe request shortly.

     

    We have removed the lower rates.  On these APs I will restore the factory defaults for the rates for the purpose of this test.

     

    What is the thinking about other clients, exactly?  We’re using iPads and the iPhone X in our test scenario since those are arguably among the world's pre-eminent mobile devices and simply must be supported no matter what.  Is the thinking that using other clients might help us narrow down the problem?  Said another way, I can’t not support iPads and iPhones for roaming... that idea would be dead before it was even raised.

     

    thanks

    -J



  • 15.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 15, 2018 02:46 PM

    There are many reasons why you have the current symptoms.   There is nothing in ArubaOS stopping you from making changes from the defaults, because it assumes you are comfortable with the results and you know what you are doing.  I would return things to the defaults one by one until you figure out what the problem is.



  • 16.  RE: Cannot Roam?!

    Posted Sep 15, 2018 02:59 PM

    Indeed, we may ultimately do a factory default during a maintenance window and see if things change.  I’ve just been hopeful you guys might know right off the bat.. especially when the system is logging explicit denials.  Someone with access to “the code” ought to be able to look up “in what case is this error message given” to see what controls affect it.  I’d then go turn all of those off and start from there.

     

    Anyway, I’ll run the last set of ideas you mentioned to see what effect they have.  I may even be able to down the system this weekend and try the factory reset after doing full backups so we can re-load our config after that test.  I agree it’s always good to start with a fresh config out of the box!

     

    -J



  • 17.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 15, 2018 03:08 PM

    I'm not saying to factory default.  That is extreme and not what I suggested.

     

    Putting the rates back to the defaults could solve the issue, but you need to try one thing at a time.



  • 18.  RE: Cannot Roam?!

    Posted Sep 15, 2018 03:19 PM

    I’ll surely run the steps we’ve previously discussed here, no question.

     

    My thinking, though, is that if that does not gain ground we might want to change direction.  Since we have the luxury of a 100% reproducible scenario (rare in my experience) AND I can very likely take the system down during a period this weekend, it seems quite reasonable to go to the out-of-the-box settings (“factory”) and start from there.  We’d just be focused on these two APs, without the rest of our configs or anything else in play.  If it works, then I need to identify what’s different.  If it doesn’t work, then any and all customizations we may have made, purposely or inadvertently, are out of the picture and we can engage with Aruba starting from the level and clean playing field of the factory config to sort out how to accomplish this roam scenario.  Once sorted, the developed solution could be adapted to our desired config.  Do you think this isn’t a good approach?  I’m pedantically methodical, but I also have the “time is a factor” aspect of real life to balance here, and with the hundreds of thousands of tunables on this platform, it seems that building on the known good factory settings might make sense after a certain point.

     

    -J



  • 19.  RE: Cannot Roam?!

    Posted Sep 15, 2018 03:26 PM

    I have remotely made the change to the rates and requested the duty tech to run the re-produce scenario.  Hopefully it can be something as simple as this.  I should have results soon.

     

    Here are the rates as now configured:

     

     

    Thanks for your help and patience.

     

    -J



  • 20.  RE: Cannot Roam?!

    Posted Sep 15, 2018 03:36 PM

    Well, that went faster than expected.  Test results are that the client devices still do get kicked off (and pop up a message about cellular data being turned off, as a result) but then do much more quickly sort themselves out and get back onto the wifi.

     

    Different log messages show in the "show ap client trail-info" output.. as below.  Seems the system is refusing the client, but this time for a different reason?

     

     

     

    And this is what we've got for that wlan configured... wpa2-psk-aes... I think this is as recommended if you have to do a psk setup (we do).

     

     

     



  • 21.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 15, 2018 03:38 PM

    There are no pictures attached to either post.



  • 22.  RE: Cannot Roam?!

    Posted Sep 16, 2018 12:25 AM

    How strange.. maybe I hit a forum bug.  It seems the forum allows inline images but maybe doesn't show them to certain users or browsers?  For fun, I've attached a screenshot of how the forum appears to me right now.  That might be of interest to its devs.  For their use: I'm on a Mac, latest OS and patch level, and using its shipping Safari browser.

     

    Meanwhile, back to the roaming thing -- thankfully I did save the earlier posted screenshots.  They are attached here as files as well.

     

    Thanks!

    -J



  • 23.  RE: Cannot Roam?!

    EMPLOYEE
    Posted Sep 16, 2018 06:54 AM

    Try setting your local probe request threshold to 0.

    Also set your 802.11a and 802.11g beacon rates to "default" instead of 12.