Odd client disconnections...


1. Odd client disconnections...

rah322
Posted Sep 07, 2016 08:37 PM
I have a very odd issue & I'm posting it to the community to see if I can get any useful suggestions.

I'm in the process of migrating from 5 x 3600 controllers running AOS 6.4.3.6 onto 2 x 7220 controllers, also running AOS 6.4.3.6. As such, I'm taking the opportunity to streamline my configs & get rid of redundant or irrelevant configurations. I'm almost there; however, I've run into a problem.

I have a case open w/ TAC to review my configs & help troubleshoot.

The problem is... After about 30 minutes of connectivity, the 2 mobile devices I test with seem to drop their connections. A laptop running Windows 7 does not experience the same issue.

Testing involves clients connecting to a configured RAP (AP-105). The RAP's config was previously running on the old set of 3600 controllers - I know that isn't the issue. Why a RAP? We remotely run the networking for a small college & use the RAP to test new configs & the general state of networking from our office.

Clients (thus far) are an Apple iPad Mini 2 running iOS 9.3.5 & a Nexus 7 (2013) running Android 6.0.1. I'll try to expand the test clients to include Windows 10 & Mac OS X; however, as mentioned above, a Win7 client does not seem to experience the same issue.

Clients can successfully associate, authenticate, & get on the proper network.

Clients successfully associate & authenticate to an 802.1x network. We use ClearPass as our RADIUS server w/ an AD Authentication backend.

Once clients drop their wifi connection, I notice that they're still in the user table (show user) & appear to still be associated (show ap assoc), which is contrary to the client experience.

Furthermore, once the Android client drops its connection, I see a burst of successful RADIUS authentications on the ClearPass server (anywhere from 15 to 30 seconds apart) for a bit, yet the client still thinks it's not connected. This behavior continues until I forcefully put the client to sleep; only then does it stop creating RADIUS auth entries.

My iOS client also disconnects some 30 - 45 minutes after successfully joining the network, but it does not keep attempting RADIUS authentications after it has dropped from the network. The iOS device appears to successfully reconnect after it's put to sleep for some untracked amount of time.

Both clients are awake & actively performing an iperf test when they drop their connections. I had initially observed these drops while the devices were idle & thought them strange enough that I started streaming YouTube videos to see if they still dropped when receiving traffic - they did. TAC instructed me to make the client generate traffic such that I don't trigger idle timeout, in the event that my buffers are full enough to cause such a thing... I'm dubious.

I've configured these clients with user-debug logging & have provided the logs to TAC for further analysis. I'm focusing on my Android test for the time being - for obvious reasons.

The only concrete thing I've been able to get out of it is the dreaded Unspecified Failure.

Sep 7 18:49:27 :522296: <DBUG> |authmgr| Auth GSM : USER_STA delete event for user d8:50:e6:8a:ff:a8 age 0 deauth_reason 1
Sep 7 18:49:27 :522036: <INFO> |authmgr| MAC=d8:50:e6:8a:ff:a8 Station DN: BSSID=00:24:6c:b4:ac:b2 ESSID=test-cabrini-eduroam VLAN=646 AP-name=00:24:6c:c3:4a:cb
Sep 7 18:49:27 :522234: <DBUG> |authmgr| Setting idle timer for user d8:50:e6:8a:ff:a8 to 300 seconds (idle timeout: 300 ageout: 0).
Sep 7 18:49:27 :501000: <DBUG> |stm| Station d8:50:e6:8a:ff:a8: Clearing state
Sep 7 18:49:27 :501102: <NOTI> |AP 00:24:6c:c3:4a:cb@1.1.1.2 stm| Disassoc from sta: d8:50:e6:8a:ff:a8: AP 1.1.1.2-00:24:6c:b4:ac:b2-00:24:6c:c3:4a:cb Reason Unspecified Failure
Sep 7 18:49:27 :501000: <DBUG> |AP 00:24:6c:c3:4a:cb@1.1.1.2 stm| Station d8:50:e6:8a:ff:a8: Clearing state
Sep 7 18:49:28 :501109: <NOTI> |AP 00:24:6c:c3:4a:cb@1.1.1.2 stm| Auth request: d8:50:e6:8a:ff:a8: AP 1.1.1.2-00:24:6c:b4:ac:b2-00:24:6c:c3:4a:cb auth_alg 0
Sep 7 18:49:28 :501093: <NOTI> |AP 00:24:6c:c3:4a:cb@1.1.1.2 stm| Auth success: d8:50:e6:8a:ff:a8: AP 1.1.1.2-00:24:6c:b4:ac:b2-00:24:6c:c3:4a:cb
Sep 7 18:49:28 :501100: <NOTI> |stm| Assoc success @ 18:49:28.539679: d8:50:e6:8a:ff:a8: AP 1.1.1.2-00:24:6c:b4:ac:b2-00:24:6c:c3:4a:cb
Sep 7 18:49:28 :522295: <DBUG> |authmgr| Auth GSM : USER_STA event 0 for user d8:50:e6:8a:ff:a8
Sep 7 18:49:28 :522035: <INFO> |authmgr| MAC=d8:50:e6:8a:ff:a8 Station UP: BSSID=00:24:6c:b4:ac:b2 ESSID=test-cabrini-eduroam VLAN=646 AP-name=00:24:6c:c3:4a:cb
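
When user-debug output like this gets pasted or exported as one run of text, a short script can split it back into records for filtering. The following is only a rough sketch in Python. It assumes every record starts with a "Mon D HH:MM:SS :NNNNNN:" stamp followed by "<LEVEL> |process| text", which matches the lines in this post but is not a documented format. The names split_records, STAMP, RECORD and BLOB are made up for illustration.

```python
import re

# Assumed record layout (inferred from the pasted log, not from documentation):
#   "Mon D HH:MM:SS :NNNNNN: <LEVEL> |process| text"
STAMP = re.compile(r"\b[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+:\d{6}:")
RECORD = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r":(?P<msg_id>\d{6}):\s+<(?P<level>\w+)>\s+\|(?P<proc>[^|]+)\|\s*(?P<text>.*)",
    re.DOTALL,
)


def split_records(blob):
    """Split a run-together log paste into one dict per record."""
    starts = [m.start() for m in STAMP.finditer(blob)]
    ends = starts[1:] + [len(blob)]
    records = []
    for begin, end in zip(starts, ends):
        match = RECORD.match(blob[begin:end].strip())
        if match:
            records.append(match.groupdict())
    return records


# Three records copied from the log in this post, joined into one string.
BLOB = (
    "Sep 7 18:49:27 :522036: <INFO> |authmgr| MAC=d8:50:e6:8a:ff:a8 Station DN: "
    "BSSID=00:24:6c:b4:ac:b2 ESSID=test-cabrini-eduroam VLAN=646 AP-name=00:24:6c:c3:4a:cb "
    "Sep 7 18:49:27 :501102: <NOTI> |AP 00:24:6c:c3:4a:cb@1.1.1.2 stm| Disassoc from sta: "
    "d8:50:e6:8a:ff:a8: AP 1.1.1.2-00:24:6c:b4:ac:b2-00:24:6c:c3:4a:cb Reason Unspecified Failure "
    "Sep 7 18:49:28 :522035: <INFO> |authmgr| MAC=d8:50:e6:8a:ff:a8 Station UP: "
    "BSSID=00:24:6c:b4:ac:b2 ESSID=test-cabrini-eduroam VLAN=646 AP-name=00:24:6c:c3:4a:cb"
)

records = split_records(BLOB)
for rec in records:
    print(rec["ts"], rec["msg_id"], rec["level"], rec["proc"], "|", rec["text"])
```

From there it is easy to keep only the records of interest, for example those whose text contains "Station DN" or "Disassoc".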

I've been working w/ TAC now for about a week.

I believe I've ruled out the Android client by connecting it to a different cluster / network (that's run locally) w/ a similar 802.1x network config. When the client is connected to this different network it does not fail after 30 minutes of activity & I'm able to complete an iperf test.

I believe I've ruled out the RAP by migrating it back to the old 3600 controllers. In this setup, the APs & the Wifi configs are closer to what I'd ultimately like to move towards since they are the genesis of the configs that now exist on the new 7220s. When the Android client connects to the network running on the old 3600s, it does not fail & is able to complete its iperf test.

During these tests the CPPM server has remained constant. Only in the local cluster test did the service classification differ.

Now my questions are...

Has anyone experienced anything similar?

Does anyone have ANY suggestions about where I should go digging?

I started asking myself what would cause a client's connection to fail, and thus began looking wherever there might be a timeout variable set (aaa timers & aaa auth profiles), but they all looked normal.

I haven't done an exhaustive scrub of my configs but I'm fairly certain I've looked over the most obvious profiles & configurations. This is, after all, a new controller environment. Just about all of the network configuration, AAA server-group, AAA auth profiles, SSID profiles, etc. etc. was cleaned & optimized (either using ASE recommendations or best practices), & I don't believe I've strayed too much from the original configs on the 3600, just a few optimizations here & there.

Since I'm only testing w/ this RAP, I haven't had the need to expand any RF profiles & the RAP's configs on the new controllers are VERY similar to that on the old controllers, on which this issue does not occur.

In any case, thanks for any suggestions.

TIA,
2. RE: Odd client disconnections...

EMPLOYEE

cjoseph
Posted Sep 07, 2016 09:34 PM
I would run the command below right after the disconnection to see who sends the deauth first.

show ap client trail-info <client-mac>
3. RE: Odd client disconnections...

EMPLOYEE

cjoseph
Posted Sep 07, 2016 09:59 PM
Honestly, a client will remain in the user-table, even if it is having issues with connectivity for at least 5 minutes until it is aged out, so the user table is not necessarily a good indicator of what is happening.

"show auth-tracebuf mac <the mac address of the client>" might be an effective tool to see who is sending what RADIUS packets back and forth when you are having the issue. A deauth from the client means that the client itself forcibly disconnected for some reason, and finding that reason might involve some guesswork. There are some clients that don't like more than 100 milliseconds of delay between a RADIUS request and the response. If you have more than that, that might be your issue.

The RF could have congestion or delays on the band that the client is on. The show ap association client-mac <mac address of client> will tell you the percentage of retries on the channel vs. the client when you are having that issue.

4. RE: Odd client disconnections...

rah322

Posted Sep 08, 2016 02:29 PM

@cjoseph wrote:
Honestly, a client will remain in the user-table, even if it is having issues with connectivity for at least 5 minutes until it is aged out, so the user table is not necessarily a good indicator of what is happening.

Yes, I'm starting to see that...

"show auth-tracebuf mac <the mac address of the client>" might be an effective tool to see who is sending what radius packets back and forth, when you are having the issue.

I've abbreviated the 'show auth-tracebuf mac <MAC>' output.

The only transitions are between station-down & station-up. Also, note that after the drop at around 9:38 my client sends successful re-auths, but does not manage to get back online.

(cu-master) #show auth-tracebuf mac d8:50:e6:8a:ff:a8

Warning: user-debug is enabled on one or more specific MAC addresses;
	 only those MAC addresses appear in the trace buffer.

Auth Trace Buffer
-----------------
                    
                    
Sep  8 04:36:21  eap-resp              ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  43    
Sep  8 04:36:21  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  24  292   
Sep  8 04:36:21  rad-accept            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  24  246   
Sep  8 04:36:21  eap-success           <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  4     
Sep  8 04:36:21  wpa2-key1             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 04:36:21  wpa2-key2             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 04:36:21  wpa2-key3             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   151   
Sep  8 04:36:21  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95    
Sep  8 04:36:51  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     
Sep  8 09:08:40  station-up             *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     wpa2 aes
Sep  8 09:08:40  eap-id-req            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   5     
Sep  8 09:08:40  eap-id-resp           ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   25    wireless2@drexel.edu
Sep  8 09:08:40  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            25  244   
Sep  8 09:08:40  rad-resp              <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  25  76    
Sep  8 09:08:40  eap-req               <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
Sep  8 09:08:40  eap-nak               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
...
Sep  8 09:08:40  eap-req               <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  43    
Sep  8 09:08:40  eap-resp              ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  43    
Sep  8 09:08:40  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  38  292   
Sep  8 09:08:40  rad-accept            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  38  246   
Sep  8 09:08:40  eap-success           <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  4     
Sep  8 09:08:40  wpa2-key1             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 09:08:40  wpa2-key2             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 09:08:40  wpa2-key3             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   151   
Sep  8 09:08:40  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95    
Sep  8 09:38:00  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     
Sep  8 09:38:01  station-up             *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     wpa2 aes
Sep  8 09:38:01  eap-id-req            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   5     
Sep  8 09:38:01  eap-id-resp           ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   25    wireless2@drexel.edu
Sep  8 09:38:01  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            39  244   
Sep  8 09:38:01  rad-resp              <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  39  76    
Sep  8 09:38:01  eap-req               <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
Sep  8 09:38:01  eap-nak               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
...
Sep  8 09:38:01  eap-req               <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  43    
Sep  8 09:38:01  eap-resp              ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  43    
Sep  8 09:38:01  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  52  292   
Sep  8 09:38:01  rad-accept            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  52  246   
Sep  8 09:38:01  eap-success           <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  4     
Sep  8 09:38:01  wpa2-key1             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 09:38:01  wpa2-key2             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 09:38:01  wpa2-key3             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   151   
Sep  8 09:38:01  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95    
Sep  8 09:38:31  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     
Sep  8 09:38:33  station-up             *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     wpa2 aes
Sep  8 09:38:33  eap-id-req            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   5     
Sep  8 09:38:33  eap-id-resp           ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   25    wireless2@drexel.edu
Sep  8 09:38:33  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            53  244   
Sep  8 09:38:33  rad-resp              <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  53  76    
Sep  8 09:38:33  eap-req               <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
Sep  8 09:38:33  eap-nak               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
...
Sep  8 09:38:34  eap-req               <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  43    
Sep  8 09:38:34  eap-resp              ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  43    
Sep  8 09:38:34  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  66  292   
Sep  8 09:38:34  rad-accept            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  66  246   
Sep  8 09:38:34  eap-success           <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            14  4     
Sep  8 09:38:34  wpa2-key1             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 09:38:34  wpa2-key2             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   117   
Sep  8 09:38:34  wpa2-key3             <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   151   
Sep  8 09:38:34  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95    
Sep  8 09:39:04  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     
Sep  8 09:39:05  station-up             *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     wpa2 aes
Sep  8 09:39:05  eap-id-req            <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   5     
Sep  8 09:39:05  eap-id-resp           ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            1   25    wireless2@drexel.edu
Sep  8 09:39:05  rad-req               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            67  244   
Sep  8 09:39:05  rad-resp              <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2/cpradius4  67  76    
Sep  8 09:39:05  eap-req               <-  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
Sep  8 09:39:05  eap-nak               ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            2   6     
...
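
One thing that can be pulled out of a trace like this is how long each session lasts, measured from the last message of the 4-way handshake (wpa2-key4) to the next station-down. Here is a rough Python sketch of that calculation. It assumes the whitespace-separated column layout shown above and takes the year from the post date, since the trace buffer does not print one. The names parse_events, session_lengths and SAMPLE are hypothetical, and SAMPLE holds only the relevant rows copied from the output.

```python
from datetime import datetime

# Rows copied from the trace above (only the ones needed for the calculation).
SAMPLE = """\
Sep  8 04:36:21  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95
Sep  8 04:36:51  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -
Sep  8 09:08:40  station-up             *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     wpa2 aes
Sep  8 09:08:40  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95
Sep  8 09:38:00  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -
Sep  8 09:38:01  station-up             *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     wpa2 aes
Sep  8 09:38:01  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95
Sep  8 09:38:31  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -
Sep  8 09:38:33  station-up             *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -     wpa2 aes
Sep  8 09:38:34  wpa2-key4             ->  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   95
Sep  8 09:39:04  station-down           *  d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2            -   -
"""


def parse_events(text, year=2016):
    """Return (timestamp, event-name) pairs; the year is an assumption."""
    events = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skips blank lines and "..." placeholders
        stamp = f"{year} {fields[0]} {fields[1]} {fields[2]}"
        try:
            when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a trace row (banner, header, ...)
        events.append((when, fields[3]))
    return events


def session_lengths(events):
    """Seconds between each completed handshake and the following station-down."""
    lengths = []
    last_key4 = None
    for when, name in events:
        if name == "wpa2-key4":
            last_key4 = when
        elif name == "station-down" and last_key4 is not None:
            lengths.append(int((when - last_key4).total_seconds()))
            last_key4 = None
    return lengths


print(session_lengths(parse_events(SAMPLE)))  # [30, 1760, 30, 30]
```

On these rows the first session of the morning lasts 1760 seconds (about 29 minutes), while the other sessions each end 30 seconds after the handshake completes. What causes that 30-second pattern isn't established here, but it may be a useful detail to hand to TAC.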

The RF could have congestion or delays on the band that the client is on. The show ap association client-mac <mac address of client> will tell you the percentage of retries on the channel vs. the client when you are having that issue.

I tried to break down this data in separate tables, but it's probably easiest if I just upload an image. I tried gathering this data in 10 minute intervals, but stepped it up once I got closer to my 30 minute disconnect time.

The Channel & Frame Retry rates aren't too bad when compared to previous data from when the AP was terminated on my old 3600 controllers. Channel Frame Error Rates are consistent (5-6%) when terminated on the old 3600s. I only have 4 samples from while it was terminated on the old 3600s. I'll need to see about using AirRecorder or get a bit crafty w/ my expect scripts to get more data.

Your comments on channel interference make me kinda want to try increasing my 5 GHz power, potentially dropping 2.4 GHz, to see if that makes any difference.

I'm going to wait to see what TAC recommends, but thanks for all of your input.

5. RE: Odd client disconnections...

EMPLOYEE

cjoseph
Posted Sep 08, 2016 02:39 PM

Those SNRs (72, 68, 60) are super-strong. You might want to move the client away from the AP a bit to prevent distortion.
6. RE: Odd client disconnections...

rah322
Posted Sep 08, 2016 01:00 PM

The output of 'show ap client trail-info <MAC>' just gives that Unspecified Failure. From this output, I can't tell who's sending the DEAUTH. I'm sure that in other cases that might be possible.

Successful auth occurred at 09:08:40, and after about 30 minutes of connectivity, the disconnection occurred at around 09:38:00.

After the initial disconnection, the client appears to successfully send re-auths, but fails to re-join / associate w/ the network.

(cu-master) #show ap client trail-info d8:50:e6:8a:ff:a8

Client Trail Info
-----------------
MAC                BSSID              ESSID                 AP-name            VLAN  Deauth Reason        Alert
---                -----              -----                 -------            ----  -------------        -----
d8:50:e6:8a:ff:a8  00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  646   Unspecified Failure  Unspecified Failure

Deauth Reason
-------------
Reason               Timestamp
------               ---------
Unspecified Failure  Sep 8 09:40:39
Unspecified Failure  Sep 8 09:40:07
Unspecified Failure  Sep 8 09:39:35
Unspecified Failure  Sep 8 09:39:03
Unspecified Failure  Sep 8 09:38:32
Unspecified Failure  Sep 8 09:38:00
Unspecified Failure  Sep 8 04:36:51
Unspecified Failure  Sep 8 04:36:19
Unspecified Failure  Sep 8 04:35:47
Unspecified Failure  Sep 8 04:35:16
Num Deauths:10

Alerts
------
Reason               Timestamp
------               ---------
Unspecified Failure  Sep 8 09:40:39
Unspecified Failure  Sep 8 09:40:07
Unspecified Failure  Sep 8 09:39:35
Unspecified Failure  Sep 8 09:39:03
Unspecified Failure  Sep 8 09:38:32
Unspecified Failure  Sep 8 09:38:00
Unspecified Failure  Sep 8 04:36:51
Unspecified Failure  Sep 8 04:36:19
Unspecified Failure  Sep 8 04:35:47
Unspecified Failure  Sep 8 04:35:16
Num Alerts:10

Mobility Trail
--------------
BSSID              ESSID                 AP-name            Timestamp
-----              -----                 -------            ---------
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:40:40
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:40:39
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:40:08
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:40:07
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:39:37
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:39:35
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:39:04
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:39:03
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:38:33
00:24:6c:b4:ac:b2  test-cabrini-eduroam  00:24:6c:c3:4a:cb  Sep 8 09:38:32
Num Mobility Trails:10
(cu-master) #
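
To check whether the deauths follow a fixed period, the timestamps from the Deauth Reason list can be turned into gaps. This is a small Python sketch. The year is assumed from the post date because the CLI does not print it, and the names gaps_seconds and MORNING_DEAUTHS are hypothetical.

```python
from datetime import datetime

# The six most recent deauth timestamps from the trail-info output above.
MORNING_DEAUTHS = [
    "Sep 8 09:40:39",
    "Sep 8 09:40:07",
    "Sep 8 09:39:35",
    "Sep 8 09:39:03",
    "Sep 8 09:38:32",
    "Sep 8 09:38:00",
]


def gaps_seconds(stamps, year=2016):
    """Sort the timestamps and return the gap, in seconds, between neighbours."""
    times = sorted(
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for stamp in stamps
    )
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(gaps_seconds(MORNING_DEAUTHS))  # [32, 31, 32, 32, 32]
```

Once the first drop has happened, the deauths here repeat roughly every 32 seconds.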

I'll look into your other comment shortly.

Thanks,
