Aruba Instant & Cloud Wi-Fi

Contributor II
Posts: 48
Registered: ‎02-23-2017

Re: Client throughput capped at 40M instead of +200M occasionally


To rule out a few features, I've disabled background spectrum monitoring, set air-time fairness to default, and disabled client match, but I'm still seeing the same problem.

 

Attached is a pcap with this configuration, where the client successfully negotiates BA and uses A-MPDU to transfer data, but is still only able to push about 40Mbps!

 

This time I see almost no TxRetries, Client_health is 100/100, and the bitrate is still 400M. Everything just looks perfect, but it's just slow! So freaking annoying :)
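
For anyone wanting to repeat the check on their own capture, this is roughly what I look for: a minimal Python/scapy sketch (my assumptions: scapy is installed, the pcap is a monitor-mode capture with full 802.11 headers, and "capture.pcap" is a placeholder filename).

# Count Block Ack frames and the data-frame retry rate in a capture.
from scapy.all import rdpcap
from scapy.layers.dot11 import Dot11

frames = rdpcap("capture.pcap")  # placeholder filename

blockacks = 0
data_frames = 0
retries = 0
for pkt in frames:
    if not pkt.haslayer(Dot11):
        continue
    d = pkt[Dot11]
    if d.type == 1 and d.subtype == 9:   # control frame, subtype 9 = Block Ack
        blockacks += 1
    elif d.type == 2:                    # data frame
        data_frames += 1
        if d.FCfield.retry:              # Retry bit in the Frame Control field
            retries += 1

print(f"Block Acks seen: {blockacks}")
if data_frames:
    print(f"Retries: {retries}/{data_frames} data frames ({retries / data_frames:.1%})")

If Block Acks show up and the retry rate is low, the radio side looks healthy, which matches what I'm seeing here.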

 

Contributor II
Posts: 48
Registered: ‎02-23-2017

Re: Client throughput capped at 40M instead of +200M occasionally

I've now been doing some testing on 2.4GHz, and I seem to have the same problem there.

 

At first everything looked perfect: I connected at a bitrate of 144Mbps and could transfer 80-90Mbps both up and down. Then, half an hour later, I could suddenly only upload about 40Mbps, while still downloading 80Mbps.

 

Confirmed with Wireshark: BA is still being used in this case.
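
For reference, the up/down numbers come from tests along these lines, a small Python wrapper around iperf3 (assumptions: iperf3 is installed on both ends and a server is listening on the wired side; the address is a placeholder).

# Run a 10 s upload and a 10 s download against an iperf3 server.
import subprocess

SERVER = "192.168.1.10"  # placeholder: iperf3 server on the wired LAN

def run(extra_args, label):
    cmd = ["iperf3", "-c", SERVER, "-t", "10"] + extra_args
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "sender" in line or "receiver" in line:   # summary lines only
            print(label, line)

run([], "upload:")        # client -> server
run(["-R"], "download:")  # -R reverses direction: server -> client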

Contributor II
Posts: 48
Registered: ‎02-23-2017

Re: Client throughput capped at 40M instead of +200M occasionally

I just made a very interesting observation.

 

I've shut down all clients, rebooted the IAP running the latest code (4.3.1.3), and connected my single test client. It ran fine for a few minutes, then the problem started and I was only able to upload about 30M, but could still download 200M.

 

But here's the interesting part: every time I execute "show tech" or "show cpu" on the IAP, the throughput returns to normal for a few seconds:

 

[  4]  72.01-73.01  sec  3.50 MBytes  29.2 Mbits/sec
[  4]  73.01-74.01  sec  3.25 MBytes  27.2 Mbits/sec
[  4]  74.01-75.01  sec  3.25 MBytes  27.4 Mbits/sec
[  4]  75.01-76.01  sec  3.50 MBytes  29.2 Mbits/sec
[  4]  76.01-77.00  sec  3.25 MBytes  27.5 Mbits/sec
[  4]  77.00-78.01  sec  3.50 MBytes  29.3 Mbits/sec
[  4]  78.01-79.00  sec  3.50 MBytes  29.4 Mbits/sec
[  4]  79.00-80.00  sec  3.25 MBytes  27.3 Mbits/sec
[  4]  80.00-81.00  sec  3.25 MBytes  27.3 Mbits/sec
[  4]  81.00-82.00  sec  3.50 MBytes  29.3 Mbits/sec
[  4]  82.00-83.00  sec  24.0 MBytes   202 Mbits/sec
[  4]  83.00-84.00  sec  26.5 MBytes   222 Mbits/sec
[  4]  84.00-85.01  sec  26.2 MBytes   220 Mbits/sec
[  4]  85.01-86.00  sec  25.6 MBytes   216 Mbits/sec
[  4]  86.00-87.00  sec  27.1 MBytes   228 Mbits/sec
[  4]  87.00-88.01  sec  15.4 MBytes   128 Mbits/sec
[  4]  88.01-89.01  sec  3.50 MBytes  29.3 Mbits/sec
[  4]  89.01-90.00  sec  3.50 MBytes  29.6 Mbits/sec
[  4]  90.00-91.00  sec  3.50 MBytes  29.3 Mbits/sec
[  4]  91.00-92.00  sec  3.25 MBytes  27.3 Mbits/sec
[  4]  92.00-93.00  sec  3.50 MBytes  29.3 Mbits/sec
[  4]  93.00-94.01  sec  3.25 MBytes  27.0 Mbits/sec
[  4]  94.01-95.00  sec  3.50 MBytes  29.6 Mbits/sec
[  4]  95.00-96.00  sec  3.50 MBytes  29.3 Mbits/sec

The output above is with a "show tech". If I do a "show cpu", the effect only lasts 1-2 seconds.

 

If I do something else like a "show clients" or "show log system", I see no difference.

 

Feels like a CPU / interrupt scheduling thing, where a "show cpu" somehow releases the queue.

 

Current load during the upload:

 

a8:bd:27:c0:44:fc# show cpu details
Mem: 172944K used, 309008K free, 0K shrd, 0K buff, 28128K cached
Load average: 4.07 3.95 3.81  (Status: S=sleeping R=running, W=waiting)
  PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
27590 root     SW         0     2 16.4  0.0 kworker/1:2
   34 root     RW         0     2  7.4  0.0 kworker/0:1
 2419 root     SW         0     2  1.4  0.0 power_monitor
 3443 root     S <    12520  3367  0.0  2.5 cli
 3451 root     S N     5700  3367  0.0  1.1 sapd
 3543 root     S       3936  3367  0.0  0.8 ble_relay
 9934 root     S <     3520  9933  0.0  0.7 cli
 3519 root     S       2496  3367  0.0  0.5 mdns
 3469 root     S <     2456  3367  0.0  0.5 stm
 3498 root     S       2264  3367  0.0  0.4 snmpd_sap
...
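
If anyone wants to reproduce the correlation, something like this sketch is what I run next to an ongoing iperf upload (assumptions: paramiko is installed, host and credentials are placeholders, and the IAP accepts exec-style commands over SSH; if not, an interactive shell channel would be needed).

# Fire "show cpu" at the IAP every 15 s and log the timestamps, so they can
# be lined up against the per-second iperf intervals afterwards.
import time
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("10.0.0.5", username="admin", password="secret")  # placeholders

for _ in range(10):
    stamp = time.strftime("%H:%M:%S")
    stdin, stdout, stderr = client.exec_command("show cpu")
    stdout.read()                       # block until the command completes
    print(f"{stamp}: issued 'show cpu'")
    time.sleep(15)

client.close()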

 

 

Contributor II
Posts: 48
Registered: ‎02-23-2017

Re: Client throughput capped at 40M instead of +200M occasionally

OMG, I've just realized that the problem actually starts when I SSH into the IAP!

 

When I have no SSH CLI sessions open everything works fine, but as soon as I open an SSH session the throughput drops from 200M to about 80M. If I open a second SSH session it drops to about 40M, and the cli process starts using about 50% CPU:

 

a8:bd:27:c0:44:fc# show cpu details
Mem: 174756K used, 307196K free, 0K shrd, 0K buff, 28128K cached
Load average: 4.17 2.90 2.60  (Status: S=sleeping R=running, W=waiting)
  PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
 5973 root     R <     3316  5972 48.8  0.6 cli
    9 root     RW         0     2  2.1  0.0 kworker/1:0
 7207 root     R <      344  6965  1.0  0.0 top
   34 root     SW         0     2  1.0  0.0 kworker/0:1
 3367 root     S       1276     1  0.5  0.2 nanny
 2420 root     SW         0     2  0.5  0.0 power_monitor
 3443 root     S <    12504  3367  0.0  2.5 cli
 3451 root     S N     5708  3367  0.0  1.1 sapd
 3543 root     S       3936  3367  0.0  0.8 ble_relay
 6965 root     S <     3304  6964  0.0  0.6 cli
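
To reproduce the staircase, a sketch along these lines should do: hold an increasing number of idle SSH sessions open while watching a running iperf (same assumptions as before: paramiko installed, host and credentials are placeholders).

# Measure with 0, 1 and 2 idle SSH CLI sessions held open against the IAP.
import time
import paramiko

def open_session():
    c = paramiko.SSHClient()
    c.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    c.connect("10.0.0.5", username="admin", password="secret")  # placeholders
    c.invoke_shell()  # keep an interactive CLI session open, like a real login
    return c

sessions = []
for step in range(3):
    print(f"{step} SSH session(s) open: note the iperf rate now")
    time.sleep(30)                   # window to read the throughput off iperf
    sessions.append(open_session())  # then add one more held-open session

for c in sessions:
    c.close()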

With my IAP running 6.5.1.0-4.3.1.1 I could reproduce the behaviour after a factory reset by logging into the WebUI and creating a default 5GHz PSK SSID.

 

SSHing into the IAP alone did not trigger the problem on the factory-reset IAP, but after I issued:

 

34:fc:b9:c6:6a:0c# conf t
now support CLI commit model, please type "commit apply" for configuration to take effect.
34:fc:b9:c6:6a:0c (config) # loginsession timeout 0
34:fc:b9:c6:6a:0c (config) # exit
34:fc:b9:c6:6a:0c# commit apply
committing configuration...
configuration committed.

The exact same behavior started.

 

It just sounds far too easy. I can't believe I've spent the whole weekend troubleshooting this without spotting the connection earlier, unless there's something more behind it that makes the behaviour inconsistent.

 

Attaching a show tech from the freshly factory-reset IAP, taken just after the problem started again.

Contributor II
Posts: 48
Registered: ‎02-23-2017

Re: Client throughput capped at 40M instead of +200M occasionally

I was informed this *might* be fixed in 6.5.2.0, so I've upgraded one of my IAP-315s to it, and yeah, so far it looks promising...

 

I can no longer reproduce the "ssh cli" high-CPU problem, and I see no drops in my throughput.

 

I have had my client enter the "40M" state once though, where I could only upload 40M, but that time I could see that BA/A-MPDU was not being used, so I blame my client's NIC drivers for that one. After a client reboot it was back to normal again.

 

Attached is a show tech taken while the iperf measurement was uploading about 200Mbps.

 

I'll keep my fingers crossed and give it a few days of testing.

 

PS: I also tested 4.3.1.3 on an IAP-205 and it suffered from the same "ssh cli" bug. The only difference was that it was even worse :) From 200M -> 2M with a single SSH session open.

 

Contributor II
Posts: 48
Registered: ‎02-23-2017

Re: Client throughput capped at 40M instead of +200M occasionally


I can also add that I've tested disabling A-MPDU on the IAP with:

 

(config) #  wlan ssid-profile Test
(SSID Profile "Test") # mpdu-agg-disable

and can confirm that the TCP throughput drops from 200M -> 40M with only that change.
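
That drop is actually roughly what you would expect from losing aggregation alone. Here's a back-of-envelope Python sketch with assumed 802.11 timing constants (the 200 µs per-access overhead is my rough guess covering preamble, SIFS, (Block)Ack, DIFS and average backoff; it is not a chip-accurate figure).

# Rough MAC efficiency with and without A-MPDU at a 400M PHY rate.
PHY_RATE = 400e6          # bit rate reported by the IAP
FRAME_BITS = 1500 * 8     # one MPDU of payload
OVERHEAD_US = 200         # assumed fixed cost per channel access

def throughput(mpdus_per_access):
    payload_us = mpdus_per_access * FRAME_BITS / PHY_RATE * 1e6
    total_us = payload_us + OVERHEAD_US
    return mpdus_per_access * FRAME_BITS / (total_us / 1e6)

print(f"no aggregation : {throughput(1) / 1e6:5.0f} Mbit/s")   # ~52
print(f"32-MPDU A-MPDU : {throughput(32) / 1e6:5.0f} Mbit/s")  # ~331

One MPDU per channel access lands right around the 40M I measured, while 32 aggregated MPDUs get within TCP overhead of the ~200M case.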

 

So, to conclude, it looks like I've been fighting two bugs at the same time:

 

1) The client NIC (Intel AC-8260) fails to negotiate A-MPDU after some time when running driver v18.32.1.2. After upgrading to the latest drivers from Intel (19.50.1.5), the problem is resolved. For some reason my first upgrade attempt was unsuccessful and caused a lot of retransmissions and even worse A-MPDU negotiation.

 

2) "SSH CLI - loginsession timeout 0" bug in at least IAP 4.3.1.1-4.3.1.3 causing high CPU load and decreased throughput (200->80M for 1 ssh session). Confirmed by TAC as bug id #156250. When "loginsession timeout 0" is configured it causes high CPU load when SSH is connected. Fixed in 6.5.2.0. Problem observed on IAP-205 and IAP-315.

 

Work-around: Disable the loginsession timeout (which is the default setting) with:

no loginsession timeout
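
If you need to push the work-around to several IAPs over SSH, a sketch along these lines should work (assumptions again: paramiko installed, host and credentials are placeholders; the command sequence is the one shown above).

# Apply "no loginsession timeout" over an interactive SSH shell and commit.
import time
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("10.0.0.5", username="admin", password="secret")  # placeholders

shell = client.invoke_shell()
for cmd in ("conf t", "no loginsession timeout", "exit", "commit apply"):
    shell.send((cmd + "\n").encode())
    time.sleep(1)          # crude pacing; a robust script would read the prompt
print(shell.recv(65535).decode(errors="replace"))
client.close()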

 

Contributor II
Posts: 48
Registered: ‎02-23-2017

Re: Client throughput capped at 40M instead of +200M occasionally

Today I made a new attempt to upgrade the Intel drivers on the client to the latest available (19.50.1.5), and this time it actually seems to have fixed the issue!

 

Last time I tried to upgrade, I ended up in some strange state where the client was never able to negotiate Block Ack (A-MPDU) and there were a lot of retransmissions etc.

 

I suspect the new driver never got properly installed (even though the NIC properties listed the new version). This time I first uninstalled the whole Intel PROSet package before installing the new one. Perhaps that made the difference.

 

Anyway, for the last 5 hours everything has worked perfectly, so I'm keeping my fingers crossed.
