Controllerless Networks


Client throughput capped at 40M instead of +200M occasionally

  • 1.  Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 07:18 AM

    A customer of mine has reported odd performance issues with IAP-305 and IAP-315 APs on 5 GHz, 40 MHz VHT channels.

     

    In an attempt to reproduce the problem I've deployed a single IAP-315 connected to a 2530-24G-PoEP switch. Connected to the switch is a wired desktop running Linux and iperf3 in server mode.

     

    The IAP broadcasts a single SSID "MathiasPSK" on 5 GHz with 40 MHz channels and VHT enabled, and there are no other 5 GHz APs or devices in the neighbourhood:

     

    a8:bd:27:c0:44:fc# show ap spectrum channel-details
    
    Channel Summary Table
    ---------------------
    Channel  Quality(%)  Utilization(%)  WiFi(%)  Bluetooth(%)  Microwave(%)  Cordless Phone(%)  Total nonwifi(%)  KnownAPs  UnknownAPs  Noise Floor(dBm)  MaxAPSignal(dBm)  Max AP SSID  Max AP BSSID  MaxInterference(dBm)  SNIR(dB)
    -------  ----------  --------------  -------  ------------  ------------  -----------------  ----------------  --------  ----------  ----------------  ----------------  -----------  ------------  --------------------  --------
    52+      100         0               0        0             0             0                  0                 1         0           -94               -                 -            -             -                     0

    With only one or two laptops connected, I can both send and receive 200+ Mbps from my laptop "IT-089":

     

    C:\Users\mathias.sundman\Downloads\iperf-3.1.3-win64>iperf3 -c 192.168.88.99
    Connecting to host 192.168.88.99, port 5201
    [  4] local 192.168.88.60 port 50537 connected to 192.168.88.99 port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  25.5 MBytes   213 Mbits/sec
    [  4]   1.00-2.00   sec  22.6 MBytes   190 Mbits/sec
    [  4]   2.00-3.01   sec  24.6 MBytes   205 Mbits/sec
    [  4]   3.01-4.00   sec  24.2 MBytes   204 Mbits/sec
    
    
    C:\Users\mathias.sundman\Downloads\iperf-3.1.3-win64>iperf3 -c 192.168.88.99 -R
    Connecting to host 192.168.88.99, port 5201
    Reverse mode, remote host 192.168.88.99 is sending
    [  4] local 192.168.88.60 port 49826 connected to 192.168.88.99 port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  27.3 MBytes   229 Mbits/sec
    [  4]   1.00-2.00   sec  29.4 MBytes   247 Mbits/sec
    [  4]   2.00-3.00   sec  29.7 MBytes   249 Mbits/sec
    [  4]   3.00-4.00   sec  29.3 MBytes   246 Mbits/sec

    The client-table while everything is good looks like this:

    a8:bd:27:c0:44:fc# show clients
    
    Client List
    -----------
    Name             IP Address     MAC Address        OS       ESSID       Access Point       Channel  Type  Role        IPv6 Address               Signal    Speed (mbps)
    ----             ----------     -----------        --       -----       ------------       -------  ----  ----        ------------               ------    ------------
    IT-089           192.168.88.60  a4:34:d9:63:f9:0c  Windows  MathiasPSK  a8:bd:27:c0:44:fc  52+      AC    MathiasPSK  fe80::7455:924e:6276:4868  62(good)  400(good)
    Mathiass-MBP     192.168.88.80  f4:0f:24:19:b0:f9  Win XP   MathiasPSK  a8:bd:27:c0:44:fc  52+      AC    MathiasPSK  --                         47(good)  600(good)
    
    a8:bd:27:c0:44:fc# show ap debug client-table
    
    Client Table
    ------------
    MAC                ESSID       BSSID              Assoc_State  HT_State    AID  PS_State    UAPSD            Tx_Pkts  Rx_Pkts  PS_Qlen  Tx_Retries  Tx_Rate  Rx_Rate  Last_ACK_SNR  Last_Rx_SNR  TX_Chains  Tx_Timestamp              Rx_Timestamp              MFP Status (C,R)  Idle time  Client health (C/R)  Tx_Bytes   Rx_Bytes
    ---                -----       -----              -----------  --------    ---  --------    -----            -------  -------  -------  ----------  -------  -------  ------------  -----------  ---------  ------------              ------------              ----------------  ---------  -------------------  --------   --------
    f4:0f:24:19:b0:f9  MathiasPSK  a8:bd:27:84:4f:d0  Associated   AWvSsEeBbM  0x1  Power-save  (0,0,0,0,N/A,0)  243115   205125   0        23          600      48       48            51           4[0xf]     Thu Apr 13 22:03:43 2017  Thu Apr 13 22:03:43 2017  (0,0)             29         100/93               480550541  529986917
    a4:34:d9:63:f9:0c  MathiasPSK  a8:bd:27:84:4f:d0  Associated   AWvSsEBbM   0x2  Awake       (0,0,0,0,N/A,0)  39       219      0        0           400      400      67            55           4[0xf]     Thu Apr 13 22:04:06 2017  Thu Apr 13 22:04:06 2017  (0,0)             6          100/93               15574      25012

    A little later, most likely after I had also joined my iPhone to the network, my laptop "IT-089" could suddenly only transmit about 40 Mbps. It could still receive 200M+ though:

     

    C:\Users\mathias.sundman\Downloads\iperf-3.1.3-win64>iperf3 -c 192.168.88.99
    Connecting to host 192.168.88.99, port 5201
    [  4] local 192.168.88.60 port 50684 connected to 192.168.88.99 port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  4.88 MBytes  40.8 Mbits/sec
    [  4]   1.00-2.00   sec  5.12 MBytes  43.0 Mbits/sec
    [  4]   2.00-3.00   sec  5.12 MBytes  43.0 Mbits/sec
    [  4]   3.00-4.00   sec  5.12 MBytes  43.0 Mbits/sec
    
    C:\Users\mathias.sundman\Downloads\iperf-3.1.3-win64>iperf3 -c 192.168.88.99 -R
    Connecting to host 192.168.88.99, port 5201
    Reverse mode, remote host 192.168.88.99 is sending
    [  4] local 192.168.88.60 port 50754 connected to 192.168.88.99 port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  24.0 MBytes   201 Mbits/sec
    [  4]   1.00-2.00   sec  25.5 MBytes   214 Mbits/sec
    [  4]   2.00-3.00   sec  25.3 MBytes   213 Mbits/sec
    [  4]   3.00-4.00   sec  24.9 MBytes   208 Mbits/sec

    Examining the client table again still shows IT-089 with a Tx_Rate and Rx_Rate of 400:

     

    a8:bd:27:c0:44:fc# show clients
    
    Client List
    -----------
    Name             IP Address     MAC Address        OS       ESSID       Access Point       Channel  Type  Role        IPv6 Address               Signal    Speed (mbps)
    ----             ----------     -----------        --       -----       ------------       -------  ----  ----        ------------               ------    ------------
    Mathiass-iPhone  192.168.88.77  b4:8b:19:db:38:b1           MathiasPSK  a8:bd:27:c0:44:fc  52+      AC    MathiasPSK  --                         56(good)  360(good)
    SELT-0046        192.168.88.59  e8:b1:fc:08:39:41  Win 7    MathiasPSK  a8:bd:27:c0:44:fc  52+      AC    MathiasPSK  fe80::a06e:2ac6:fdaa:9aec  56(good)  400(good)
    IT-089           192.168.88.60  a4:34:d9:63:f9:0c  Windows  MathiasPSK  a8:bd:27:c0:44:fc  52+      AC    MathiasPSK  fe80::7455:924e:6276:4868  62(good)  400(good)
    Mathiass-MBP     192.168.88.80  f4:0f:24:19:b0:f9  Win XP   MathiasPSK  a8:bd:27:c0:44:fc  52+      AC    MathiasPSK  --                         47(good)  600(good)
    Number of Clients   :4
    Info timestamp      :6737
    
    
    a8:bd:27:c0:44:fc# show ap association
    
    The phy column shows client's operational capabilities for current association
    
    Flags: A: Active, B: Band Steerable, H: Hotspot(802.11u) client, K: 802.11K client, M: Mu beam formee, R: 802.11R client, W: WMM client, w: 802.11w client V: 802.11v BSS trans capable
    
    PHY Details: HT   : High throughput;      20: 20MHz;  40: 40MHz; t: turbo-rates (256-QAM)
                 VHT  : Very High throughput; 80: 80MHz; 160: 160MHz; 80p80: 80MHz + 80MHz
                 <n>ss: <n> spatial streams
    
    Association Table
    -----------------
    Name               bssid              mac                auth  assoc  aid  l-int  essid       vlan-id  tunnel-id  phy              assoc. time  num assoc  Flags  DataReady
    ----               -----              ---                ----  -----  ---  -----  -----       -------  ---------  ---              -----------  ---------  -----  ---------
    a8:bd:27:c0:44:fc  a8:bd:27:84:4f:d0  b4:8b:19:db:38:b1  y     y      4    20     MathiasPSK  1        0x0        a-VHT-40sgi-2ss  1m:53s       1          WV     Yes (Implicit)
    a8:bd:27:c0:44:fc  a8:bd:27:84:4f:d0  e8:b1:fc:08:39:41  y     y      3    250    MathiasPSK  1        0x0        a-VHT-40sgi-2ss  23s          1          W      Yes (Implicit)
    a8:bd:27:c0:44:fc  a8:bd:27:84:4f:d0  a4:34:d9:63:f9:0c  y     y      2    250    MathiasPSK  1        0x0        a-VHT-40sgi-2ss  14s          1          W      Yes (Implicit)
    a8:bd:27:c0:44:fc  a8:bd:27:84:4f:d0  f4:0f:24:19:b0:f9  y     y      1    10     MathiasPSK  1        0x0        a-VHT-40sgi-3ss  53m:54s      1          W      Yes (Implicit)
    Num Clients:4
    a8:bd:27:c0:44:fc# show ap debug client-table
    
    Client Table
    ------------
    MAC                ESSID       BSSID              Assoc_State  HT_State    AID  PS_State    UAPSD            Tx_Pkts  Rx_Pkts  PS_Qlen  Tx_Retries  Tx_Rate  Rx_Rate  Last_ACK_SNR  Last_Rx_SNR  TX_Chains  Tx_Timestamp              Rx_Timestamp              MFP Status (C,R)  Idle time  Client health (C/R)  Tx_Bytes   Rx_Bytes
    ---                -----       -----              -----------  --------    ---  --------    -----            -------  -------  -------  ----------  -------  -------  ------------  -----------  ---------  ------------              ------------              ----------------  ---------  -------------------  --------   --------
    f4:0f:24:19:b0:f9  MathiasPSK  a8:bd:27:84:4f:d0  Associated   AWvSsEeBbM  0x1  Power-save  (0,0,0,0,N/A,0)  245028   209414   0        26          600      540      48            47           4[0xf]     Thu Apr 13 22:29:57 2017  Thu Apr 13 22:29:57 2017  (0,0)             7          97/94                481564301  530766840
    b4:8b:19:db:38:b1  MathiasPSK  a8:bd:27:84:4f:d0  Associated   AWvSsEe     0x4  Power-save  (0,0,0,0,N/A,0)  26       347      0        0           360      360      57            56           4[0xf]     Thu Apr 13 22:29:58 2017  Thu Apr 13 22:29:58 2017  (0,0)             6          100/94               4622       3713
    e8:b1:fc:08:39:41  MathiasPSK  a8:bd:27:84:4f:d0  Associated   WvSsEbM     0x3  Awake       (1,1,1,1,2,0)    66       140      0        0           400      270      61            56           4[0xf]     Thu Apr 13 22:29:59 2017  Thu Apr 13 22:30:02 2017  (0,0)             2          100/94               21497      55268
    a4:34:d9:63:f9:0c  MathiasPSK  a8:bd:27:84:4f:d0  Associated   AWvSsEBbM   0x2  Awake       (0,0,0,0,N/A,0)  10813    37469    0        231         400      400      77            63           4[0xf]     Thu Apr 13 22:30:02 2017  Thu Apr 13 22:30:03 2017  (0,0)             1          90/94                1692430    56257386

    How can I further diagnose what is capping the traffic?

     

    PS: While troubleshooting this behaviour in the customer's network during the day, the problem moved between different clients (HP laptops, Lenovo laptops, Linux laptops, Windows 10 laptops), and sometimes it was TX that was capped and sometimes RX.

     

    PS: I posted this in another thread yesterday, but for some reason I can't see it on the board today. Apologies if it now appears twice.



  • 2.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 07:27 AM

    While performing the same tests today, the problem seemed to start when another laptop with not-so-good signal joined the network. My test laptop (which is located 20 cm from the AP) was then suddenly capped to 40M upload again.

     

    Even after disconnecting all other clients, and using "disconnect-user mac xxx" to clear the associations on the AP, my laptop could still only push 40M.

     

    Turning the client's Wi-Fi adapter off and on again did not resolve the issue.

     

    After rebooting the client, it was back to normal again.

     

    I've noticed that during the problem, when I try to upload from the client, the Tx_Retries value increases quickly:

     

    a8:bd:27:c0:44:fc# show ap debug client-table
    
    Client Table
    ------------
    MAC                ESSID       BSSID              Assoc_State  HT_State   AID  PS_State  UAPSD            Tx_Pkts  Rx_Pkts  PS_Qlen  Tx_Retries  Tx_Rate  Rx_Rate  Last_ACK_SNR  Last_Rx_SNR  TX_Chains  Tx_Timestamp              Rx_Timestamp              MFP Status (C,R)  Idle time  Client health (C/R)  Tx_Bytes  Rx_Bytes
    ---                -----       -----              -----------  --------   ---  --------  -----            -------  -------  -------  ----------  -------  -------  ------------  -----------  ---------  ------------              ------------              ----------------  ---------  -------------------  --------  --------
    a4:34:d9:63:f9:0c  MathiasPSK  a8:bd:27:84:4f:d0  Associated   AWvSsEBbM  0x1  Awake     (0,0,0,0,N/A,0)  173611   610345   0        4193        400      400      57            43           4[0xf]     Fri Apr 14 12:56:55 2017  Fri Apr 14 12:56:55 2017  (0,0)             0          92/92                18447895  926498265
    
    
    a8:bd:27:c0:44:fc# show ap debug client-table
    
    Client Table
    ------------
    MAC                ESSID       BSSID              Assoc_State  HT_State   AID  PS_State  UAPSD            Tx_Pkts  Rx_Pkts  PS_Qlen  Tx_Retries  Tx_Rate  Rx_Rate  Last_ACK_SNR  Last_Rx_SNR  TX_Chains  Tx_Timestamp              Rx_Timestamp              MFP Status (C,R)  Idle time  Client health (C/R)  Tx_Bytes  Rx_Bytes
    ---                -----       -----              -----------  --------   ---  --------  -----            -------  -------  -------  ----------  -------  -------  ------------  -----------  ---------  ------------              ------------              ----------------  ---------  -------------------  --------  --------
    a4:34:d9:63:f9:0c  MathiasPSK  a8:bd:27:84:4f:d0  Associated   AWvSsEBbM  0x1  Awake     (0,0,0,0,N/A,0)  222316   782369   0        5116        400      400      58            47           4[0xf]     Fri Apr 14 12:57:43 2017  Fri Apr 14 12:57:43 2017  (0,0)             0          92/92                23476096  1187675233

    This can also be seen on this graph in the Instant UI:

    wifi_retransmissions.png
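The raw counters in the two snapshots above can be turned into a retry percentage for the interval between them. A minimal sketch (plain arithmetic, not an Aruba tool; the numbers are copied from the client-table output above):

```python
# Estimate the retry rate between two "show ap debug client-table" snapshots
# using the deltas of the Tx_Pkts and Tx_Retries counters.

def retry_rate(pkts_before, retries_before, pkts_after, retries_after):
    """Percentage of frames retried during the interval between two samples."""
    delta_pkts = pkts_after - pkts_before
    delta_retries = retries_after - retries_before
    if delta_pkts <= 0:
        return 0.0
    return 100.0 * delta_retries / delta_pkts

# Values copied from the two client-table outputs above (client a4:34:d9:63:f9:0c)
rate = retry_rate(173611, 4193, 222316, 5116)
print(f"{rate:.1f}% of frames retried")  # ~1.9% over this ~48 s window
```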

     



  • 3.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 07:31 AM

    I don't know the exact specifics of your situation, but in general, adding a client that is further away degrades the connection. The nearby client will also de-rate, lowering its speed to improve reliability. It can take some time afterwards for that client to rate back up and send frames at higher rates. This all depends on the client and can vary greatly.

     

    A packet capture, looking at the rates as a column, might confirm this...



  • 4.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 07:41 AM

    Thanks for your reply,

     

    But if that's the case, shouldn't I see that my close-by client has lowered its Tx or Rx bitrate in the "show ap debug client-table" output?

     

    It still says my client is using 400Mbps. I also confirmed with show ap debug client-stats that it is only the 450Mbps counters that are increasing:

     

    a8:bd:27:c0:44:fc#  show ap debug client-stats a4:34:d9:63:f9:0c a8:bd:27:84:4f:d0 | inc "Rx Data Bytes"
    Rx Data Bytes                    294732358
    Rx Data Bytes   12 Mbps  (Mon)   300
    Rx Data Bytes   54 Mbps  (Mon)   25800
    Rx Data Bytes  108 Mbps  (Mon)   0
    Rx Data Bytes  300 Mbps  (Mon)   20518
    Rx Data Bytes  450 Mbps  (Mon)   294686040
    Rx Data Bytes  1300 Mbps  (Mon)  0
    Rx Data Bytes  1300 Mbps+ (Mon)  0
    a8:bd:27:c0:44:fc#  show ap debug client-stats a4:34:d9:63:f9:0c a8:bd:27:84:4f:d0 | inc "Rx Data Bytes"
    Rx Data Bytes                    299091164
    Rx Data Bytes   12 Mbps  (Mon)   300
    Rx Data Bytes   54 Mbps  (Mon)   25800
    Rx Data Bytes  108 Mbps  (Mon)   0
    Rx Data Bytes  300 Mbps  (Mon)   20518
    Rx Data Bytes  450 Mbps  (Mon)   299044846
    Rx Data Bytes  1300 Mbps  (Mon)  0
    Rx Data Bytes  1300 Mbps+ (Mon)  0
    
    a8:bd:27:c0:44:fc#  show ap debug client-stats a4:34:d9:63:f9:0c a8:bd:27:84:4f:d0 | inc "Tx Data Bytes"
    Tx Data Bytes Transmitted        3608817
    Tx Data Bytes                    4346481
    Tx Data Bytes   12 Mbps  (Mon)   0
    Tx Data Bytes   54 Mbps  (Mon)   344
    Tx Data Bytes  108 Mbps  (Mon)   0
    Tx Data Bytes  300 Mbps  (Mon)   2558
    Tx Data Bytes  450 Mbps  (Mon)   3605915
    Tx Data Bytes  1300 Mbps  (Mon)  0
    Tx Data Bytes  1300 Mbps+ (Mon)  0
    a8:bd:27:c0:44:fc#  show ap debug client-stats a4:34:d9:63:f9:0c a8:bd:27:84:4f:d0 | inc "Tx Data Bytes"
    Tx Data Bytes Transmitted        3739429
    Tx Data Bytes                    4497801
    Tx Data Bytes   12 Mbps  (Mon)   0
    Tx Data Bytes   54 Mbps  (Mon)   344
    Tx Data Bytes  108 Mbps  (Mon)   0
    Tx Data Bytes  300 Mbps  (Mon)   2558
    Tx Data Bytes  450 Mbps  (Mon)   3736527
    Tx Data Bytes  1300 Mbps  (Mon)  0
    Tx Data Bytes  1300 Mbps+ (Mon)  0

    Isn't that proof enough that the client is trying to use the high bitrate, but something else is blocking it and causing retransmissions?
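The per-rate byte counters above can be diffed mechanically to show which PHY rate bucket carried the new traffic. A small sketch (an assumed helper, not an Aruba command; the values are copied from the two "Rx Data Bytes" outputs above):

```python
# Given per-rate byte counters from two consecutive
# "show ap debug client-stats" runs, report which PHY rate bucket
# carried the traffic in between.

def dominant_rate(before, after):
    """Return (rate_label, byte_delta) for the bucket with the largest delta."""
    deltas = {rate: after[rate] - before[rate] for rate in before}
    rate = max(deltas, key=deltas.get)
    return rate, deltas[rate]

before = {"12M": 300, "54M": 25800, "108M": 0, "300M": 20518, "450M": 294686040}
after  = {"12M": 300, "54M": 25800, "108M": 0, "300M": 20518, "450M": 299044846}

rate, nbytes = dominant_rate(before, after)
print(rate, nbytes)  # 450M 4358806 -- all the new traffic fell in the 450 Mbps bucket
```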

     

    PS: Now suddenly the client started capping at 40M again without any other client joining the network.

     

     



  • 5.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 08:36 AM

    If you do a packet capture of both channels, and look at the "rate" column in the packet capture, that might give you an idea of what is happening.

     

    Also, the "rate" in that output is updated no more than once per minute, so it does not give you a real-time idea of what is happening...



  • 6.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 09:06 AM

    Hmm, I must be missing something.

     

    I downloaded the Aruba version of Wireshark (1.4.1) from the support site, and configured it to listen to ARUBA_ERM on port 5555.

     

    Then started pcap from the IAP:

    a8:bd:27:c0:44:fc# pcap start a8:bd:27:84:4f:d0 192.168.88.54 5555 0 5000
    
    Packet capture has started for pcap-id:3

    In Wireshark I filter on "wlan" and see loads of stuff, but I'm unable to find the "rate" column you mention. I tried modifying the Display Columns and added "IEEE 802.11 TX rate", but that column remains empty for all packets.

     

    I also tried the latest official version of Wireshark, 2.2.6, but that didn't decode the Aruba-encapsulated packets.

     

    What have I missed?



  • 7.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 10:32 AM

    Using Wireshark on my Mac in monitor mode, I was able to capture what's going on in the air.

     

    AFAICS my client is both sending and receiving the actual data frames (1500+ bytes) at a data rate of 400M. Wireshark doesn't show any rate in the column for those packets, though; looking inside them shows a header saying Data Rate: 400M:

     

    Screen Shot 2017-04-14 at 16.13.58.png

     

    Screen Shot 2017-04-14 at 16.14.25.png



  • 8.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 10:44 AM

    You are only seeing the rates in management frames. You should right-click on the line with 400.0 Mb/s and left-click "Apply as Column".



  • 9.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 10:56 AM

    Ahh, cool - thanks.

     

    So I guess that means we have now concluded that my client *is* sending the data with a rate of 400M so there must be something else capping the traffic, right?

     

    Screen Shot 2017-04-14 at 16.48.21.png



  • 10.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 11:01 AM

    How many of those frames are retries?

     



  • 11.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 11:22 AM

    Of all the QoS Data frames:

    Upload: 1164/16970 (6.9%) are retransmissions.

    Download: 193/4612 (4.2%) are retransmissions.

     



  • 12.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 11:24 AM

    The 400 Mbps number is the "rate", not the throughput. I would open a TAC case and send them the packet capture and a tech-support dump from when it transitions from 400 to 40, to make sense of what is being seen.



  • 13.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 11:42 AM

    Yes, I'm not expecting to get a throughput of 400M. I'm only expecting to keep the throughput around 200M as I see after a fresh reboot of the client.

     

    I did a capture now after a fresh reboot where I can upload 200Mbps, and it looks like this:

    Screen Shot 2017-04-14 at 17.33.41.png

    But after it has dropped to 40M throughput it looks like this:

     

    Screen Shot 2017-04-14 at 17.33.12.png

     

    So when everything is good there is just a "Block Ack" every now and then, but when it misbehaves there is an "Acknowledge" frame sent back to the client after every single frame.

     

    Does that give any clues?

     

    I'll go ahead and open a TAC case as well now - Big thanks for your help so far.

     



  • 14.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 12:03 PM

    Pcaps attached for the curious.

     

    iap-send-good100 contains 100 packets from a newly rebooted client working fine (uploading 200 Mbps+).

     

    iap-send-100 contains 100 packets from the same client a little later, when it's only able to upload 40 Mbps.



  • 15.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 12:51 PM

    Found another interesting fact. According to Wireshark on my client laptop, I'm sending TCP packets as 1514-byte frames in both cases.

     

    According to Wireshark on my Mac, passively sniffing the air in monitor mode, these frames are sent as 1606-byte frames when it's working well, and 1610-byte frames when it's not.

     

    Now to the really interesting part. On the desktop running the iperf server, if I tcpdump the traffic there, in the non-working case (40 Mbps) I see all the frames as standard 1514-byte frames:

     

    18:10:32.815832 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9808688:9810148, ack 1, win 53248, length 1460
    18:10:32.816144 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9810148:9811608, ack 1, win 53248, length 1460
    18:10:32.816149 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.51340: Flags [.], ack 9811608, win 936, length 0
    18:10:32.816644 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9811608:9813068, ack 1, win 53248, length 1460
    18:10:32.817615 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9813068:9814528, ack 1, win 53248, length 1460
    18:10:32.817623 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.51340: Flags [.], ack 9814528, win 936, length 0
    18:10:32.817874 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9814528:9815988, ack 1, win 53248, length 1460
    18:10:32.818421 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9815988:9817448, ack 1, win 53248, length 1460
    18:10:32.818426 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.51340: Flags [.], ack 9817448, win 936, length 0
    18:10:32.818788 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9817448:9818908, ack 1, win 53248, length 1460
    18:10:32.819111 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9818908:9820368, ack 1, win 53248, length 1460
    18:10:32.819116 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.51340: Flags [.], ack 9820368, win 936, length 0
    18:10:32.819373 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9820368:9821828, ack 1, win 53248, length 1460
    18:10:32.819838 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.51340 > 192.168.88.99.5201: Flags [.], seq 9821828:9823288, ack 1, win 53248, length 1460

     

    However, when it's working well, I receive them as "jumbo frames":

     

    18:16:12.632084 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 4434: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55102472:55106852, ack 1, win 53248, length 4380
    18:16:12.632091 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55106852, win 3709, length 0
    18:16:12.632135 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 5894: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55106852:55112692, ack 1, win 53248, length 5840
    18:16:12.632141 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55112692, win 3709, length 0
    18:16:12.632185 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 5894: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55112692:55118532, ack 1, win 53248, length 5840
    18:16:12.632192 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55118532, win 3709, length 0
    18:16:12.632236 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 5894: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55118532:55124372, ack 1, win 53248, length 5840
    18:16:12.632242 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55124372, win 3709, length 0
    18:16:12.632285 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 5894: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55124372:55130212, ack 1, win 53248, length 5840
    18:16:12.632292 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55130212, win 3709, length 0
    18:16:12.632339 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 1514: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55130212:55131672, ack 1, win 53248, length 1460
    18:16:12.632528 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 4434: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55131672:55136052, ack 1, win 53248, length 4380
    18:16:12.632534 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55136052, win 3709, length 0
    18:16:12.632580 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 5894: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55136052:55141892, ack 1, win 53248, length 5840
    18:16:12.632587 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55141892, win 3709, length 0
    18:16:12.632630 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 5894: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55141892:55147732, ack 1, win 53248, length 5840
    18:16:12.632636 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55147732, win 3709, length 0
    18:16:12.632680 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 5894: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55147732:55153572, ack 1, win 53248, length 5840
    18:16:12.632686 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55153572, win 3709, length 0
    18:16:12.632729 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 2974: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55153572:55156492, ack 1, win 53248, length 2920
    18:16:12.632735 d4:be:d9:a3:bf:d3 > a4:34:d9:63:f9:0c, ethertype IPv4 (0x0800), length 54: 192.168.88.99.5201 > 192.168.88.53.49771: Flags [.], ack 55156492, win 3709, length 0
    18:16:12.632779 a4:34:d9:63:f9:0c > d4:be:d9:a3:bf:d3, ethertype IPv4 (0x0800), length 4434: 192.168.88.53.49771 > 192.168.88.99.5201: Flags [.], seq 55156492:55160872, ack 1, win 53248, length 4380
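The odd lengths in the "good" tcpdump are not arbitrary: each one is an exact multiple of the 1460-byte TCP MSS plus a single Ethernet (14) + IPv4 (20) + TCP (20) header set, i.e. several segments coalesced into one frame. A quick arithmetic check (plain math, no capture parsing):

```python
# Verify that the captured frame lengths decompose into N coalesced
# MSS-sized TCP segments behind one Ethernet+IP+TCP header.

ETH, IP, TCP, MSS = 14, 20, 20, 1460

def segments(frame_len):
    """Number of coalesced MSS-sized segments in a captured frame, or None."""
    payload = frame_len - (ETH + IP + TCP)
    return payload // MSS if payload % MSS == 0 else None

for length in (1514, 2974, 4434, 5894):
    print(length, "->", segments(length), "segment(s)")
# 1514 -> 1, 2974 -> 2, 4434 -> 3, 5894 -> 4
```

So in the working case the receive path sees up to four segments per frame, consistent with aggregation (or receive offload on the wired side) being active; in the broken case everything arrives one segment at a time.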

     

     



  • 16.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 03:49 PM

    As far as I can see after comparing the two pcaps:

     

    iap-send-100 has a single ACK (NOT a Block Ack) from IAP to client, sent at 24 Mbps every other frame, while iap-send-good100 has a Block Ack sent at 24 Mbps after 30-odd frames.

     

    What made the IAP send Block Acks in one situation and not in the other? A show tech, which lists client capabilities, might be able to tell.

     

    Kindly share one for the non-working and the working scenario, like you did for the pcaps.

     

    Please also point out the MAC of the good client (I assume a4:34:d9:63:f9:0c) and the bad client (the one whose arrival makes the throughput drop). If you have already shared this info, my apologies; please share it again with the MAC address info.



  • 17.  RE: Client throughput capped at 40M instead of +200M occasionally

    Posted Apr 14, 2017 03:59 PM

    I've come to the same conclusion and am just reading up on frame aggregation (A-MPDU & A-MSDU).

     

    Attached are the two show-techs.

     

    Correct, a4:34:d9:63:f9:0c is the client I'm testing with, both in the working and non-working cases. After I reboot this client it "works" and then a few minutes later it starts misbehaving.

     

    I doubt it can only be that it stops performing frame aggregation, though. I think it's more likely that something is causing it to back off on several different capabilities, and frame aggregation is just the one we can easily identify.

     

    I base that conclusion on the fact that sometimes the client locks at a throughput of only 20M, sometimes about 60M, but most often at 40M (practical TCP throughputs, not bitrates), while frame aggregation is an on/off capability, and I also doubt that it alone should decrease the throughput from 200M to 40M.

     

    But if we can figure out why this is being pushed back, we might understand the rest as well...

     

    Attachment(s)

    txt
    iap-working-showtech.txt   522 K 1 version


  • 18.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 04:20 PM

What is the exact change from the working scenario to the non-working scenario? Does just associating the MacBook client f4:0f:24:19:b0:f9 (HT state AWvSsEBbM) bring down the throughput of a4:34:d9:63:f9:0c (HT state AWvSsEeBbM) from 200 to 40 after X minutes?

     

    Power state awake / power save has no effect on results. Right?

     

    HT state Legend:

     

    UAPSD:(VO,VI,BK,BE,Max SP,Q Len)
    HT Flags: A - LDPC Coding; W - 40MHz; S - Short GI 40; s - Short GI 20
    D - Delayed BA; G - Greenfield; R - Dynamic SM PS
    Q - Static SM PS; N - A-MPDU disabled; B - TX STBC
b - RX STBC; M - Max A-MSDU; I - HT40 Intolerant; t - turbo-rates (256-QAM)
    VHT Flags: C - 160MHz/80+80MHz; c - 80MHz; V - Short GI 160; v - Short GI 80
    E - Beamformee; e - Beamformer
    HT_State shows client's original capabilities (not operational capabilities)
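To make the legend concrete, a small lookup table (a sketch, not an Aruba tool) can expand the flag strings quoted in this post:

```python
# Hypothetical helper (not from the thread): decode an HT_State flag string
# using the legend quoted above. Flag characters are case-sensitive, so each
# maps to exactly one capability.
HT_FLAGS = {
    "A": "LDPC Coding", "W": "40MHz", "S": "Short GI 40", "s": "Short GI 20",
    "D": "Delayed BA", "G": "Greenfield", "R": "Dynamic SM PS",
    "Q": "Static SM PS", "N": "A-MPDU disabled", "B": "TX STBC",
    "b": "RX STBC", "M": "Max A-MSDU", "I": "HT40 Intolerant",
    "t": "turbo-rates (256-QAM)",
    "C": "160MHz/80+80MHz", "c": "80MHz", "V": "Short GI 160",
    "v": "Short GI 80", "E": "Beamformee", "e": "Beamformer",
}

def decode_ht_state(flags):
    """Return capability names for an HT_State string like 'AWvSsEeBbM'."""
    return [HT_FLAGS[ch] for ch in flags if ch in HT_FLAGS]
```

Applied to the two clients above, the only difference is the Beamformer ('e') flag on a4:34:d9:63:f9:0c, and neither advertises 'N' (A-MPDU disabled).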

     

You are on 4.3.0.0; are you open to trying 4.3.1.3, since it's your lab? You can always go back to the same software by switch-partition-reboot.

     

     



  • 19.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 04:35 PM

I haven't been able to isolate what triggers the degradation. At first I thought it was when I connected other clients, sometimes my iPhone, sometimes another client, etc. But in my last 5 reboot attempts I've done nothing (on purpose) other than wait a few minutes.

     

    I'll try to shut down all other clients to see if it still happens.

     

Regarding Power Save, I've set the Windows Power Options for Wi-Fi to "Maximum Performance" on both AC and DC. That made quite some difference when running on DC, but all tests I do now are on AC.

     

I can't see that the capabilities (HT_State) change for my specific client between working and non-working.

     

I'm happy to try new versions. While testing last week at my customer's site using IAP-305s, we tested 4.3.1.0 and 4.3.1.1 with similar results.

     

    I'll try with 4.3.1.3 (if it's available from the Aruba support site) on my IAP-315 now...



  • 20.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 05:02 PM
      |   view attached

    I've now turned all other clients off and cleared the clients-table in the IAP. Everything worked fine with my single client, so I started thinking it was some other client triggering it.

     

    Then I just disconnected my client, and reconnected again and BOOM, problem is back...

     

Attached is another show-tech with this single client connected.

     

I'll now proceed with the upgrade. I don't see any obvious fixes in the release notes, but it can't hurt to test.

    Attachment(s)



  • 21.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 05:19 PM
      |   view attached

    I've now upgraded to 4.3.1.3 and it didn't make any difference. It was even slightly slower this time :)

     

    C:\Users\mathias.sundman\Downloads\iperf-3.1.3-win64>iperf3.exe -c 192.168.88.99 -t 200
    Connecting to host 192.168.88.99, port 5201
    [  4] local 192.168.88.53 port 53029 connected to 192.168.88.99 port 5201
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.01   sec  4.25 MBytes  35.3 Mbits/sec
    [  4]   1.01-2.01   sec  4.00 MBytes  33.4 Mbits/sec
    [  4]   2.01-3.00   sec  4.00 MBytes  34.0 Mbits/sec
    [  4]   3.00-4.01   sec  4.00 MBytes  33.5 Mbits/sec

    This test also with only this single client connected. New show-tech attached.

    Attachment(s)



  • 22.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 06:05 PM

I hate to ask for this, but can you share the working show tech on 4.3.1.3 as well? I have to take these to another group of people and would like a complete set on 4.3.1.3. I think I can use the older set of pcaps, though.



  • 23.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 06:22 PM
      |   view attached

    No worries - I appreciate your help. Show-tech attached.

     

     

    Attachment(s)



  • 24.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 06:45 PM

    Hmmm.. After I upgraded the IAP, I just reconnected the same client and then had the same problem.

     

    Then I rebooted the client, and since then I haven't been able to trigger the problem again for more than 1h.

     

    Now I've even connected my Mac and iPhone again, and still don't see the problem.

     

With a little luck the upgrade might actually have done the trick, but the client also needed a reboot to clear its head :)

     

But it's too soon to judge... I'll keep testing and keep you updated. Just let me know if you need any more info.

     



  • 25.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 14, 2017 07:26 PM

    I give up for tonight - I can't reproduce the problem any longer :)

     

    I've connected 2 more laptops and my iPhone, moved them to bad signal positions, pushed data from multiple clients, disconnected/connected, but no matter what my test-client keeps performing well!

     

We'll see tomorrow if the happiness lasts, but it has never been this stable before - that's for sure.



  • 26.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 15, 2017 10:18 AM

Unfortunately it seems like I just had a lucky streak last night.

     

Today I've been testing with 4.3.0.0, 4.3.1.1 and 4.3.1.2 on another IAP-315 (same config) and all showed the same problem. Then I switched back to the other IAP running 4.3.1.3 that worked well last night, and rebooted my client, but even on my first attempt it locked at 40M :(

     

At least it's very binary: when it works well I can see with tcpdump on my iperf server that the client is sending large (aggregated, I assume) frames, but when it's not working I see standard-sized frames.

     

So I'm going to see if I can figure out how to manually disable A-MPDU and A-MSDU, to check whether the throughput matches the non-working scenario or something else is also in play.

     

I can also add that with firmware up to and including 4.3.1.1 I saw intermittent drop-offs from my SSID, so the client switched over to another SSID for no apparent reason. So far I've not seen that with 4.3.1.2+.

     

The only way I've been able to trigger the problem with 4.3.1.2+ is by disconnecting/reconnecting to the network, so I'm wondering: could it perhaps be a client-side problem where, after connecting to another network, the client learns that that network doesn't support frame aggregation, and then for some reason sticks with that even when joining my IAP-315 network?

     

It kind of makes sense, as a client reboot always resolves the problem.

     

Attached are show-techs from today's testing.

     



  • 27.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 15, 2017 06:31 PM
      |   view attached

Dooh, this is driving me crazy! Things feel much more stable with the IAP running the 4.3.1.2 code. At least I don't have any unexplained drop-offs, and I find it very hard to trigger my initial problem. It has only happened twice today.

     

Currently I'm stuck in a "new" state, with the IAP running the 4.3.1.2 code, where my client can push about 80-95 Mbps.

     

A capture of what's happening in the air is very similar to the working scenario. I can see BA (Block Acknowledgement) and A-MPDU being used in both cases.

     

Then just a clarification regarding the tcpdumps on my iperf server, where I earlier saw very large frames and thought those were the result of frame aggregation. That's not the case; it was just TCP Segmentation Offloading on the NIC fooling me. After turning that off I always see normal 1514-byte frames there in all cases.
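A quick sanity check for this (a sketch; the frame lengths are whatever the capture shows): in a host-side tcpdump, frames larger than a full Ethernet frame mean the capture point is seeing pre-segmentation TSO/GSO buffers, not 802.11 aggregation, which is invisible at that layer:

```python
def capture_shows_offload(frame_lengths, max_eth_frame=1514):
    """True if any captured frame exceeds a full 1500-byte-MTU Ethernet frame.
    On a wired host capture that indicates TSO/GSO super-sized segments;
    A-MPDU/A-MSDU aggregation never shows up in a wired-side capture."""
    return any(length > max_eth_frame for length in frame_lengths)

# The 4434-byte segment seen earlier in this thread is exactly this residue:
capture_shows_offload([1514, 4434])  # → True
```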

     

    Adding a show-tech for this new state as well.

     

    With this new state, a reboot of the client doesn't help. It's stuck! I'll probably have to reboot the IAP to clear it...

    Attachment(s)



  • 28.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 16, 2017 11:25 AM

I've now read up on A-MPDU and learned that it uses Block Ack, which needs to be negotiated via ADDBA Requests in each direction. I've done new captures which include the time from when the client associates with the IAP until my iperf measurement starts.
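For reference when reading the captures: per IEEE 802.11, the Block Ack handshake runs over management Action frames with category 3, with the action codes below coming from the standard. A minimal classifier (a hypothetical helper, not from the thread):

```python
from typing import Optional

# Category 3 = Block Ack; action codes per IEEE 802.11.
BLOCK_ACK_ACTIONS = {0: "ADDBA Request", 1: "ADDBA Response", 2: "DELBA"}

def classify_ba_action(category: int, action: int) -> Optional[str]:
    """Name a Block Ack action frame, or None for any other action frame."""
    if category != 3:
        return None
    return BLOCK_ACK_ACTIONS.get(action)
```

A healthy A-MPDU session needs an ADDBA Request/Response pair in each direction; a one-sided negotiation shows up as requests from only one MAC.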

     

    I now have three different states I'm analyzing:

     

    "Good" - where everything works fine and I can Tx and Rx 200Mbps.

    "40M" - where I can Tx about 40Mbps and Rx 200Mbps.

    "80M" - where I can Tx about 80-100Mps. Forgot to test Rx.

     

Running the 4.3.1.3 code I'm mostly in the "good" state, but I have faced a few situations where I've ended up in the 80M or 40M state.

     

    When I'm in 40M, a reboot of the client resolves the issue.

     

    When I'm in 80M, client reboot does not help, but rebooting the AP resolves the issue.

     

Examining the new pcaps shows that in the "good" or "80M" state, there is a mutual Block Ack Request/Response negotiation (both the IAP and the client request BA):

    Screen Shot 2017-04-16 at 13.11.57.png

     

And I can see that the actual iperf data is sent in chunks of 52 QoS Data frames that are Block Ack'ed (both "good" and "80M").

     

However, in the 40M state, which I easily end up in when running <=4.3.1.1 code, only the IAP requests BA; the client doesn't:

    Screen Shot 2017-04-16 at 13.05.57.png

     

That explains why I can download at the full 200Mbps, but only upload at 40Mbps, when stuck in the "40M" state.

     

    Googling ADDBA driver problems gave me this:

    https://github.com/kaloz/mwlwifi/issues/41

     

Where someone reports similar problems: things work well for a period of time, but after that the NIC driver fails to establish a new BA session.

     

Given that, it feels like the 40M problem is mostly a client NIC driver problem, but that doesn't explain why it works so much better when running the 4.3.1.3 code.

     

I also lack an explanation for what is causing the "80M" state, where the client *IS* using BA and transmitting at a 400M bitrate but is only able to push 80Mbps.



  • 29.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 16, 2017 11:27 AM

    New pcaps...



  • 30.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 17, 2017 07:09 AM

The client I've performed most tests with is an HP EliteBook 840 G3 with an Intel "Dual Band Wireless-AC 8260" Wi-Fi NIC, running Windows 10 and driver version 18.32.1.2 from 2015-12-30.

     

I've tested upgrading to the latest drivers from HP (10.40.0.3P) and directly from Intel (10.50.1.5), but both just seemed to make things even worse. Using them I've never been able to transfer more than 80M, and when capturing the BA negotiation there are just a lot of retransmissions, Add/Delete etc:

     

    Screen Shot 2017-04-17 at 12.49.45.png

     

Please remember that I started seeing the problem at a customer site that is using Lenovo ThinkPads, so it's not just my single client having the issue. I've also replicated the problem by booting Ubuntu on my HP laptop.



  • 31.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 17, 2017 08:54 AM
      |   view attached

Just to rule out a few features, I've disabled background spectrum-monitoring, set air-time-fairness to default, and disabled client-match, but I'm still having similar problems.

     

Attached is a pcap with this configuration, where the client successfully negotiates BA and uses A-MPDU to transfer data, but is only able to push about 40Mbps!

     

This time I see almost no TxRetries, Client_health is 100/100, and the bitrate is still 400M. Everything just looks perfect - but it's just slow! So freaking annoying :)

     

    Attachment(s)



  • 32.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 17, 2017 10:53 AM

    I've now been doing some testing on 2.4GHz, and I seem to have the same problem there.

     

At first everything looked perfect: I connected with a bitrate of 144Mbps and could transfer 80-90Mbps both up and down. Then suddenly, half an hour later, I can now only upload about 40Mbps, but can still download 80Mbps.

     

Confirmed with Wireshark: BA is still being used in this case.



  • 33.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 17, 2017 12:33 PM

    I just made a very interesting observation.

     

I've shut down all clients, rebooted the IAP running the latest code (4.3.1.3), and connected my single test client. It was running fine for a few minutes, then the problem started and I was only able to upload about 30M, but could still download 200M.

     

But the interesting part is that every time I execute "show tech" or "show cpu" on the IAP, the throughput increases to normal for a few seconds:

     

    [  4]  72.01-73.01  sec  3.50 MBytes  29.2 Mbits/sec
    [  4]  73.01-74.01  sec  3.25 MBytes  27.2 Mbits/sec
    [  4]  74.01-75.01  sec  3.25 MBytes  27.4 Mbits/sec
    [  4]  75.01-76.01  sec  3.50 MBytes  29.2 Mbits/sec
    [  4]  76.01-77.00  sec  3.25 MBytes  27.5 Mbits/sec
    [  4]  77.00-78.01  sec  3.50 MBytes  29.3 Mbits/sec
    [  4]  78.01-79.00  sec  3.50 MBytes  29.4 Mbits/sec
    [  4]  79.00-80.00  sec  3.25 MBytes  27.3 Mbits/sec
    [  4]  80.00-81.00  sec  3.25 MBytes  27.3 Mbits/sec
    [  4]  81.00-82.00  sec  3.50 MBytes  29.3 Mbits/sec
    [  4]  82.00-83.00  sec  24.0 MBytes   202 Mbits/sec
    [  4]  83.00-84.00  sec  26.5 MBytes   222 Mbits/sec
    [  4]  84.00-85.01  sec  26.2 MBytes   220 Mbits/sec
    [  4]  85.01-86.00  sec  25.6 MBytes   216 Mbits/sec
    [  4]  86.00-87.00  sec  27.1 MBytes   228 Mbits/sec
    [  4]  87.00-88.01  sec  15.4 MBytes   128 Mbits/sec
    [  4]  88.01-89.01  sec  3.50 MBytes  29.3 Mbits/sec
    [  4]  89.01-90.00  sec  3.50 MBytes  29.6 Mbits/sec
    [  4]  90.00-91.00  sec  3.50 MBytes  29.3 Mbits/sec
    [  4]  91.00-92.00  sec  3.25 MBytes  27.3 Mbits/sec
    [  4]  92.00-93.00  sec  3.50 MBytes  29.3 Mbits/sec
    [  4]  93.00-94.01  sec  3.25 MBytes  27.0 Mbits/sec
    [  4]  94.01-95.00  sec  3.50 MBytes  29.6 Mbits/sec
    [  4]  95.00-96.00  sec  3.50 MBytes  29.3 Mbits/sec

    Above is for a "show tech". If I do a "show cpu" it's only good for 1-2 sec.

     

    If I do something else like a "show clients" or "show log system", I see no difference.

     

Feels like a CPU / interrupt scheduling thing, where a show cpu somehow releases the queue.

     

    Current load during the upload:

     

    a8:bd:27:c0:44:fc# show cpu details
    Mem: 172944K used, 309008K free, 0K shrd, 0K buff, 28128K cached
    Load average: 4.07 3.95 3.81  (Status: S=sleeping R=running, W=waiting)
      PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
    27590 root     SW         0     2 16.4  0.0 kworker/1:2
       34 root     RW         0     2  7.4  0.0 kworker/0:1
     2419 root     SW         0     2  1.4  0.0 power_monitor
     3443 root     S <    12520  3367  0.0  2.5 cli
     3451 root     S N     5700  3367  0.0  1.1 sapd
     3543 root     S       3936  3367  0.0  0.8 ble_relay
     9934 root     S <     3520  9933  0.0  0.7 cli
     3519 root     S       2496  3367  0.0  0.5 mdns
     3469 root     S <     2456  3367  0.0  0.5 stm
     3498 root     S       2264  3367  0.0  0.4 snmpd_sap
    ...

     

     



  • 34.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 17, 2017 05:23 PM
      |   view attached

    OMG, I've just concluded that the problem actually starts when I ssh into the IAP!

     

When I have no SSH CLI sessions open everything works fine, but as soon as I open an SSH session the throughput drops from 200M to about 80M. If I open a second SSH session it drops to about 40M and the cli process starts using about 50% CPU:

     

    a8:bd:27:c0:44:fc# show cpu details
    Mem: 174756K used, 307196K free, 0K shrd, 0K buff, 28128K cached
    Load average: 4.17 2.90 2.60  (Status: S=sleeping R=running, W=waiting)
      PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
     5973 root     R <     3316  5972 48.8  0.6 cli
        9 root     RW         0     2  2.1  0.0 kworker/1:0
     7207 root     R <      344  6965  1.0  0.0 top
       34 root     SW         0     2  1.0  0.0 kworker/0:1
     3367 root     S       1276     1  0.5  0.2 nanny
     2420 root     SW         0     2  0.5  0.0 power_monitor
     3443 root     S <    12504  3367  0.0  2.5 cli
     3451 root     S N     5708  3367  0.0  1.1 sapd
     3543 root     S       3936  3367  0.0  0.8 ble_relay
     6965 root     S <     3304  6964  0.0  0.6 cli

With my IAP running 6.5.1.0-4.3.1.1 I could reproduce the behaviour after a factory reset by logging into the WebUI and creating a default SSID for 5GHz, PSK.

     

Only SSH:ing into the IAP did not trigger the problem on the factory-reset IAP, but after I issued:

     

    34:fc:b9:c6:6a:0c# conf t
    now support CLI commit model, please type "commit apply" for configuration to take effect.
    34:fc:b9:c6:6a:0c (config) # loginsession timeout 0
    34:fc:b9:c6:6a:0c (config) # exit
    34:fc:b9:c6:6a:0c# commit apply
    committing configuration...
    configuration committed.

    The exact same behavior started.

     

It just sounds far too easy, and I can't believe I've spent the whole weekend troubleshooting this without realizing this relation earlier - unless there is more behind it, causing it not to always behave like this.

     

Attaching a show tech from the newly factory-reset IAP, taken just after the problem started again.

    Attachment(s)



  • 35.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 18, 2017 04:13 PM
      |   view attached

    I was informed this *might* be fixed in 6.5.2.0, so I've upgraded one of my IAP-315 to that, and yeah, so far it looks promising...

     

I can no longer reproduce the "ssh cli" high-CPU problem, and I see no drops in my throughput.

     

I have had my client enter the "40M" state though, where I could only upload 40M, but that time I could see BA/A-MPDU was not being used, so I blame my client NIC drivers for that. After a client reboot it was back to normal again.

     

Attached is a show-tech taken while performing the iperf measurement, uploading about 200Mbps.

     

    I'll keep my fingers crossed and give it a few days of testing.

     

PS: I also tested 4.3.1.3 on an IAP-205 and it suffered from the same "ssh cli" bug. The only difference was that it was even worse :) From 200M -> 2M with a single SSH session opened.

     

    Attachment(s)



  • 36.  RE: Client throughput capped at 40M instead of +200M occationally
    Best Answer

    Posted Apr 18, 2017 04:54 PM

I can also add that I've tested disabling A-MPDU in the IAP with:

     

    (config) #  wlan ssid-profile Test
    (SSID Profile "Test") # mpdu-agg-disable

    and can confirm that the TCP throughput drops from 200M -> 40M with only that change.
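That drop is roughly what a back-of-envelope airtime budget predicts: without A-MPDU, every ~1500-byte frame pays the full preamble, SIFS, legacy-ACK and backoff overhead. All constants below are approximate 802.11ac numbers I'm assuming, not values measured in this thread:

```python
def unaggregated_tcp_throughput_mbps(phy_rate_mbps=400.0, payload_bytes=1460):
    """Rough TCP throughput when every MPDU is acked individually.
    Timing constants are approximate 5 GHz OFDM values in microseconds."""
    data_us = (payload_bytes + 80) * 8 / phy_rate_mbps  # MPDU at the PHY rate
    preamble_us = 40.0                  # VHT PHY preamble (approx.)
    sifs_us = 16.0
    ack_us = 20.0 + 112 / 24.0          # legacy ACK: preamble + 14 bytes @ 24 Mbps
    difs_us = 34.0
    backoff_us = 7.5 * 9.0              # mean of CWmin=15 slots, 9 us each
    total_us = data_us + preamble_us + sifs_us + ack_us + difs_us + backoff_us
    return payload_bytes * 8 / total_us  # Mbps

# Comes out around 55 Mbps at a 400M bitrate; with TCP acks also burning
# airtime, the observed ~40M TCP throughput is consistent with this.
```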

     

    So, to conclude, it looks like I've been fighting two bugs at the same time:

     

1) The client NIC (Intel AC-8260) fails to negotiate A-MPDU after some time when running on driver v18.32.1.2. After upgrading to the latest drivers from Intel (19.50.1.5), the problem is resolved. For some reason my first upgrade attempt was unsuccessful and caused a lot of retransmissions and even worse A-MPDU negotiations.

     

    2) "SSH CLI - loginsession timeout 0" bug in at least IAP 4.3.1.1-4.3.1.3 causing high CPU load and decreased throughput (200->80M for 1 ssh session). Confirmed by TAC as bug id #156250. When "loginsession timeout 0" is configured it causes high CPU load when SSH is connected. Fixed in 6.5.2.0. Problem observed on IAP-205 and IAP-315.

     

    Work-around: Disable loginsession timeout (which is the default) with:

    no loginsession timeout

     



  • 37.  RE: Client throughput capped at 40M instead of +200M occationally

    Posted Apr 20, 2017 03:02 PM

Today I made a new attempt to upgrade the Intel drivers on the client to the latest available (19.50.1.5), and this time it actually seems to have fixed the issue!

     

The last time I tried to upgrade, I ended up in some strange state where the client was never able to negotiate Block Ack (A-MPDU) and there were a lot of retransmissions etc.

     

I suspect the new driver never got properly installed (even though it listed the new version when looking at the NIC properties). This time I first uninstalled the whole Intel PROSet package before installing the new one. Perhaps that made the difference.

     

Anyway, for the last 5 hours everything has worked perfectly, so I'm keeping my fingers crossed.