Push It To The Limit! Understand Wi-Fi’s Breaking Point to Design Better WLANs

By revolutionwifi posted Jun 30, 2015 08:35 AM

This is the fourth and final blog post in the WLAN capacity planning series. Be sure to read the first, second, and third posts.

We all want high performing WLANs. In order to do that we must push Wi-Fi to its limits!

(Cue adapted Scarface Theme, verse 1)…

Push it to the limit!

Walk along the perimeter edge

But don’t look up, just keep your head

And you’ll be finished

Survey to the limit!

Past the point of no bandwidth

You’ve reached the edge but still you gotta learn

How to build it

Hit the floor and double your pace

Laptop wide open like an engineer outta hell

And you crush the speed test

Going for the back of every room

Nothing gonna stop you

There’s no wall that strong

So close now, battery near the brink

So, push it!

We walk a fine line when designing wireless networks, attempting to push as many users and bandwidth through our APs as possible, ensuring adequate capacity is available to meet demand, while not overbuilding the network. But what are the limits and how do we know we’ve hit them? Or more importantly, how do we plan and design Wi-Fi networks to make sure we don’t hit these limits?

The Right Metrics

Capacity can be defined as the amount of available time for stations to transmit and receive data. Or to put it another way, available capacity is the inverse of network utilization. The question then becomes, how do we measure network utilization?

In order to design, measure and assess the performance of a WLAN, we need to understand the key metrics that define WLAN health. In wired and wireless networks alike the key metrics are bandwidth (more precisely ‘throughput’, which I’ll use from here on out) and latency. If you’ll recall from my earlier post, throughput is really a function of serialization delay, while latency is a function of both geographic delay and contention delay.

For the purposes of discussing latency in the remainder of this article, I’ll limit my working definition to contention delay at the access layer since most Wi-Fi travels over distances of only a few tens of meters with RF signals travelling near the speed of light, which renders geographic delay negligible.

However there is a key difference between wired and wireless networks that must be considered. In wired networks both throughput and latency correspond directly to link utilization; a wired link operates at full-duplex and with a single fixed speed, which allows for predictability and a direct correlation between throughput and latency with link utilization. Therefore, on wired networks we use throughput and latency metrics to directly assess the health of a network.

Throughput vs Utilization.png

In wireless networks, the underlying link is half-duplex and operates with a variable link data rate depending on the combination of AP capabilities, client capabilities, and environmental factors such as RSSI, SNR, and multipath. Therefore, there is no direct correlation between throughput and link utilization. For example, take three different clients that all need to consume the same 10 Mbps throughput: a high-end laptop may result in 4% airtime utilization, a tablet 10% airtime utilization, and a smartphone 25% airtime utilization (these are example figures only and should not be used for planning purposes). Clearly throughput is not a good measure of WLAN utilization or health. The unique mix of clients and applications on a network determines the airtime utilization on the network and the resulting throughput performance and access latency of the WLAN. WLAN professionals must focus on the underlying airtime utilization in order to assess network utilization, available capacity and wireless network health.
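
To make the example figures above a little more concrete, here is a minimal Python sketch that inverts them: if 10 Mbps of application traffic costs a given share of airtime, the implied effective device throughput is the application throughput divided by that share. The function name and client labels are illustrative only, and the percentages are the same example figures from the paragraph above, not planning values.

```python
def implied_device_throughput_mbps(app_mbps, airtime_percent):
    """Effective throughput a device must sustain for app_mbps
    to consume airtime_percent of the channel's airtime."""
    return app_mbps * 100 / airtime_percent


# Example figures only (taken from the text); do not use for planning.
example_clients = {"high-end laptop": 4, "tablet": 10, "smartphone": 25}

for name, percent in example_clients.items():
    mbps = implied_device_throughput_mbps(10, percent)
    print(f"{name}: {percent}% airtime implies about {mbps:.0f} Mbps effective throughput")
```

The same 10 Mbps load therefore looks very different on the air depending on which device carries it, which is why throughput alone says little about utilization.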

Airtime utilization is the key metric to measure and assess WLAN health

Client Mix.png

Airtime utilization (or channel utilization) is influenced by two main factors: external RF interference (non-Wi-Fi energy) and medium contention (Wi-Fi transmissions). External RF interference is fairly straightforward; energy above the CCA ED threshold (Clear Channel Assessment, Energy Detection) causes Wi-Fi stations to sense the medium as busy and defer transmission, thus consuming available airtime from the Wi-Fi station’s perspective. Additionally, energy below the CCA ED threshold can raise the noise floor and reduce the SNR for Wi-Fi stations, resulting in the use of lower data rates and possibly higher retransmission rates. Medium contention requires a bit more definition, as the topic is more nuanced and requires greater focus to successfully plan and design a WLAN.

We can classify sources of 802.11 medium contention into two major categories. If there are two limits we need to be aware of in Wi-Fi, it’s these two!

Airtime Demand – the airtime demanded by stations within an individual AP radio cell.
Co-Channel Interference (CCI) – the airtime utilization that results from Wi-Fi contention across all stations (APs and clients) on the same frequency or channel across multiple AP radio cells.

The two result in fundamentally the same effect but are approached differently within the WLAN design process. Airtime demand is addressed through capacity planning while CCI is addressed through coverage planning.

Airtime Demand

The first major source of medium contention is the airtime demand within a single AP radio cell. Simply put, this is the amount of airtime required by all clients of varying capabilities running a variety of applications, which are connected to a single AP radio.

Many Wi-Fi professionals and novices alike fall into the trap of guessing the number of clients a single AP should be designed to support, or even worse, deciding how many APs are required based on a rule-of-thumb for square feet or meters. These outdated methods for WLAN design have resulted in capacity forecasts that do not accurately reflect the capacity demand and intended use-case(s) for the WLAN. Often WLANs are deployed with too few access points by following an outdated coverage-oriented design methodology, or with too many access points because capacity planning has not been performed, because the capacity planning methodology used is inaccurate, or because of the false notion that simply deploying more APs will result in more capacity.

WLAN capacity is heavily dictated by the interaction between the infrastructure and client devices, with the capabilities of each directly shaping the performance of a network reliant on shared airtime. No two WLANs are alike due to the unique mix of access points and the myriad of different client device types. Therefore, the measure of WLAN capacity is determining the airtime demand of all stations on the WLAN based on their quantities, capabilities, and intended use (application requirements, user and/or device behavior). From these measurements, coupled with other environmental characteristics, we can derive a capacity forecast, which describes the number of Wi-Fi radios operating on non-overlapping channels (to minimize CCI) in the same physical area that are required to meet the throughput requirements of all client devices.

Airtime Utilization Equation.png

The airtime demand placed on the WLAN by each individual client device is determined by taking the application throughput divided by a realistic device throughput capability. Care must be taken to use realistic device throughput capability figures that devices will actually experience throughout the WLAN; avoid using the peak throughput under a best-case scenario. The airtime demand is then summed for all concurrent client devices on the WLAN and distributed between frequency bands to determine the correct quantity of APs and radios to deploy. This provides a forecast of the capacity required on the WLAN.
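
The calculation described above can be sketched in a few lines of Python. This is a simplified illustration, not the method used by any particular planning tool: the `ClientGroup` type, the device figures, and the `target_utilization` parameter (the share of airtime each radio is allowed to carry, which relates to the thresholds discussed under The Breaking Point) are all assumptions made for the example. It also ignores the split between frequency bands and any co-channel interference.

```python
import math
from dataclasses import dataclass


@dataclass
class ClientGroup:
    name: str
    count: int          # concurrent devices of this type
    app_mbps: float     # required application throughput per device
    device_mbps: float  # realistic (not peak) throughput the device achieves


def airtime_demand(group):
    """Airtime demand of a group, as a fraction of one radio's airtime."""
    return group.count * group.app_mbps / group.device_mbps


def radios_required(groups, target_utilization=0.5):
    """Sum demand across groups and divide by the per-radio airtime budget."""
    total = sum(airtime_demand(g) for g in groups)
    return total, math.ceil(total / target_utilization)


# Hypothetical client mix, for illustration only.
groups = [
    ClientGroup("laptops", count=30, app_mbps=2.0, device_mbps=100.0),
    ClientGroup("tablets", count=20, app_mbps=2.0, device_mbps=50.0),
    ClientGroup("smartphones", count=45, app_mbps=1.0, device_mbps=25.0),
]

total, radios = radios_required(groups)
print(f"total airtime demand: {total:.2f} radios' worth, plan for {radios} radios")
```

A total demand above 1.0 simply means that one radio cannot serve the load, which is the point at which users need to be segmented across more radios on non-overlapping channels.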

The key to the capacity forecast is reducing medium contention between client devices by segmenting them into small enough groups operating on non-overlapping channels so that each client can achieve the required application throughput level for an optimal user experience. As depicted in the graphic below, the goal is to find the correct number of AP radios that will segment users into different collision domains rather than overloading AP radios. The breakout or ratio of 5 GHz to 2.4 GHz radios is of critical importance as well, since the 5 GHz bands offer significantly more channels and capacity.

Airtime Demand in a Cell.png

Capacity planning should result in the optimal number of APs to serve all users without overloading APs.

Co-Channel Interference (CCI)

The second major source of contention is co-channel interference (CCI). Since radio communications are unbounded, receivers must attempt to distinguish the desired incoming signal from all other energy. When multiple transmissions exist at the same time on the same frequency, it becomes harder for a receiving station to determine which signal to synchronize its circuitry to and receive. Therefore, to prevent frame loss in such situations, most wireless systems either operate in half-duplex mode on a shared frequency (example: Wi-Fi) or dedicate a separate frequency to each direction so that each frequency is effectively simplex (example: FDD). For Wi-Fi, this means that stations must defer transmission if they detect an existing Wi-Fi transmission in progress on the frequency.

Co-channel Interference (CCI) results from the need to re-use the same radio frequencies (channels) within a multi-AP WLAN deployment due to the limited spectrum resources that we have to work with. It also results from neighboring WLANs that are within range of one another and using overlapping frequencies due to lack of coordination or limited spectrum resources. When CCI is present, multiple AP radios have overlapping coverage areas and cause Wi-Fi stations to defer transmissions across AP boundaries. In this manner, a transmission in one AP cell causes deferral in an adjacent AP cell. The result is that the two AP cells share available airtime and capacity to a large degree.

It is critical to understand how to design WLANs to minimize CCI. To do this effectively, we must know what the RSSI threshold is where a Wi-Fi station can properly recognize, sync, and decode a frame preamble and PLCP header (physical layer header). Unfortunately this varies between chipsets and device design. Generically, frame PLCP headers can be decoded at the Receive (Rx) Sensitivity level of the Wi-Fi device at the data rate used to encode the preamble. PLCP headers are hard-coded at low data rates based on the PHY specifications as follows:

802.11b with Long Preambles = 1 Mbps
802.11b with Short Preambles = 2 Mbps
802.11a/g/n/ac (OFDM) = 6 Mbps

The Rx Sensitivity level for many chipsets at these data rates can be very low, in the -90 to -99 dBm range (sometimes even lower for access points). This can result in CCI being detected from very distant transmissions, causing deferral and loss of airtime and capacity. Some Wi-Fi devices, mainly APs, also have artificial CCA carrier sense thresholds that can be configured to ignore transmissions below a defined signal level that is higher than the Rx Sensitivity of the device, reducing the negative effects of CCI from distant transmitters.
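
As a rough illustration of the paragraph above, the following Python sketch models whether a station defers to a frame it hears at a given signal level. It is deliberately simplified: real chipsets differ, and the function names and the dBm values in the usage lines are assumptions chosen for the example rather than figures from any specific device.

```python
def effective_cs_threshold_dbm(rx_sensitivity_dbm, configured_threshold_dbm=None):
    """Signal level at or above which a decoded preamble triggers deferral."""
    if configured_threshold_dbm is None:
        return rx_sensitivity_dbm
    # An artificial threshold only has an effect when it is higher
    # (less sensitive) than what the radio can decode anyway.
    return max(rx_sensitivity_dbm, configured_threshold_dbm)


def defers_to(frame_rssi_dbm, rx_sensitivity_dbm, configured_threshold_dbm=None):
    """True if the station would treat the medium as busy for this frame."""
    threshold = effective_cs_threshold_dbm(rx_sensitivity_dbm, configured_threshold_dbm)
    return frame_rssi_dbm >= threshold


# Hypothetical radio with -95 dBm Rx Sensitivity hearing a distant frame at -88 dBm.
print(defers_to(-88, rx_sensitivity_dbm=-95))                                # defers
print(defers_to(-88, rx_sensitivity_dbm=-95, configured_threshold_dbm=-82))  # ignores it
```

With no configured threshold the distant frame costs airtime; with the threshold raised, the same frame is ignored and the airtime is kept for the local cell.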

Savvy readers may be asking, “why don’t they just design the device with a lower Rx Sensitivity in the first place?” The answer is because better Rx Sensitivity improves the reception of frames at all data rates, including higher data rates, improving rate over range across the board. Improved Rx Sensitivity is generally a good thing and improves performance.

The IEEE 802.11-2012 standard also defines a signal threshold for CCA carrier sense and deferral, which is -82 dBm for OFDM PHYs (802.11a/g/n/ac). This level is also a common artificial threshold in APs. Therefore, WLAN professionals commonly design WLANs to minimize CCI using a cell boundary of -82 dBm. It is important to understand just how far an RF signal travels beyond the desired association coverage area (e.g. -66 dBm). The graphic below helps visualize this distance. It follows from the inverse square law: every doubling of distance in free space reduces the received signal power to one quarter, a change of -6 dB. The practical effect is that CCI can very well cause CCA deferral, shared airtime and shared capacity up to 8x the distance from the AP as the desired client association range!
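
The distance relationship can be checked with a short Python sketch. It assumes pure free-space propagation (a path loss exponent of 2), which real buildings rarely match, and the function name and parameter are illustrative. Under that assumption, the 16 dB gap between -66 dBm and -82 dBm works out to roughly 6.3x the distance, and three full doublings (about 18 dB, the "up to 8x" case) correspond to a signal of about -84 dBm.

```python
def distance_ratio(db_difference, path_loss_exponent=2.0):
    """How many times farther a signal travels before it has dropped by
    db_difference dB, under a simple log-distance path loss model."""
    return 10 ** (db_difference / (10 * path_loss_exponent))


association_edge_dbm = -66
for cci_edge_dbm in (-72, -78, -82, -84):
    drop = association_edge_dbm - cci_edge_dbm
    print(f"{cci_edge_dbm} dBm boundary: about {distance_ratio(drop):.1f}x the association range")
```

Indoors, walls and obstructions raise the effective path loss exponent, which shrinks these ratios; that is one reason site surveys matter more than the free-space figure.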

Graphic courtesy of the Aruba Networks VHD VRD Theory Guide

The lack of CCI mitigation is one of the major sources of reduced capacity on modern Wi-Fi networks. Oftentimes network architects recognize the need for greater capacity but fail to take into account the negative effects of co-channel interference (CCI), which actually reduces capacity. It is critical to plan for the proper AP quantity, AP placement, antennas, AP channel width and use of DFS channels (affecting the number of channels available for frequency reuse), association coverage threshold, AP overlap for roaming, frequency reuse, and CCI boundaries when designing a WLAN. When too many APs are installed or a frequency re-use plan is inadequately implemented, CCI will increase and actual capacity on the WLAN will decrease due to increased overhead from management and control traffic. Additionally, capacity decreases even further because more clients are drawn to connect to APs on the frequency due to higher average RSSI / SNR across a larger coverage area, which brings more stations competing for the shared airtime of the channel. As part of the design process, disabling radios is a tactic that should be considered when appropriate to minimize CCI, especially in the 2.4 GHz band given the de facto standard of dual-radio APs with fixed frequency bands.

The Breaking Point

Since airtime utilization is the key metric that determines WLAN health, we need to know the breaking point where the amount of airtime utilization results in degraded application performance and user experience. The breaking point varies but is based on application latency requirements and the underlying mechanics of WMM contention handling for different QoS queues. Latency is heavily dependent upon the number of collisions and retransmissions it takes to successfully deliver a frame over the air.

Airtime Utilization Thresholds.png

Network utilization and retransmissions per-frame for each EDCA Access Category (Source: Comsis)

When designing a WLAN, identify which set of applications is in use on the network and thus which airtime utilization breaking point is applicable:

Data applications are more tolerant of frame retransmissions since they do not require real-time interaction. The WMM best effort queue is where most data applications are handled, which has a large initial contention window size, accommodating more concurrent users and higher airtime utilization before retransmissions and degraded performance begin to appear.

80% airtime utilization is a good threshold to use for WLANs that only support data applications.

Real-time applications like voice and non-buffered or interactive video have more stringent latency requirements. The WMM voice and video queues handle these traffic types and have much smaller initial contention window sizes than the best effort queue, resulting in a lower airtime utilization threshold before user transmissions begin colliding resulting in retransmissions, latency spikes, and degraded application performance.

35% airtime utilization is a good threshold to use when the WLAN only supports voice and video applications, which is rare. More commonly, WLANs with voice support a mix of voice and data applications, described next.

Mixed networks support both voice and data applications. Therefore, the airtime utilization threshold to use is a blend of the best effort, voice, and video queues.

50% airtime utilization is a good threshold to use for mixed-use WLANs.
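
To see why smaller contention windows reach their breaking point sooner, here is a deliberately simplified Python model. It assumes every contending station picks a random backoff slot between 0 and the contention window, and that a collision occurs when two or more stations share the lowest slot. It ignores AIFS differences, retries with doubled windows, and traffic arrival patterns, so the numbers are illustrative rather than predictive. The initial window sizes used (3, 7, and 15 slots for voice, video, and best effort) are the commonly cited EDCA defaults for OFDM PHYs; verify them against your own equipment.

```python
# Commonly cited default initial contention windows (CWmin) for OFDM PHYs.
DEFAULT_CWMIN = {"AC_VO": 3, "AC_VI": 7, "AC_BE": 15}


def first_attempt_collision_probability(stations, cw):
    """Probability that the lowest backoff slot is shared by two or more
    stations when each picks uniformly from 0..cw (simplified model)."""
    if stations < 2:
        return 0.0
    slots = cw + 1
    p_one_station_wins = 0.0
    for k in range(slots):
        # A given station picks slot k and every other station picks a later slot.
        p_later = (slots - 1 - k) / slots
        p_one_station_wins += (1 / slots) * p_later ** (stations - 1)
    return 1.0 - stations * p_one_station_wins


for ac, cw in DEFAULT_CWMIN.items():
    p = first_attempt_collision_probability(stations=5, cw=cw)
    print(f"{ac} (CWmin={cw}): {p:.0%} chance of a collision with 5 contending stations")
```

Even in this toy model, the voice queue collides noticeably more often than best effort for the same number of contending stations, which is consistent with using a lower utilization threshold for real-time traffic.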

Integration Into WLAN Design

WLAN design should be approached with an emphasis on developing a balanced design, providing appropriate levels of coverage and capacity. A balanced design attempts to provide adequate capacity to meet growing demand while not over-building the WLAN and incurring excessive cost. This approach requires careful analysis of capacity requirements in order to determine the appropriate number of APs to meet current and future demand. Frequency re-use is of critical importance during RF planning in order to ensure that AP density required can be implemented successfully without causing significant co-channel interference (CCI). A balanced design is appropriate for most modern WLANs, which face increasing device density and business reliance on the WLAN, but must be mindful of budgetary constraints and return on investment.

Proper capacity planning must be coupled with RF coverage planning to determine the correct number of APs for a WLAN as well as how they should be implemented within a given physical space using correct AP placement, antenna selection, coverage patterns, and frequency re-use. Planning for RF coverage and planning for WLAN capacity require different methods of forecasting and measurement, yet they must be tied together in a coherent fashion to achieve a successful outcome. Both coverage and capacity requirements should be forecast as part of the WLAN design process and merged together to provide a final WLAN design. Network architects should not rely solely on either coverage planning or capacity planning to design WLANs, but use both processes together.

Iterative Approach to WLAN Planning.png

In some environments, capacity requirements will dictate more access points (on non-overlapping channels) than would be required based purely on RF coverage requirements. In other environments, the opposite may be true. And in high-density environments, the per-user performance may be restricted due to RF spectrum and CCI limitations. It is recommended that network architects perform WLAN capacity and coverage planning in parallel and in an iterative process, balancing the requirements of both before deciding on a final design.

Additional Resources

I recently conducted a webinar with Aruba Networks titled “Great Wi-Fi Starts with Proper Design” where I covered the 7 C’s of WLAN design, describing the key steps for success from start to finish.

I presented at WLPC 2015 on “WLAN Capacity Planning: From Concept to Practice” where I covered the iterative design process between capacity planning and coverage planning in depth. I also walk through a real-world example to bring the methodology to life in a tangible way.

I have also released a free tool, the Revolution Wi-Fi™ Capacity Planner ©, which is a predictive tool to aid in the capacity planning phase of WLAN design. The tool is also accompanied by a user guide, which details the theory and methodology used. Both are bundled together when downloaded.

Be sure to check out other videos I have posted regarding WLAN capacity planning.

Closing Thought

Wi-Fi is a complex technology and the only way to get better is to put in the effort through learning and experience. So get out there and push yourself to the limit!

(Cue adapted Scarface Theme, verse 2)…

Welcome to the limit

(The limit)

Take it maybe one step more

The bandwidth hungry clients still comin’ so

You better learn it

Push it to the limit

(The limit)

With no one left but you in your way

You might get careless, but your WLAN’s never safe

While you still maintain it

Welcome to the limit

(The limit)

Standing on the perimeter edge

Don't look down just keep your head

And you'll be finished

Cheers,

Andrew von Nagy

11 comments

Comments

bcroxall

Jan 10, 2017 05:34 PM

Andrew,

In a comment above you mention, "I have experienced poor voice quality issues when using smart devices like a laptop with a softphone or a cellular phone running a voice app... which is typically caused by a performance issue on the device itself. Also, softphones or voice apps also need to be developed correctly to integrate with the operating system QoS APIs to actually classify and queue the traffic correctly, not just tag it with WMM / DSCP."

Do you have guidelines or can you point me in the right direction on how to correctly develop an application with proper QoS? I am working on an app where the OS scheduler just doesn't send packets in a timely manner. I have placed a timer in the code after sendto(), everything looks correct there, but a capture of the outgoing packets shows excessive delays between packets actually going out. Our application is on iOS, Android, Mac, and PC but this becomes a problem on iOS and Android on WIFI. Also, will setting the proper QoS as you suggest help reduce the burstiness of packet loss that I see on WIFI networks?

Thanks.

Bruce

revolutionwifi

Jul 07, 2015 03:13 PM

Oh, I see what you mean now Colin. Thanks for pointing out the mistake, I have corrected it.

Cheers,

Andrew

Colinlow

Jul 07, 2015 03:05 PM

The diagram shows application throughput divided by device throughput, but the text reads device throughput capability divided by application throughput.

revolutionwifi

Jul 07, 2015 02:19 PM

Jay,

Getting Lync voice into the AC_VO queue should help, since it has (by default) an initial contention window size that is half the size of AC_VI. So statistically it should reduce the number of retransmissions and overall call quality issues.

If I may ask further:

- Are the large file transfers that cause issues mainly uploads (from client to AP) or downloads (from AP to client)?

- Does this occur when one client is performing a file transfer or many clients all at once?

- Have you verified what QoS queue the client(s) are tagging the file transfer traffic currently (it should be in AC_BE, if it is not then that could cause issues)?

- Are the file transfer client(s) 802.11n/ac capable and using A-MPDU?

- Do you see any substantially large TXOP or Duration values being used by the file transfer client(s) (e.g. reserving too much time on the WLAN, causing voice clients to wait excessively long to transmit/receive)? What is the TXOP value set to on the SSID for each WMM queue?

- Have you baselined the performance of the client drivers? Do they perform poorly overall?

- Do you have the ability to test a dedicated voice handset at the same time as the file transfer? What results were obtained?

- Is there a large amount of broadcast or multicast present over the air, which might increase load on the Lync client and impact performance?

- Does the Lync voice quality suffer under any other circumstances that may indicate a non-WLAN issue (e.g. client CPU or memory load, WAN utilization, strict priority and rate limiting of voice QoS traffic on the wired network, etc.)?

Contention related performance issues are typically only seen when there are multiple clients, each with their own timers and trying to access the network at the same time (slot time, to be precise). When there are only a few clients (e.g. 1 voice, 1 data) then contention issues and retransmissions should be minimal.

Overall, I'm fairly skeptical that this is a contention related issue. The more likely culprit is a device performance issue in my opinion, and is where I would focus first (assuming that there is no other WLAN traffic or interference that you haven't described yet).

Best of luck,

Andrew

P.S. - Not sure if you've read this post on Lync QoS that I wrote a few years ago:

http://revolutionwifi.blogspot.com/2011/09/microsoft-lync-qos.html

thecompnerd

Jul 06, 2015 10:43 AM

Hey Andrew,

Thanks for taking the time to respond.

The clients are all Win7 running Lync 2013. The issue can occur when another client utilizes around 80% during file syncs. In most cases, client retries go up, which may result in poor voice quality. Some of this may be due to the fact that as you know, Windows will use AC_VI for DSCP 46; our Lync calls are being tagged bi-directionally by the client and controller+Lync SDN. I'm working on rolling out a new QoS GPO to configure Lync voice as DSCP 48 to get AC_VO over the air. I'm hoping this helps, but in the meantime, I thought I'd find out if it was worth downgrading heavy traffic as AC_BK. For instance, I can target the laptop's backup software and mark it DSCP 8 to get AC_BK. I haven't found anyone do this in practice and am thinking I will have to test this unless you have any thoughts.

Regarding the WLAN, CCI is sometimes an issue. This is a dense deployment with wide open spaces. Channel re-use is tough, but I'm looking into ways to improve this.

revolutionwifi

Jul 06, 2015 09:41 AM

Thanks Devin!

revolutionwifi

Jul 06, 2015 09:34 AM

Hi Colin,

The equation is: realistic device throughput capability divided by application throughput.

Reference the embedded picture in the blog post.

Cheers,

Andrew

revolutionwifi

Jul 06, 2015 09:33 AM

thecompnerd,

Can you clarify what issue you are experiencing? Is the large file transfer occurring on the same client as the voice issue (e.g. a laptop with a softphone), or are they different clients (e.g. a dedicated voice handset and a laptop)?

When using dedicated voice handsets, I have never had a problem with voice quality issues when the network has been designed properly. If you are having a problem with dedicated voice handsets, it's more likely that you have underlying Wi-Fi or wired LAN design and performance issues, or there is a software performance issue in the handset image. The LAN issue could be a lot of things (CCI, capacity, roaming performance, lack of proper end-to-end QoS, etc.).

I have experienced poor voice quality issues when using smart devices like a laptop with a softphone or a cellular phone running a voice app... which is typically caused by a performance issue on the device itself. Also, softphones or voice apps also need to be developed correctly to integrate with the operating system QoS APIs to actually classify and queue the traffic correctly, not just tag it with WMM / DSCP. This is a big problem as many apps don't do this properly.

Cheers,

Andrew

DevinAkin

Jul 04, 2015 05:42 PM

Sensational work Andrew. Thanks for taking the time to write this up.

Devin

Colinlow

Jul 02, 2015 11:24 AM

Andrew,

Nice article. Is it "device throughput capability by the required application throughput" or app throughput divided by capability?

cheers,

colin lowenberg

thecompnerd

Jul 01, 2015 03:09 PM

In mixed environments, I've found that call quality issues can exist when large file transfers are in process, even when the client is tagging voice with the appropriate WMM/DSCP values. I've considered writing client-side QoS policies that would downgrade bandwidth intensive apps to AC-BK over the air. The assumption is that this would free up airtime for voice, but I'm unsure if this would have the desired effect as this would affect throughput of the downgraded app. Any thoughts on this and how to better guarantee bandwidth for VoIP in those mixed environments?
