Handover mechanics for Wi-Fi phones
Back to the future with this Airheads Online article from 2007.
While some observers propose that voice is ‘just an application’ on an IP network, implemented as a set of VoIP protocols, those of us who spend our time with Voice-over-IP-over-Wi-Fi remain convinced that the subject is as complex, difficult and steeped in black magic as it ever was in the analog, wired world. Voice may indeed be an application, but it is much more than that: an application that stresses the system at every point, whether in battery life, quality of service, or the subject of this note, inter-AP handover.
Handover is a very simple concept: a phone is on a call (the problem is uninteresting if there is no call) while associated to a particular AP. Either because of movement or RF signal degradation, the current AP becomes a poor choice, so the phone moves its association to a new AP from which it receives a better signal. The key performance metric is also clear: the gap in the audio signal from the time the phone severs its association with the first AP until it re-establishes the media stream on the new AP. Most speech experts agree that a gap (sometimes called ‘latency’) of less than 50 msec is not perceptible or annoying to the listener, while a gap of more than 150 msec will usually cause complaints. At Aruba we already achieve handover gaps below 50 msec routinely in our Wi-Fi networks, but we continue to improve our techniques and to decrease the average and worst-case handover latency.
Handover is a particular concern for voice clients because of the time-sensitive nature of the media stream. This is much more interesting than handover for a data client: how many users walk around carrying PCs with interruption-sensitive data applications running? So voice makes handover latency visible and critical on a completely new level.
The handover process
The handover process can be broken down into a number of phases. In this note we will look at each phase, explaining the important factors, the state of the art, and emerging techniques for improving handover latency.
Phones generate and receive voice frames every 20 or 30 msec, so any interruption of that order will cause frames to be lost. Fortunately, codec technology offers a number of techniques that can cover up or hide the loss of a small number of frames: some implementations inject white or pink noise at the same power level as the surrounding speech, while others repeat the last received frame or use a prediction algorithm to generate synthetic frames. Adaptive jitter buffers can also be used to ‘stretch’ already-received and buffered speech if no new frames are arriving. Together, these techniques help to make 50 msec interruptions imperceptible on modern phones.
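To put numbers on this: with a frame every 20 msec, a 50 msec gap spans roughly three frames, and with 30 msec frames roughly two. The minimal Python sketch below estimates that count and implements the simplest concealment strategy mentioned above, repeating the last received frame. The fading factor, the function names and the frame representation (lists of integer PCM samples, with None marking a lost frame) are illustrative assumptions, not any particular codec's algorithm.

```python
import math
from typing import List, Optional

def frames_lost(gap_ms: float, frame_interval_ms: float) -> int:
    """Approximate number of voice frames falling inside an audio gap (rounded up)."""
    return math.ceil(gap_ms / frame_interval_ms)

def conceal(frames: List[Optional[List[int]]], attenuation: float = 0.5):
    """Replace lost frames (None) by repeating the last good frame at reduced gain.

    Assumed representation: each frame is a list of integer PCM samples.
    """
    out = []
    last = None
    for frame in frames:
        if frame is not None:
            last = frame
            out.append(frame)
        elif last is None:
            out.append(None)  # nothing to repeat yet; left for comfort noise
        else:
            last = [int(s * attenuation) for s in last]  # fade each repetition
            out.append(last)
    return out

print(frames_lost(50, 20), frames_lost(50, 30))  # 3 2
```

Fading each repeated frame avoids a long, unnatural buzz if several consecutive frames are missing.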
One thread running through handover mechanics is that in Wi-Fi it is the client device that makes nearly all the decisions. This contrasts with the cellular world, where the same type of problem is addressed by split control: the network infrastructure and the client share information and can each make decisions. In fact, with the advent of centralized WLAN architectures, some techniques from cellular technology are being adapted for Wi-Fi, but the prevailing model is that the client makes decisions, aided by information from the network. Also note that while IEEE 802.11 standards cover the format and use of frames exchanged over the air, many of the algorithms critical to handover performance are not specified and are left to individual designers, usually phone designers rather than WLAN infrastructure vendors. While WLANs can improve the handover latency for any phone, they cannot turn a poor implementation into a good one. Aruba has tested many Wi-Fi phones and is developing key benchmark tests to establish performance measures.
The following steps are necessary to accomplish inter-AP handover:
- Develop a list of candidate APs to which to handover when the time comes.
- Establish Call Admission Control status at the new AP.
- Decide it’s time to handover.
- Handover to the new AP:
  - Authenticate with the new AP.
  - Derive session keys at the new AP.
  - If necessary, obtain a new IP address.
  - Re-establish the media stream.
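As a rough way to see where the audio gap comes from, the Python sketch below lists these steps as phases, separates those that run while the call is still carried on the original AP from those that run after the old association is dropped, and rates the resulting gap against the 50 and 150 msec figures quoted earlier. The names are hypothetical, the per-phase durations would have to come from measurement, and the "borderline" label for the 50 to 150 msec band is an assumption.

```python
from enum import Enum
from typing import Dict

class Phase(Enum):
    CANDIDATES = "develop candidate list"
    CAC = "establish CAC status"
    DECIDE = "decide to hand over"
    AUTHENTICATE = "authenticate with the new AP"
    DERIVE_KEYS = "derive session keys"
    NEW_IP = "obtain a new IP address (if needed)"
    MEDIA = "re-establish the media stream"

# Phases that run while the call is still carried on the original AP.
BACKGROUND = {Phase.CANDIDATES, Phase.CAC, Phase.DECIDE}

def audio_gap_ms(durations: Dict[Phase, float]) -> float:
    """Sum only the phases that run after the old association is dropped."""
    return sum(ms for phase, ms in durations.items() if phase not in BACKGROUND)

def rate_gap(gap_ms: float) -> str:
    if gap_ms < 50:
        return "imperceptible"
    if gap_ms <= 150:
        return "borderline"
    return "likely complaints"
```

Time spent building the candidate list does not count against the gap, which is why the text stresses doing that work in advance.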
Develop a list of handover candidates
This is a stress-free process, as it occurs while the phone is still merrily driving the voice connection on the original AP. But it is important because there is no time to lose once the decision to handover has been made. The intent is to establish a list, sometimes an ordered list, of known AP candidates for handover. These candidates may be on the same RF channel as the current AP, but are more likely to be on a different channel.
It is worth considering the state of the art around five years ago. The Wi-Fi chipsets then available implemented very simplistic handover algorithms. Most of them waited until the signal from the current AP was lost, then initiated ‘scanning mode’. Scanning consisted of sending probe requests on each Wi-Fi channel in turn, waiting for responses from neighbouring APs before moving to the next channel. As it can take an average of 15 msec for an AP to respond to a probe request, running through the 11 channels available for US 802.11b took of the order of 150 msec (11 x 15 = 165 msec). In fact, published results from this era show handover times of 150-300 msec as a matter of course, with scanning accounting for 90% of this time. Clearly, this was not an optimal implementation, but it shows that developing and maintaining a list of handover candidates can take significant time.
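The arithmetic behind these figures is simple enough to sketch. A sequential active scan costs roughly the number of channels probed times the wait per channel; with the 15 msec average wait quoted above, all 11 US channels come to 165 msec, or 150 msec if the current channel is skipped. The function and constant names are illustrative.

```python
def active_scan_ms(channels_probed: int, probe_wait_ms: float = 15.0) -> float:
    """Time for a sequential active scan: probe one channel, wait, move on."""
    return channels_probed * probe_wait_ms

US_80211B_CHANNELS = 11

print(active_scan_ms(US_80211B_CHANNELS))      # 165.0 (every channel)
print(active_scan_ms(US_80211B_CHANNELS - 1))  # 150.0 (skipping the current channel)
```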
There are several recognized techniques now used to improve this behaviour:
- Develop the list in the background before losing contact with the original AP. Most modern phones steal time-slices to scan other channels between voice frames: since voice frames are synchronous, as soon as the phone has received and transmitted a frame it knows there is a 20 or 30 msec interval before it needs to be back on the original AP's channel for the next frame. This allows it to scan all other channels in the background over a period of seconds. This behaviour is assumed today, but there will still be worst-case situations where a handover is forced by rapidly failing RF signal strength and there is no cached handover candidate list.
- Smarter channel scan order. Early 802.11b phones simply scanned from channel 1 to channel 11, but developers soon found that, since 3- or 4-channel plans are the norm, it was more effective to scan the likely channels first (channel 2, for example, is rarely used) and to cut the scan short as soon as the first eligible handover candidate is identified.
- Use of active or passive scanning. An active client sends probe requests on other (non-associated) channels, waiting for a response from any AP on that channel and noting the received signal strength (remember that in an 802.11b/g deployment, there may be several audible APs on any one channel). As noted above, there can be a significant delay associated with probe responses, as APs on the target channel may be otherwise busy. Passive scanning relies on ‘listening’ on other channels for beacons from APs. Again, this takes time, as beacons are usually 100 msec apart.
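The scan-order and early-termination ideas above fit in a few lines. In this minimal sketch, the preferred channel set (1, 6 and 11, a common 3-channel plan), the -70 dBm eligibility threshold and the probe callback are all assumptions standing in for the phone's real radio driver.

```python
from typing import Callable, Iterable, List, Optional, Tuple

PREFERRED = (1, 6, 11)  # assumed channel plan; adjust to the site's actual plan

def scan_order(current: int, all_channels: Iterable[int] = range(1, 12),
               preferred: Tuple[int, ...] = PREFERRED) -> List[int]:
    """Likely channels first, then the rest; the current channel is skipped."""
    first = [c for c in preferred if c != current]
    rest = [c for c in all_channels if c != current and c not in preferred]
    return first + rest

def find_candidate(current: int, probe: Callable[[int], Optional[int]],
                   threshold_dbm: int = -70) -> Optional[Tuple[int, int]]:
    """Probe channels in order and stop at the first AP heard above threshold.

    probe(channel) is a hypothetical hook returning the strongest RSSI (dBm)
    heard on that channel, or None if no AP answered.
    """
    for ch in scan_order(current):
        rssi = probe(ch)
        if rssi is not None and rssi >= threshold_dbm:
            return ch, rssi
    return None
```

With the phone on channel 6, the order is 1, 11, then 2, 3, 4 and so on, so a good AP on channel 11 is found after two probes instead of ten.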
Work to improve this aspect of handover is still under way in the IEEE, notably:
- Pilot frames (802.11k draft). These make beacons more frequent. The concept is for APs to send short beacons at short intervals, reducing passive scanning time. There is some debate on the practical utility of pilot beacons, and Aruba is not yet convinced of their value.
- The neighbour report (802.11k draft). This is a favourite of Aruba. Each AP publishes, in the beacon or on request, a body of information about neighbouring APs. Various useful items are included, some of which are discussed later; of immediate interest, the report can offer a list of neighbouring APs, their RF channels and (optionally) their beacon offset in msec. This gives the phone an instant shortlist of handover candidates without the need to change channel and scan or probe.
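A beacon offset lets the phone listen passively on a neighbour's channel just before that neighbour's next beacon, rather than dwelling for a whole beacon interval. The sketch below assumes the offset is measured from the serving AP's beacon at t = 0 and that beacons are 100 msec apart, as stated above; the Neighbour class is a simplified, hypothetical stand-in, not the 802.11k element format.

```python
import math
from dataclasses import dataclass

@dataclass
class Neighbour:
    # Hypothetical, simplified neighbour-report entry (not the on-air format).
    bssid: str
    channel: int
    beacon_offset_ms: float            # assumed: offset from the serving AP's beacon at t = 0
    beacon_interval_ms: float = 100.0

def next_beacon_ms(n: Neighbour, now_ms: float) -> float:
    """Earliest time >= now_ms at which this neighbour is expected to beacon."""
    k = math.ceil((now_ms - n.beacon_offset_ms) / n.beacon_interval_ms)
    return n.beacon_offset_ms + k * n.beacon_interval_ms

def listen_schedule(neighbours, now_ms: float):
    """(time, channel, bssid) tuples in the order the phone should tune to them."""
    return sorted((next_beacon_ms(n, now_ms), n.channel, n.bssid) for n in neighbours)
```

The phone can then fit each short listen into the idle time between its own voice frames.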
Developing a candidate list for handover is further complicated by the need to extend battery life. Every millisecond the phone’s radio is switched on counts against this, and transmitting drains the battery much faster than receiving. (Indeed, power saving or ‘sleep’ schemes such as U-APSD require some modification to the handover algorithms described here.) This tradeoff is important for good phone operation, and Aruba believes a solution incorporating network assistance to the phone will offer a superior combination of handover latency and battery life.
Thus the first step in the handover process is the most complex and critical. As discussed above, when the phone arrives at the decision to handover, the existence of an accurate, ordered candidate list makes a difference of hundreds of milliseconds to the overall handover latency.
Establish Call Admissions Control Status
A final consideration in developing a candidate list for handover is the ability of the new AP to support the phone’s traffic. This touches on Call Admissions Control, a way of restricting the number of voice calls on a particular AP before the capacity limit is reached. If the new AP is already at capacity, it will not be able to accept the voice call on handover.
While CAC will be enforced by the new AP towards the end of the handover, latency would be adversely affected were the phone to discover late in the process that the new AP cannot carry its load. Thus it is important to indicate to the phone whether it is likely to be admitted by the new AP’s CAC, so it can eliminate overloaded APs from its candidate list.
Solutions to this problem include having APs publish their current load in the beacon, allowing the phone to make a decision, and an alternative method using the probe response (invented by Aruba). Aruba’s CAC implementation offers another way to avoid the problem: a reserve of capacity is allocated on every AP explicitly to allow handover of voice calls in progress to that AP.
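The reserve idea can be sketched as a per-AP admission check in which new calls may only use the unreserved capacity, while calls arriving by handover may use all of it. This is a hypothetical illustration that counts calls rather than airtime; Aruba's actual algorithm and parameters are not described here.

```python
from dataclasses import dataclass

@dataclass
class VoiceCac:
    max_calls: int          # calls the AP can carry (assumed, site-specific)
    handover_reserve: int   # slots held back for calls arriving by handover
    active: int = 0

    def admit(self, is_handover: bool) -> bool:
        """New calls may use only unreserved capacity; handovers may use all of it."""
        limit = self.max_calls if is_handover else self.max_calls - self.handover_reserve
        if self.active < limit:
            self.active += 1
            return True
        return False
```

With max_calls=4 and handover_reserve=1, the fourth new call is refused, but a call handed over from a neighbouring AP is still accepted.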
CAC is even more complicated in a WMM environment, as capacity must be considered for each of the four traffic priorities. A solution for this is provided in the QBSS load element specified in the beacon for IEEE 802.11e, and in Aruba’s probe response: these give clients a view of current loads on the AP, providing an indication of whether the AP is likely to reject a handover attempt due to Call Admissions Control.
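On the phone side, a load element can be turned into a quick admission estimate. The sketch below decodes the 5-octet body of an 802.11e QBSS Load element (station count, channel utilization scaled so that 255 means 100%, and available admission capacity in units of 32 µs of medium time per second) and compares the free medium time with what a call needs. The per-call medium time is an assumed input (in practice it would follow from the call's traffic specification), and this aggregate figure does not break capacity down by the four priorities.

```python
import struct

def parse_qbss_load(body: bytes) -> dict:
    """Decode a QBSS Load element body (element ID and length already stripped)."""
    # Little-endian: 2-octet station count, 1-octet utilization, 2-octet capacity.
    stations, utilization, admission_capacity = struct.unpack("<HBH", body)
    return {
        "stations": stations,
        "channel_busy_pct": utilization * 100 / 255,
        "available_us_per_s": admission_capacity * 32,
    }

def likely_admitted(body: bytes, call_medium_time_us_per_s: int) -> bool:
    """Rough check: is there enough unallocated medium time for one more call?"""
    return parse_qbss_load(body)["available_us_per_s"] >= call_medium_time_us_per_s
```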
Decide it’s time to handover
In the 802.11 model, it’s up to the client to decide when to move to a new AP. This is a surprisingly difficult decision: the primary indicators for the client are the signal strength and signal-to-noise ratio (SNR) of the current AP, but RF signal levels fluctuate, particularly in the case of a moving phone, which can be carried behind thick walls, metal objects such as bookcases, or even a human head as the caller turns around.
The difficulty for the phone is in distinguishing whether a sudden reduction in the signal power is a temporary event that will soon be reversed, or a sign that the phone is moving out of the coverage area of the current AP. In the former case, the correct behaviour would be to stay on the current AP, but in the latter such ‘sticky’ behaviour would result in longer handover latency, as the original AP’s signal could be lost completely before handover to a new one was complete.
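One common way to tell a brief fade from a steady decline is to smooth the signal-strength samples and act only when the smoothed value stays below a threshold for several samples in a row. The sketch below uses an exponentially weighted moving average; the smoothing weight, the -72 dBm threshold and the five-sample hold are illustrative assumptions, not values recommended by Aruba or any phone vendor.

```python
from typing import Optional

class SignalTrend:
    """Smooth RSSI samples and flag a sustained decline (not a brief fade)."""

    def __init__(self, alpha: float = 0.2, threshold_dbm: float = -72.0, hold: int = 5):
        self.alpha = alpha              # weight given to the newest sample
        self.threshold = threshold_dbm  # smoothed level treated as "weak"
        self.hold = hold                # consecutive weak samples required
        self.avg: Optional[float] = None
        self.weak_count = 0

    def update(self, rssi_dbm: float) -> bool:
        """Add one sample; return True once the decline looks sustained."""
        if self.avg is None:
            self.avg = rssi_dbm
        else:
            self.avg = self.alpha * rssi_dbm + (1 - self.alpha) * self.avg
        self.weak_count = self.weak_count + 1 if self.avg < self.threshold else 0
        return self.weak_count >= self.hold
```

A single deep fade, such as a head passing between phone and AP, barely moves the average, while a steady walk away from the AP trips the flag after a handful of samples.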
Another factor is the data rate of the connection: in Aruba networks we normally set the minimum allowed data rate above the minimum possible in order to force a handover before the signal is impossibly weak: in an Enterprise Wi-Fi network there should be good, high-quality coverage throughout the building so there will always be a suitable AP to handover to.
Many phones today use a combination of factors to make the handover decision, based on the signal strength and SNR of the current and prospective APs but with various extensions and degrees of sophistication. They must also avoid ‘flapping’, where a client shuttles rapidly back and forth between APs. While these algorithms are mostly the domain of the phone designer, Aruba has developed expertise in this area, and we continue to invent ways in which the WLAN infrastructure can provide the phone with relevant information to make a better decision.
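A decision rule combining these factors might look like the following sketch: roam only when the current AP is weak, the best candidate is better by a clear margin (hysteresis), and enough time has passed since the last roam to suppress flapping. It uses signal strength alone, and every threshold is an assumed value for illustration; real phones weigh SNR and other inputs in vendor-specific ways.

```python
def should_roam(current_rssi: float, best_candidate_rssi: float,
                last_roam_s: float, now_s: float,
                trigger_dbm: float = -72.0, margin_db: float = 8.0,
                min_dwell_s: float = 5.0) -> bool:
    """Hypothetical roaming rule with hysteresis and a hold-down timer."""
    if now_s - last_roam_s < min_dwell_s:
        return False  # roamed too recently: staying put prevents flapping
    if current_rssi >= trigger_dbm:
        return False  # current AP is still good enough
    return best_candidate_rssi >= current_rssi + margin_db  # clearly better AP
```

The margin and the dwell time work together: the margin stops the phone from chasing small fluctuations between two similar APs, and the dwell time stops it from bouncing straight back after a move.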
Other work in the IEEE includes a submission to 802.11v for ‘directed handover’, which would allow the network to explicitly command a phone to handover to a designated new AP; Aruba supports this proposal.
Handover to the new AP
When a phone makes a decision to handover, the situation may already be grim. Phone designers have learned through experience that the penalty for moving prematurely is greater than that for making a decision late: therefore, if RF coverage is not uniform and good, the voice quality on the connection may already have deteriorated when the decision is made to handover.
Using the cached candidate list, the phone will choose what it considers the most likely candidate, and start the handover process. Since phones can only maintain association with one AP at a time, this is when the voice connection is interrupted, and the handover latency timer starts.
The complexity and latency associated with authentication depend on the protocol used. Phones have historically lagged PC clients in their support of newer authentication regimes, but for this paper we will assume that technology has moved beyond WEP, and that most phones are now (1st quarter of 2007) capable of 802.11i authentication, WPA or WPA2 using at least pre-shared keys. Successful authentication allows the phone to establish Pairwise Master Keys (PMKs) with the infrastructure.
The simplest (and worst, delay-wise) model assumes full re-authentication. In this case, an 802.1x authentication must begin from scratch, involving the exchange of many frames between the client, the authenticator (mobility controller in Aruba’s architecture) and the RADIUS server. This can take considerable time – several hundred msec was not unknown in early implementations – and can be further increased by WAN latency when exchanging frames with a RADIUS server at a remote site.
Even with full re-authentication, centralized architectures and centralized encryption (Aruba is the only vendor to use centralized encryption, but many use centralized traffic and control architectures) bring this time down to less than 100 msec in most cases. Recent features such as EAP offload can improve these numbers further.
Another improvement is to implement a part of the 802.11i/WPA2 specification: Opportunistic Key Caching. OKC offers a way for the phone to pre-authenticate with an AP before it needs to handover, establishing a PMK that can be cached for a period of time, usually some hours. Now, when the phone wishes to handover, it can bypass the whole 802.1x re-authentication sequence by presenting the identifier of its cached PMK. The authenticator will recognize the PMK and move straight to the next step, deriving session keys.
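Under IEEE 802.11i, a cached PMK is referenced by a PMK identifier, PMKID = HMAC-SHA1-128(PMK, "PMK Name" || AA || SPA), where AA is the authenticator's MAC address (the AP's BSSID) and SPA is the phone's. With OKC the phone reuses the same PMK at a new AP and simply recomputes the PMKID for the new BSSID, so the infrastructure, which holds the same PMK, can match it without a fresh 802.1x exchange. A minimal Python sketch of that computation follows; the helper names and addresses are illustrative.

```python
import hashlib
import hmac

def mac_bytes(mac: str) -> bytes:
    """'02:00:00:00:00:01' -> 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def pmkid(pmk: bytes, aa: str, spa: str) -> bytes:
    """First 128 bits of HMAC-SHA1(PMK, "PMK Name" || AA || SPA)."""
    msg = b"PMK Name" + mac_bytes(aa) + mac_bytes(spa)
    return hmac.new(pmk, msg, hashlib.sha1).digest()[:16]

# The same cached PMK yields a different PMKID for each target AP.
cached_pmk = bytes(32)  # placeholder; a real PMK comes out of the 802.1x exchange
phone = "02:00:00:00:00:99"
for bssid in ("02:00:00:00:00:01", "02:00:00:00:00:02"):
    print(bssid, pmkid(cached_pmk, bssid, phone).hex())
```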
The ‘neighbour report’ defined in 802.11k draft can assist the phone again here. It includes a field called ‘key scope’ that identifies the switch (mobility controller in Aruba’s architecture) managing each AP. The phone needs to pre-authenticate only once per switch, greatly simplifying its work and reducing the load on the infrastructure.
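Assuming the report lets the phone tell which controller manages each neighbour, as described above, planning pre-authentication reduces to grouping candidates by key scope, with one pre-authentication per group. The (bssid, key_scope) input format and the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def preauth_plan(neighbours: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group neighbour APs by key scope (the controller holding the keys).

    One pre-authentication per scope covers every AP in that group.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for bssid, scope in neighbours:
        groups[scope].append(bssid)
    return dict(groups)
```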
Although OKC is defined only for WPA2, one of Aruba’s customers, having acquired a phone that did not support WPA2, wished to see the benefits of lower handover latency by applying the OKC technique to the WPA protocol. Aruba complied, and we now support a (non-standard) version of OKC over WPA.
Derive session keys for the new AP
Once PMKs have been established, the phone and the infrastructure must exchange frames in order to derive session keys for unicast and multicast traffic. In WPA2 this is called the ‘four-way handshake’. This exchange is relatively short compared to 802.1x authentication, so it is not usually a significant factor in handover latency in centralized WLAN architectures.
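Both ends of the four-way handshake compute the same Pairwise Transient Key (PTK) from the PMK, the two MAC addresses and the two nonces exchanged in the handshake; the handshake frames carry the nonces and integrity codes, never the PMK or PTK themselves. The sketch below follows the 802.11i pseudo-random function (PRF) and the key layout for CCMP (a 384-bit PTK split into 128-bit KCK, KEK and TK). It covers key derivation only, not the EAPOL-Key frames or their MIC checks.

```python
import hashlib
import hmac

def prf(key: bytes, label: bytes, data: bytes, bits: int) -> bytes:
    """802.11i PRF: concatenate HMAC-SHA1(key, label || 0x00 || data || i), i = 0, 1, ..."""
    out = b""
    i = 0
    while len(out) * 8 < bits:
        out += hmac.new(key, label + b"\x00" + data + bytes([i]), hashlib.sha1).digest()
        i += 1
    return out[: bits // 8]

def derive_ptk(pmk: bytes, aa: bytes, spa: bytes, anonce: bytes, snonce: bytes) -> dict:
    """384-bit CCMP PTK split into KCK | KEK | TK (16 bytes each)."""
    # 802.11i orders the addresses and the nonces with Min/Max before hashing.
    data = min(aa, spa) + max(aa, spa) + min(anonce, snonce) + max(anonce, snonce)
    ptk = prf(pmk, b"Pairwise key expansion", data, 384)
    return {"kck": ptk[:16], "kek": ptk[16:32], "tk": ptk[32:48]}
```

The work is a handful of HMAC computations and two round trips over the air, which is why this step is small compared with a full 802.1x exchange.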
However, the IEEE 802.11 standards group is working to improve performance everywhere and the 802.11r draft addresses this area. While centralized architectures will not benefit significantly, 802.11r will be an important improvement for ‘fat AP’ networks. 802.11r also deals explicitly with CAC and 802.11e (WMM) as part of handover, allowing the phone to present its existing traffic specification (TSpec) as part of the protocol.
After session keys have been exchanged, the phone is ready to communicate with the new AP: the L2 part of handover is complete.
Obtain a new IP address
The next stage would be for the phone to obtain a new IP address when the handover involves a move to another IP subnet (L3 roaming from the phone’s viewpoint). But there is a simple rule of network design: don’t do this. It takes a long time for a client to acquire a new IP address through DHCP, so Aruba and others go to great lengths to ensure it is not necessary. The following is a brief explanation of Aruba’s solutions that enable handover while avoiding the need for a new IP address for the client.
For the simplest handover case, the network designer builds a single IP subnet covering all APs where a phone may wish to handover (this is usually confined to an area of contiguous coverage such as a building or campus). Aruba terms this a ‘mobility domain’. In this case, once the phone has obtained its original IP address there is no reason to change it. Aruba adds an elegant twist: since all traffic is tunneled from the client through the AP to the mobility controller, IP addresses can be maintained by the mobility controller rather than depending on the Ethernet connection to the AP.
There are well-known problems as subnets expand to hundreds of clients, so vendors (including Aruba) have developed features such as multicast traffic filtering and ARP proxy functions. Eventually, however, a layer 3 solution is needed. Centralized architectures with WLAN switches all use some form of the ‘mobile IP’ (MIP) protocol for this: incoming traffic is directed to a ‘home agent’ (HA) that owns the client’s original IP address, and the HA redirects it to the ‘foreign agent’ (FA), which forwards it to the client. All such architectures today use a ‘proxy MIP’ feature, with the infrastructure acting on the client’s behalf, because phones do not implement MIP clients.
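The home-agent side of proxy MIP can be pictured as a binding table from the client's original IP address to the controller currently serving it. In this hypothetical sketch, the controller names and the tuple used to stand in for tunnel encapsulation are illustrative only.

```python
from typing import Dict, Tuple

class HomeAgent:
    """Hypothetical proxy-MIP home agent: client IP -> controller now serving it."""

    def __init__(self, name: str):
        self.name = name
        self.bindings: Dict[str, str] = {}

    def register(self, client_ip: str, foreign_agent: str) -> None:
        # Proxy MIP: the infrastructure registers on the phone's behalf after an L3 roam.
        self.bindings[client_ip] = foreign_agent

    def deliver(self, client_ip: str, packet: bytes) -> Tuple[str, object]:
        """Return (next_hop, payload) for traffic addressed to client_ip."""
        fa = self.bindings.get(client_ip, self.name)
        if fa == self.name:
            return self.name, packet          # client is still on its home controller
        return fa, (self.name, fa, packet)    # stands in for tunnel encapsulation
```

The phone never sees any of this: it keeps its IP address, and only the binding changes when it crosses a controller boundary.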
Note that there is a penalty for setting up the HA-FA connection as a client hands over across a mobility controller L3 boundary: this can be of the order of 25 msec in an Aruba network.
Re-establish the media stream
While the WLAN infrastructure’s work is essentially complete at this point, the phone must still re-establish the media stream. In the transmit direction, this involves resuming the transmission of frames over the air. The codec has continued to generate frames independently of the handover event, so it may be helpful to cull the buffer in order to keep the delay on the media stream low.
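Culling the transmit buffer can be as simple as discarding frames that have waited longer than some maximum age. In this minimal sketch, the queue holds (timestamp_ms, frame) pairs and the 60 msec limit is an assumed value.

```python
from collections import deque

def cull_stale(queue: deque, now_ms: float, max_age_ms: float = 60.0) -> int:
    """Drop queued (timestamp_ms, frame) pairs older than max_age_ms; return the count."""
    dropped = 0
    while queue and now_ms - queue[0][0] > max_age_ms:
        queue.popleft()  # oldest frames are at the left of the queue
        dropped += 1
    return dropped
```

Sending stale frames would only add their age to the end-to-end delay for the rest of the call, so dropping them trades a slightly longer gap for a shorter ongoing delay.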
In the receive direction, the incoming frames must be directed by the mobility controller to the new AP, and the phone usually holds the first 20-60 msec of the stream to establish its dejitter buffer.
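A dejitter buffer that holds the first frames before starting playout might be sketched as follows. The 20 msec frame size and 40 msec prefill (within the 20-60 msec range mentioned above) are assumptions, and missing frames are reported to the caller for concealment rather than handled here.

```python
from typing import Dict, Optional

class DejitterBuffer:
    """Hold incoming frames until prefill_ms of audio is buffered, then play in order."""

    def __init__(self, frame_ms: int = 20, prefill_ms: int = 40):
        self.frame_ms = frame_ms
        self.prefill_ms = prefill_ms
        self.frames: Dict[int, object] = {}  # sequence number -> frame
        self.next_seq: Optional[int] = None  # next sequence number to play out
        self.playing = False

    def push(self, seq: int, frame) -> None:
        if self.playing and seq < self.next_seq:
            return  # arrived after its playout slot; too late to use
        self.frames[seq] = frame
        if not self.playing:
            self.next_seq = seq if self.next_seq is None else min(self.next_seq, seq)
            if len(self.frames) * self.frame_ms >= self.prefill_ms:
                self.playing = True

    def pop(self):
        """Call once per frame interval. Returns None while prefilling, otherwise
        (seq, frame), where frame is None if that packet never arrived."""
        if not self.playing:
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq, self.frames.pop(seq, None)
```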
Handover latency is a key performance measurement in Wi-Fi networks. This paper identified the various stages of a successful handover, and discussed the techniques used at each stage to minimize handover time.
It is clear that there are many aspects of handover where the Wi-Fi infrastructure can assist in reducing latency, but the particular client implementation is still a significant factor.