I've seen this discussed a lot both here and on other wireless forums (Ruckus, etc.), but I haven't been able to find any good info on pinpointing the issue or what the solution(s) might be. I know every environment is different - but is there a good way to troubleshoot this issue, or are there any gotchas I should be aware of with MacBooks as wireless clients?
The newer MacBook Pros in my environment (only the 2015 models seem to be affected) lose all connectivity over wireless in areas with very good conditions: coverage, low channel utilization, low noise floor, etc. This seems to happen when roaming, although the users are often sitting in meetings or working at their desks when it happens (client match or something else could be causing the roam). When I try to recreate the problem, I notice that the MacBooks are sticky and don't like to roam until the connection is almost unusable.
In most cases, I notice that users at their desks who have the issue and call IT for support are still disconnected when the tech reaches them. To get them reconnected, they've been having to forget the network and set it back up. For users who have issues in meetings, they will often walk their laptops back to the service desk, which is near me, and I always hear them complain that they were disconnected from the wireless network... then they look at their laptops and say "OH WAIT! I'm connected again." Then they go back to their meeting and don't seem to have issues again until it comes up again - which is not 100% consistent for each user, but enough that we receive the same type of complaint almost daily from at least one user.
I'm currently testing on a 2015 MacBook Pro with OS X Yosemite 10.10.3. I ping an address on the network and walk around the building. About 80% of the time, I drop a single packet when I roam between APs (the dropped packets correlate with the logs on the controller). The other 20% of the time, I lose the connection for 10-30 seconds.
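For anyone repeating this walk test, here is a rough sketch of how the ping output could be summarized into "one lost packet" versus "long outage" events. It assumes the macOS ping format, where reply lines contain `icmp_seq=N` and timeouts are printed as "Request timeout for icmp_seq N", and a 1-second ping interval. The function names and the sample address are made up for illustration, and sequence-number wraparound is ignored.

```python
import re

# Reply lines contain "icmp_seq=<n>"; timeout lines ("icmp_seq <n>") do not match.
REPLY_RE = re.compile(r"icmp_seq=(\d+)")


def received_sequences(ping_output):
    """Extract the sequence numbers of the replies that actually came back."""
    return [int(m.group(1)) for m in REPLY_RE.finditer(ping_output)]


def find_gaps(received, interval_s=1.0):
    """Return one dict per run of consecutive lost packets."""
    gaps = []
    ordered = sorted(set(received))
    for prev, cur in zip(ordered, ordered[1:]):
        lost = cur - prev - 1
        if lost > 0:
            gaps.append({
                "first_lost": prev + 1,
                "lost": lost,
                # Approximate outage length, assuming one ping per interval_s.
                "approx_seconds": lost * interval_s,
            })
    return gaps


# Made-up sample output, only to show the expected input shape.
sample = """\
64 bytes from 10.0.0.1: icmp_seq=0 ttl=64 time=2.1 ms
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=2.4 ms
Request timeout for icmp_seq 2
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=3.0 ms
"""

for gap in find_gaps(received_sequences(sample)):
    print(gap)
```

Saving the ping output to a file during each walk and running it through something like this makes it easier to line up the long outages with specific roam events in the controller logs.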
On the AP, I've compared auth-tracebuf, user-debug, and ap client-trail for a roaming event that causes 1 packet to drop vs one that causes 30 seconds of loss of connectivity and they look identical for the most part. station down to station up + all the events to auth success / association with the AP (including every line in the debug for that event) happens within the same second in both scenarios.
I ran the Wireless Diagnostics tool on the MacBook though, and it alerted me to the loss in connectivity. Unfortunately, the results are: "Review Wi-Fi Best Practices" and make sure you have your wireless router's channel set to auto. Useless.
For the controller, I'm running a 3400 HA pair on 126.96.36.199. Auth is TLS + cert cn check via ldap. No inner eap (all unchecked).
If someone knows what might be going on, please do tell.
If not, please give me an idea of what to do next to troubleshoot. I know this is most likely a client side issue since everything else works just fine on the wireless network - but I figure someone else has to have run into the same issue and resolved it!
PS - 85% of the Aruba support cases I open end up with me running around in circles for weeks gathering data, making changes, etc. and then finding the answer myself somewhere on these forums - so that's not the advice I want to hear.
I just ran some additional diagnostics, and the Mac added:
incorrect country code. Details show that when it tried to roam:
802.11d country code set to 'X0'
Then one second later, same message, but US.
I'm in the US.
Could this cause an issue?
The incorrect country code is coming from the Mac's wireless diagnostics. In the log, I see a lot of "country code changed to X0" followed by "country code changed to US". When I ran the diagnostics that gave the results, I ran a scan and saw a wireless printer on the network that was broadcasting 802.11d with a country code of XD. X0 doesn't seem to be an issue though, based on some of the stuff I've read.
I'm thinking I have two separate issues then. One must be roaming (the one I can recreate easily). I believe the clients see the connection drop when they show up to a meeting, then immediately run over to the service desk, at which point, their connection is back by the time they arrive (less than 1 minute). This is likely what I'm recreating, and it sounds like it might be an AP power issue:
If my max power is set too high, is there any risk in dropping the power? I recently inherited this implementation, and there are a lot of walls and no site-survey or even Aruba RF-Plan virtual site survey was done prior to implementation. Could I potentially cause coverage issues by lowering max power? I thought that ARM took care of power to create the most efficient use of the environment?
I've now heard that the second issue typically happens after boot up and is fixed by either a reboot or forgetting the network and rebuilding the profile (cert auth, etc.). I will likely need to run more client side diagnostics and try to catch this issue when it happens.
Here is what should be addressed:
- The power is too high. ARM by default is min 18, max 127 which is too high. That should probably start at Min 12, max 18.
- You are using 80 MHz channels. You should uncheck "80 mhz channels" in the ARM profile, because you do not have enough channels to deploy it (nobody who is NOT using DFS has enough channels).
Channels 149E, 161E, 157E, and 153E all use the same four 20 MHz channels. The only difference is that the management traffic is sent only on the channel with the number. As a result, you effectively have only 2 channels deployed in the 5 GHz band. Uncheck "80MHz support" in the profile and you will have 4 channels (40 MHz wide channels):
If you set the ARM setting "Allowed 40 mhz channels" to "None", you will have 9 non-overlapping 5 GHz channels (20 MHz wide channels). That will significantly decrease your contention:
- Make sure "drop broadcast and multicast" is enabled on your Virtual APs so that you can reduce contention.
My discussion on this is here: http://community.arubanetworks.com/t5/Technology-Blog/Removing-the-Bottleneck-in-Wireless/ba-p/77978
It is possible that even though you do not have high utilization, you still have contention, which will cause disconnects. High power will cause roaming even while your clients are stationary, which is also a problem.
Unless you have super-sparse coverage, I don't think there will be any harm reducing the power...
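The channel counts above (2, 4, and 9) can be checked with a little arithmetic. This is only a sketch: it assumes the US non-DFS 5 GHz channel list (36-48 and 149-165), and the grouping helper is a hypothetical simplification that works for this list because each contiguous run starts on a real bonded-channel boundary. It is not a general 802.11 channel planner.

```python
# US 5 GHz channels that do not require DFS (UNII-1 and UNII-3), 20 MHz each.
NON_DFS_20MHZ = [36, 40, 44, 48, 149, 153, 157, 161, 165]


def bonded_groups(channels, width_mhz):
    """Group adjacent 20 MHz channels into bonded channels of width_mhz."""
    per_group = width_mhz // 20
    groups = []
    run = []
    for ch in sorted(channels):
        # Adjacent 20 MHz channels are 4 channel numbers apart; a larger
        # jump (e.g. 48 -> 149) means the run of contiguous spectrum ended.
        if run and ch - run[-1] != 4:
            run = []
        run.append(ch)
        if len(run) == per_group:
            groups.append(tuple(run))
            run = []
    return groups


for width in (80, 40, 20):
    groups = bonded_groups(NON_DFS_20MHZ, width)
    print(f"{width} MHz: {len(groups)} channels -> {groups}")
```

Channel 165 has no neighbor to bond with, which is why it only shows up in the 20 MHz plan.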
Wow! Thank you so much for this analysis. It sounds like it should resolve a lot of the user complaints I see come through. I do have some questions though:
1. Should I disable 80 MHz first, then if there are still issues, set 40 MHz supported to none? Or would it be best practice to set it to 20 MHz from the start, since I'm not using DFS channels and four 40 MHz channels may not be enough? Or should I enable DFS channels?
2. I read your article that you linked to (very informative!). You mentioned turning on hybrid mode in APs when spectrum analysis is needed. The installation I'm managing has it turned on permanently. Should this be disabled when not actively needed, or is there some additional benefit to what ARM does (or something else) that would justify leaving it on?
3. At this point, I plan to disable 80 MHz channels, modify min/max EIRP power, and drop broadcast + unknown multicast. Do any of these changes require an AP reboot, or will they take effect immediately? Will they drop clients or cause connection issues?
Thanks so much for all your help every time I post a question!
1. 20 MHz is best. If it works, you can move to 40 MHz. That is the lowest common denominator, before considering DFS or 40 MHz channels.
2. Hybrid spectrum should be off unless really needed.
3. None of them requires a reboot of the controller or the APs, and clients should not be dropped.
Awesome! I will get this all implemented this week.
I've implemented the power + channel-width changes and disabled spectrum monitoring on my APs. Everything seems to be good so far.
When I went to enable "drop broadcast and unknown multicast", I get an error saying I need to enable "no broadcast-filter arp".
Can this only be done on the CLI, and does it have to be done for each VAP I drop broadcast and unknown multicast for?
That message is informational, because the broadcast filter arp option is enabled by default.
The previous admin configured "no broadcast-filter arp" on a couple of my VAPs. Any idea what the purpose of that would be?
On the command line, it is broadcast-filter arp. In the GUI it is "Convert Broadcast ARP requests to unicast". It should be enabled by default. It is proxy ARP, and there are few situations where you would not have it checked. In this situation (when you are dropping broadcast and unknown multicast), you need it checked.
Okay - so both options should be enabled?
I will make the change this evening.
Hi. We are a K-12 school here in Dubai, and about 95% of our users are on Apple computers running Mavericks or Yosemite (about 1500 Macs on the wireless network). We too have seen similar issues. After seeing this thread and reading your article, Colin, we found that our APs' power was set too high and brought it down to min 12 dBm and max 18 dBm. We have an AP-105 in each classroom, as the classroom walls are concrete and the signal sometimes has trouble penetrating them. What we were seeing is that when students or teachers moved from one classroom to another at different times of the day, they had a hard time connecting to Wi-Fi. Lowering the min power to 12 and the max to 18 has helped. However, we now have another issue where Macs lose connectivity after connecting to Wi-Fi following a move from a different location.
Users workflow is:
We do have Client Match on, and have OKC disabled, so it uses PMKID. Do you know what would cause this issue? Do you think it would be best to lower the ARM power even more considering we have an AP in every classroom? I think what's happening is that when a student/teacher closes their Mac to sleep, Aruba thinks the machine is still associated to the wireless network. It doesn't know that it has disconnected.
Looking forward to your reply. :)
What version of ArubaOS are you using?
Do you have encryption enabled?
We are using Aruba OS 188.8.131.52 on 7200 Controllers.
Where would we find the encryption settings? If you are talking about authentication type, we are using 802.1x along with ClearPass, PEAP MSCHAPv2
How many access points do you have and what kind?
On the command line, what is the output of "show ap radio-summary"? That will tell us if your utilization is too high and your power needs to be reduced.
We've got about 270 access points. The majority are 105s, but we also have 225s and a few 275s. I've attached the output of that command in a text file.
Okay. Access points whose channel is followed by a + or - are running 40 MHz wide channels. Access points that have an "E" next to the channel are running 80 MHz wide channels (running 80 MHz channels is often not good in high-density environments). You need to run the same channel width on all your access points, or that could cause some of the issues you mention. I would start by running just 20 MHz channels to make it even. Here is what you do in the ARM profile to make that happen:
When you run that command again, after the change, you should not see a +, - or an E after any channels...
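With a few hundred APs it can be tedious to eyeball the channel column, so here is a small sketch of the suffix rule described above (+ or - means 40 MHz, E means 80 MHz, no suffix means 20 MHz). It assumes the channel values have already been pulled out of the "show ap radio-summary" output into a list of strings. The function names are hypothetical, and no attempt is made to parse the full command output, since its exact layout is not shown in this thread.

```python
from collections import Counter


def channel_width_mhz(channel):
    """Map a channel string such as '149E', '36+', '44-' or '161' to its width."""
    ch = channel.strip()
    if ch.endswith("E"):
        return 80
    if ch.endswith(("+", "-")):
        return 40
    return 20


def widths_in_use(channels):
    """Count how many radios are running each channel width."""
    return dict(Counter(channel_width_mhz(c) for c in channels))


# Made-up example values, not taken from the attached output.
example = ["149E", "36+", "44-", "161", "48"]
print(widths_in_use(example))
```

After the change, a uniform 20 MHz deployment should produce a single entry for width 20 and nothing else.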
Ok. Thanks Colin! We'll take a look at these settings.
Do we do this on both the 802.11a AND 802.11g radio ARM profiles, i.e. change the allowed bands to "None" and uncheck "80Mhz support"? Here are screenshots of what ours looks like now:
On the "A" only.
Thanks for clarifying, Colin. The majority of our 105 APs are in classrooms. The 225/275 APs we have are in high-density environments (theater, gym, outside sports field), and we are running different ARM profiles for these APs. Should we apply these changes there as well?
If your issue is fixed by making the changes in the previous area, I would say yes, because your environment should be running uniform channel widths.
The changes seemed to help. Thanks for that. However today we had a couple classes get booted off the Wifi while sitting in class. Do you know what would cause this type of issue? Should we decrease our power even more considering we have an AP in every classroom?
If it was an entire class, you should open a TAC case to see if an access point rebooted or there was a system issue that caused that problem.
We are still seeing some users getting kicked off of the access point. This time I have attached one of the "tech support diagnostics" for the AP. Could you take a look and see if you see anything out of the ordinary? In our latest complaint, one of our teachers said that sometimes anyone who changed their position in the room got kicked off the Wi-Fi. She has already had to turn Wi-Fi off and on a few times.
I am wondering if we should decrease the AP power more, as we do have an AP in every room?
What is the output of "show ap active"?
I've attached the file with the command you specified.
It is not obvious what problem you are having from that output. Have you tried opening a TAC case? They would be able to look at all of your logs and understand what your problem could be.