03-24-2015 09:51 AM
Fairly recently, our school district completed a replacement of our older controller-based AP-65s and 105s with all IAPs. We have 40 sites, each of which has its own subnet and, therefore, its own IAP cluster. Each site has an IAP-225 as the preferred master for the VC. Some sites are all IAP-225s, but most are a mixture of a few IAP-225s and mostly IAP-205s. The largest deployment has 142 IAPs, the second largest is 126, and the rest are all 100 or fewer.
Over the past month or so, we have started noticing 2 major issues, both with the IAP-205s:
1) After a few days of uptime, random IAP-205s throughout the district will stop handling DHCP requests properly. We saw during a Wireshark packet capture that during the DHCP handshake process, the Discover packet went out, Offer came back, Request went out, but the ACK never came back to the client. I don't know if that means the Offer made it to the AP and then never made it to the client, or if the Offer made it to the client but then the ACK never made it back to the AP. But this particular problem has become widespread, to the point that we have to reboot entire IAP clusters on a weekly basis. I've had a TAC case open on this for a few weeks, but we're not making much progress. When this happens, I've been noticing that the IAPs in question have very little free memory as reported by the VC (like 4 MB or less), whereas they usually have about 30 or 40 MB free.
2) This happens less often, but sometimes, the CPU utilization on random IAP-205s gets pegged at 100%. I just opened a TAC case on this as well. When it happened today, I SSH'd into the IAP in question and did a "show cpu details", and a process called "dpimgr" was consuming 98% of the CPU. I also did a "show tech-support" to send to TAC, but it took me a couple of tries because the IAP kept closing my SSH connection.
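To make the DHCP symptom in item 1 easier to reason about, here is a minimal Python sketch that takes the DHCP message types seen for one client in a capture and reports the first step of the handshake that is missing. In the standard exchange the client sends Discover and Request, and the server sends Offer and ACK. The function name `find_stall` and the input format (a plain list of option 53 message-type numbers, in capture order) are assumptions made for illustration; they are not anything Wireshark or the IAP produces directly.

```python
# DHCP message types as carried in option 53 (RFC 2131 / RFC 2132).
DHCP_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 5: "ACK", 6: "NAK"}

# The normal four-step handshake, in order.
EXPECTED = ["DISCOVER", "OFFER", "REQUEST", "ACK"]


def find_stall(observed):
    """Return the first handshake step missing from `observed`, or None.

    `observed` is a hypothetical input: the option 53 values seen for a
    single client, listed in the order they appeared in the capture.
    """
    position = 0
    for msg_type in observed:
        name = DHCP_TYPES.get(msg_type)
        # Advance only when the next expected step shows up; retries and
        # unrelated message types are ignored.
        if position < len(EXPECTED) and name == EXPECTED[position]:
            position += 1
    return EXPECTED[position] if position < len(EXPECTED) else None


if __name__ == "__main__":
    print(find_stall([1, 2, 3]))     # ACK
    print(find_stall([1, 2, 3, 5]))  # None
    print(find_stall([1, 1, 1]))     # OFFER
```

Running the same check on captures taken on both the wired and the wireless side of the AP would show on which side the missing message disappears, which is the open question in item 1.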
I'm starting to think the IAP-205s weren't designed well, i.e. perhaps they have too little memory and/or not enough CPU power, to be able to function well in a dense deployment. Has anyone else had any experiences like this? We've had no problems whatsoever with the IAP-225s. I'm now wishing we had gone with ALL IAP-225s, but of course, that would have cost 3 times as much.
03-24-2015 11:39 AM
The number of users can be low (less than 10) to high (50+) when the problem occurs. As for what type of traffic... what do you mean? There are usually devices connected that are still working (i.e. they already had IP addresses), so there's the typical HTTP, HTTPS, Bonjour, etc. stuff going by, if that's what you mean.
03-24-2015 12:01 PM
03-24-2015 12:02 PM
tsd - How do you define "high-density"? There have been times where an IAP-205 has had 60+ clients on it. A lot of the time, though, a large percentage of those clients are student devices (Smartphones, iPods, etc.) that are connected to our Guest network but aren't actually doing anything. Aruba's info on their web site (http://www.arubanetworks.com/products/networking/access-points/200-series/) lists the 200-series as being ideal for "Medium density" WiFi environments, whatever that may mean.
cjoseph - PM coming your way.
03-24-2015 12:11 PM
Are you using IDS (protection or detection)? Are you using AppRF? Try disabling those services and see if it affects the CPU load. Also, be sure to use the latest InstantOS.
We are not using IDS at all. We do have firewall rules defined for our Guest network. At the risk of sounding stupid, I don't know if we're using AppRF. (Where would I go to check?) I don't recall specifically turning it on, so unless it's on by default, we're probably not using it.
We were running version 184.108.40.206-220.127.116.11 when we first started having the DHCP issue. I backed down to 18.104.22.168-22.214.171.124, thinking that might fix it, but it didn't. I then backed down again, to 126.96.36.199-188.8.131.52, which is the earliest code that is supported by the 205. That didn't help, either. Right around when I did that last downgrade, I saw that 184.108.40.206-220.127.116.11 was pulled from the download site. I see that 18.104.22.168-22.214.171.124 is out now, and it does make a mention of high CPU utilization under certain circumstances:
Bug ID: 113152
Symptom: A higher CPU utilization was observed in IAPs. This issue is resolved by restricting the Wi-Fi drivers from sending messages pertaining to client roaming when fast roaming is not enabled on an SSID.
Scenario: This issue occurred because the Wi-Fi drivers sent incorrect message requests to the STM process, which in turn sent incorrect PAPI messages. The issue was observed in IAPs running Instant 126.96.36.199-4.1.x.x releases.
I haven't tried updating to 188.8.131.52-184.108.40.206 yet, but I'm willing to give it a shot if it will potentially fix one or both of these problems.
03-24-2015 12:13 PM
More and more devices are starting to have mechanisms that stop this but most still don't.
03-24-2015 12:18 PM
If you have a large number of devices idling on your guest network, you could be hammering the captive portal with requests as applications try to get out to the Internet.
There was a reference to captive portal hits in the release notes (from a previous version), but we're not using the Captive Portal. We have a web filter that requires authentication before clients can get out, so apps on a device often won't work until the user logs in, but that shouldn't affect the APs, since we're not using the actual Aruba Captive Portal feature. In other words, clients can connect to our Guest network without having to authenticate or go through a Captive Portal, but they still have to log in to the web filter before they can get out to the internet. (They have to log in to the web filter regardless of whether they're using our Guest or Secure network.)