on 08-28-2013 10:03 AM
Hello all, we implemented new centralized 7210 controllers in late July, and since our residents returned this past week, 20-30 of our APs (125s and 135s) have been randomly losing contact with our controllers. Investigation with Aruba TAC shows high datapath CPU utilization on the controllers when the issue crops up, and so far it looks like it may be a code issue. We are running 18.104.22.168 on both 7210 and 3600 model controllers. The 3600 was in service all last year on 6.1.x code without this high datapath CPU occurring, but now the datapath CPU is spiking on it as well as on the 7210s. Port utilization looks great (around 10% on Gb links), with no errors to speak of on the ports. Just hopeful we are not alone with this issue.
Thanks in advance for any feedback
on 08-28-2013 10:26 AM
We are currently running 22.214.171.124 with three 7240s.
We have been experiencing high CPU utilization on the AP125s (seen when running show ap debug system-status ap-name <ap-name>), even when they have no clients, but I haven't seen the same behavior with AP105s or 135s.
I currently have a case open.
Would like to know what you guys find out with TAC.
Lead Mobility Engineer @ Integration Partners
AMFX | ACMX | ACDX | ACCX | CWAP | CWDP | CWNA
on 09-05-2013 05:36 AM
Yes, please post any resolution or word from TAC. We are running 126.96.36.199 and are experiencing many AP reboots per day. The reason for reboot is given as out of memory. After reviewing our case, TAC recommended upgrading to 188.8.131.52, which resolves a known memory exhaustion issue.
Hoping not to trade one issue for another..
on 09-05-2013 06:36 AM
We have some feedback from TAC: the datapath CPUs (show datapath utilization on the controllers) were showing at least one CPU spiking to 100% when our APs were rebooting or bootstrapping and bouncing between master and backup LMS controllers. We had enabled Broadcast/Multicast (BCMC) Optimization on the SSID profiles, but apparently there is also an option to turn on BCMC Optimization on the VLAN interface (thank you, Princeton, for getting this feature added, as I understand it). Enabling it solved our issue. However, as we are a heavy iOS (iPad) environment, the consequence of enabling BCMC Optimization on the VLAN interface is that AirPrint and other AirGroup traffic no longer works (we use PaperCut on a Mac server to allow our iPad users to print to campus printers). So we are between the proverbial rock and a hard place until we can solve the BCMC issue on the VLAN where the controllers and APs are located.
In three years with Aruba controllers/APs we have never had to use BCMC Optimization on the VLAN interface, so either we have reached some threshold of broadcast traffic, or the move from 6.1.3.x code to 6.2 code has made the traffic we already had a problem for the controllers. So for you folks with APs dropping: periodically check the datapath utilization, check that the ap bss-table total times for your APs closely match the AP uptime (you are looking for APs bootstrapping), and check the system log for any heartbeat misses.
Hope this is helpful to someone. I'll be staring at packet traces for a while to see if we can fully identify and resolve this errant BC/MC traffic.
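For anyone who wants to automate the bss-table check, here is a rough Python sketch of the comparison. It assumes you have already pulled each AP's uptime and its BSS total time out of the controller output yourself and converted both to seconds; the ApTimes record, the find_bootstrapped helper, the AP names, and the 5-minute tolerance are all made up for illustration and are not part of any Aruba tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ApTimes:
    """Hypothetical record: values you extracted from the controller yourself."""
    name: str
    uptime_s: int      # how long the AP itself has been up, in seconds
    bss_total_s: int   # total time reported for the AP in the bss-table, in seconds


def find_bootstrapped(aps: List[ApTimes], tolerance_s: int = 300) -> List[str]:
    """Return names of APs whose BSS time lags their uptime by more than tolerance_s.

    A large gap suggests the AP bootstrapped (its BSS time restarted)
    without the AP itself rebooting. The tolerance is an arbitrary guess.
    """
    return [ap.name for ap in aps if ap.uptime_s - ap.bss_total_s > tolerance_s]


if __name__ == "__main__":
    sample = [
        ApTimes("ap-dorm-1", uptime_s=86400, bss_total_s=86350),
        ApTimes("ap-dorm-2", uptime_s=86400, bss_total_s=1200),
    ]
    print(find_bootstrapped(sample))
```

APs that show up in the result are the ones worth cross-checking against the system log for heartbeat misses.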
on 09-05-2013 07:00 AM
For us, we have 2 local controllers, one for dorms (7210) and one for the remainder of campus (3600 currently; another 7210 will be deployed soon, but we are leaving the 3600 in place for now to see whether the issue is specific to the newer controller model). The dorm controller has about 84 APs (mostly 125s); the other has 104 APs (a mix of 125s and 135s). About 50% of the APs on either controller would bounce to their backup (our 1 master controller). At the time of the issue (datapath utilization at 100%), pings to the controllers would either take very long (1.5-2.5 seconds) or get no response at all (but pings to our gateways were fine, under 2 ms). Typically we would see the datapath utilization spike for about 3-5 minutes and then return to normal for an hour or more, then another episode would come along. Also, we had almost no case where both local controllers were experiencing the problem at the same time, which led us to believe it might be user traffic rather than LAN traffic on the VLAN interface, but we are told the BC/MC Optimization on the VLAN interface on the controllers operates on the inbound traffic from the LAN into the controllers.
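If you are logging datapath utilization at intervals, the episode pattern above (a few minutes at or near 100%, then quiet for an hour or more) can be picked out with something like the following Python sketch. It assumes you already have (timestamp in seconds, utilization percent) samples collected by your own polling; the find_spike_episodes function and the 90% threshold are hypothetical choices for illustration, not anything TAC gave us.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, datapath utilization in percent)


def find_spike_episodes(samples: List[Sample],
                        threshold: float = 90.0) -> List[Tuple[float, float]]:
    """Group consecutive samples at or above threshold into (start, end) episodes."""
    episodes: List[Tuple[float, float]] = []
    start = None
    last = None
    for ts, util in sorted(samples):
        if util >= threshold:
            if start is None:
                start = ts          # first hot sample opens a new episode
            last = ts               # extend the current episode
        elif start is not None:
            episodes.append((start, last))  # a cool sample closes the episode
            start = last = None
    if start is not None:
        episodes.append((start, last))      # data ended while still spiking
    return episodes


if __name__ == "__main__":
    polled = [(0, 10), (60, 100), (120, 100), (180, 95), (240, 12), (3600, 99), (3660, 8)]
    for begin, end in find_spike_episodes(polled):
        print(f"spike from t={begin}s to t={end}s")
```

Running the same function over samples from each local controller and comparing the episode start times is one way to confirm whether the two controllers ever spike together.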
on 09-05-2013 07:03 AM
Ryan Holland, ACDX #1 ACMX #1
The Ohio State University
09-05-2013 07:53 AM - edited 09-05-2013 09:17 AM
on 09-05-2013 07:59 AM
We upgraded to code 184.108.40.206 and started having 30-40 AP-125s reboot every day, so we upgraded to 220.127.116.11, and that resolved the problem of the APs rebooting. Now we have users complaining about slow wireless performance, which we have never really had before. I'm not sure if it's a code problem, but from 3.x to 6.1 we had almost no complaints of wireless performance problems.
on 09-05-2013 08:54 AM
I guess that is the $64k question (what BC/MC traffic is causing the issue), and it is what we are trying to determine as quickly as we can. I am hoping we are not the only ones seeing this issue with these controllers/code version. Our AP and user population is relatively small, with under 200 APs and about 3k total clients, so with much larger organizations out there using Aruba gear I was hoping not to be the only one experiencing this. Off to do more sniffing (looks like I picked the wrong week to stop sniffing glue).