I posted this same message under another discussion but meant to place it here. This is the message:
We are experiencing high CPU spikes and connectivity issues, similar to what has been discussed in this thread. This started on the first day of school, despite no major network changes except upgrading our controllers to 8.10.0.12 at the beginning of the summer. The issue became apparent with 42,000 clients on the controllers, especially during class changes. About 90% of our devices are Apple products. Enabling 802.11r seemed to help, but the issue persists during class changes, affecting clients campus-wide due to oversubscribed controllers.
Initially, we suspected CoPP on our Cisco 9K switches, but it turned out not to be the cause, as CoPP was policing traffic when the connectivity issues began. Our network has come to a complete standstill twice during the day. We've been working with TAC for several days now. We are using MM/MC with nine 7240XM controllers with memory upgrades and have 6,000 APs (315, 503H, 535, 515, 375, 275, 277, 377, 325, and 215). We are in the process of replacing the 2xx series, with about 40 left on the network.
TAC is following the same troubleshooting steps mentioned in this thread. While CPU spikes are concerning, the primary issue is clients being unable to connect to the wireless network, severely disrupting classes. The ACE team might be called in, but we won't know until after class changes this morning. We've always seen CPU spikes in processes, but the CPU as a whole doesn't get overwhelmed.
Original Message:
Sent: Sep 02, 2021 04:01 AM
From: Herman Robers
Subject: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate
It seems that Aruba has found something. For customers who can log in to the Aruba Support Portal: Aruba Support Advisory. It's classified for Customers and Partners only, so I can't paste the contents here. Your partner or Aruba Support can give you access to it, if needed.
------------------------------
Herman Robers
------------------------
If you have urgent issues, always contact your Aruba partner, distributor, or Aruba TAC Support. Check https://www.arubanetworks.com/support-services/contact-support/ for how to contact Aruba TAC. Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.
In case your problem is solved, please invest the time to post a follow-up with the information on how you solved it. Others can benefit from that.
Original Message:
Sent: Sep 01, 2021 04:50 PM
From: Fred Jordan
Subject: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate
The other changes were reducing syslog logging levels and adjusting SNMP polling intervals; they also changed some broadcast/multicast settings.
I'm not 100% sure about exactly how OpenFlow was disabled; I just know that it was.
Also, there is a whole thread on the EDUCAUSE WIRELESS-LAN listserv, below, that covers some of this.
Original Message:
Sent: 8/31/2021 12:30:00 PM
From: KevinAru
Subject: RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate
Thanks for sharing, Fred. We ran into the STM CPU spike as well during the first week of school. Users can't connect to the SSID, authentication takes longer, connections drop, they get the spinning wheel, etc. A case is open with TAC and they're still looking into it; the only recommendation so far is to remove AirWave polling of the controllers. Aruba EE didn't ask us to disable OpenFlow in my case. Our CPU still spikes on 2 of the 3 MDs, and STM drops showed up a little on the #1 and #2 MD controllers.
Just curious, what else did they change in your environment besides disabling OpenFlow? And OpenFlow is set in the VAP profile, right?
Our environment: 2600 APs (535s, 375s, 365s, 335s, 334s, 315s, 275s, 224s, 225s); 2x MMs, 3x MDs in the main cluster, 2x MDs in another cluster (no APs on this one). Code is 8.6.0.9.
Kevin
------------------------------
KEVIN DIEP
Original Message:
Sent: Aug 31, 2021 10:50 AM
From: Fred Jordan
Subject: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate
3400 APs; mostly 325's, 105's, 225's, 345's, 275's in that order and a few others; ~22,000 max concurrent users, with slightly less being a good average; 2x MMs, 3x MDs, 8.6.0.9.
And our problem is resolved, but we would prefer not to have to turn off OpenFlow; we hear it is needed for AirGroup, and it may have other advantages.
Original Message:
Sent: 8/31/2021 9:45:00 AM
From: censania
Subject: RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate
Out of curiosity, what does your deployment look like?
- how many APs? predominantly what models?
- MMs and MDs? number and models? Code version?
- average # of clients you are seeing on a typical day
We've had a pretty smooth start of semester here... ~1600 APs, 2x MMs & 2x MDs, 8.8.0.1, many 205H/303H's in housing, and 224's, 225's, 335's, and some 535's in acad/admin buildings.
------------------------------
Cody Ensanian
Original Message:
Sent: Aug 30, 2021 07:50 PM
From: Fred Jordan
Subject: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate
On our first day of classes last week, when the students arrived on campus, we had a huge issue that lasted all day.
It seems the STM process was starved for CPU (and maybe other resources), and a very large percentage of associations were failing.
We turned off OpenFlow and changed some other settings, and the next day was much better.
However, we just found out today that we were not alone; at least one other university has had this problem since the students returned.
Anyone else?
If you are interested, email me and I'll give you our case number and share any other information that may help you.
Thanks,
Fred