Wireless Access


After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

  • 1.  After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Aug 30, 2021 07:51 PM

    On our first day of classes last week, when the students arrived on campus, we had a huge issue that lasted all day.

    It seems the STM process was starved for CPU (and maybe other resources), and a very large percentage of associations were failing.

    We turned off OpenFlow and changed some other settings, and the next day was much better.

    However, we just found out today that we were not alone: at least one other university has had this problem since the students returned.

    Anyone else?

    If you are interested, email me and I'll give you our case number and share any other information that may help you.

    Thanks,

                    Fred



  • 2.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Aug 31, 2021 09:45 AM
    Out of curiosity what does your deployment look like?
    - how many APs? predominantly what models?
    - MMs and MDs? number and models? Code version?
    - average # of clients you are seeing on a typical day

    We've had a pretty smooth start of semester here... ~1600 APs, 2x MMs & 2x MDs, 8.8.0.1, many 205H/303H's in housing, and 224's, 225's, 335's, and some 535's in acad/admin buildings.


  • 3.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Aug 31, 2021 10:51 AM

    3400 APs; mostly 325's, 105's, 225's, 345's, and 275's, in that order, plus a few others; ~22000 max concurrent users, with slightly less being a good average; 2x MMs, 3x MDs, 8.6.0.9.

    Our problem is resolved, but we would prefer not to have to turn off OpenFlow; we hear it is needed for AirGroup and may have other advantages.

     

     






  • 4.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 01, 2021 07:30 AM
    What ArubaOS version?

    AP-105 end of support was August 2020. The maximum AOS version for it is 8.6.

    ------------------------------
    Bruce Osborne ACCP ACMP
    Liberty University

    The views expressed here are my personal views and not those of my employer
    ------------------------------



  • 5.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 01, 2021 07:55 AM

    We are running 8.6.0.9 and are currently replacing the AP-105's, but you are correct that they have reached end of support.

    Again, one other university contacted me with the same issue, so I only wanted to let others know.

    Aruba is still researching this issue, and we have some workarounds in place now, so we are happy with the current situation.

    Thanks,

                    Fred






  • 6.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 01, 2021 10:49 AM
    Thanks for sharing, Fred. We ran into the STM CPU spike as well during the first week of school. Users couldn't connect to the SSID, authentication took longer, connections dropped, the spinning wheel appeared, etc. We have a case open with TAC, and so far they're looking into it; their only recommendation at this point is to remove AirWave polling of the controllers. The Aruba EE didn't ask us to disable OpenFlow in my case. Our CPU still spikes on 2 of the 3 MDs, and STM drops showed up a little on the #1 and #2 MD controllers.

    Just curious: what else did they change in your environment besides disabling OpenFlow? And OpenFlow is set at the VAP profile, right?

    Our environment: 2600 APs (535's, 375's, 365's, 335's, 334's, 315's, 275's, 224's, 225's); 2x MMs, 3x MDs in the main cluster, 2x MDs in another cluster (no APs on this one). Code is 8.6.0.9.

    Kevin

    ------------------------------
    KEVIN DIEP
    ------------------------------



  • 7.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 01, 2021 04:50 PM

    They reduced syslog logging levels and adjusted SNMP polling intervals, and they changed some broadcast/multicast settings as well.

    I'm not 100% sure about exactly how OpenFlow was disabled; I just know that it was.

     

    Also, there is a whole thread that discusses some of this on the EDUCAUSE WIRELESS-LAN listserv; see below.

                   


    [External] Re: [WIRELESS-LAN] Anyone else seeing any issues in the fall with large classrooms and delayed connection times (Aruba 8.5.0.13)

     

     






  • 8.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 02, 2021 04:01 AM
    It seems that Aruba has found something. Customers who can log in to the Aruba Support Portal can look for the Aruba Support Advisory. It's classified for customers and partners only, so I can't paste the contents here. Your partner or Aruba Support can give you access to it, if needed.

    ------------------------------
    Herman Robers
    ------------------------
    If you have urgent issues, always contact your Aruba partner, distributor, or Aruba TAC Support. Check https://www.arubanetworks.com/support-services/contact-support/ for how to contact Aruba TAC. Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.

    In case your problem is solved, please invest the time to post a follow-up with the information on how you solved it. Others can benefit from that.
    ------------------------------



  • 9.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 02, 2021 02:17 PM
    Does anyone know if this manifested itself as "lagging traffic" or "ping spikes"? The field communication sure sounds like our issue, and we would love to try it. However, the end of the article says "This is a service impacting event but is necessary for STM process to start functioning properly again." Does anyone know what exactly happens when restarting that process? Is it something like all APs rebooting, or are we talking about just a few dropped packets?

    ------------------------------
    Michael Naylor
    ------------------------------



  • 10.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 02, 2021 03:01 PM
    Edited by cjoseph Sep 02, 2021 03:03 PM
    APs will bootstrap and users will notice.  

    I would use the show commands in the advisory under "How to know if you are impacted" to ensure that is what you are experiencing before restarting STM.

    ------------------------------
    Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.
    ------------------------------



  • 11.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Aug 28, 2024 07:47 AM

    I posted this same message under another discussion but meant to place it here.  This is the message: 

    We are experiencing high CPU spikes and connectivity issues, similar to what has been discussed in this thread. This started on the first day of school, despite no major network changes except upgrading our controllers to 8.10.0.12 at the beginning of the summer. The issue became apparent with 42,000 clients on the controllers, especially during class changes; about 90% of our devices are Apple products. Enabling 802.11r seemed to help, but the issue persists during class changes, affecting clients campus-wide due to oversubscribed controllers.

    Initially, we suspected CoPP on our Cisco 9K switches, but it turned out not to be the cause, as CoPP was policing traffic when the connectivity issues began. Our network has come to a complete standstill twice during the day. We've been working with TAC for several days now. We are using MM/MC with nine 7240XM controllers with memory upgrades and have 6,000 APs (315, 503H, 535, 515, 375, 275, 277, 377, 325, and 215). We are in the process of replacing the 2xx series, with about 40 left on the network.

    TAC is following the same troubleshooting steps mentioned in this thread. While CPU spikes are concerning, the primary issue is clients being unable to connect to the wireless network, severely disrupting classes. The ACE team might be called in, but we won't know until after class changes this morning. We've always seen CPU spikes in processes, but the CPU as a whole doesn't get overwhelmed.




  • 12.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Aug 28, 2024 07:59 AM

    There were a LOT of bugs fixed in 8.10.0.13. I would recommend patching before further troubleshooting.

    After moving to patch 13, we had a lot fewer AP bootstraps, for instance.



    ------------------------------
    Bruce Osborne ACCP ACMP
    Liberty University

    The views expressed here are my personal views and not those of my employer
    ------------------------------



  • 13.  RE: After students come back; STM process now starved for CPU and associations failing at a very high percentage rate

    Posted Sep 06, 2021 11:16 AM
    As a partner, I can say that we were recently notified about an STM bug and association issues. It is a document for partners and customers; ask your local partner or Aruba support team about it: ARUBA-SA-20210901-PLVL04.

    ------------------------------
    Yury Morales
    ------------------------------