05-24-2017 01:47 AM
I have an issue this morning. We have 5 sites, each with two 7210 controllers. The issue occurred at one of the sites, on its two local 7210s.
The issue started with users across the whole area, on APs adopted to those controllers, being unable to connect to the SSIDs. The controller's web GUI could not be accessed (very slow, always loading). SSH access works fine, but it sometimes shows "Module STM is busy" when I try to run a command.
I tried accessing the second 7210, which opened fine, unlike the first controller. I then removed the first controller from the network to force all APs to fail over to the second controller. After that, all clients could work normally again.
I checked the CPU load, and it shows the stm process using 100% of the CPU.
The network now works well on the second controller. When we connect the first controller back, the problem recurs, even after a reboot.
I need guidance on what to do next.
PS: I have attached the output of "show cpuload" and "show cpuload current".
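For anyone who wants to check saved output for the same symptom, here is a rough Python sketch that flags processes at or above a CPU threshold. It assumes the "show cpuload current" output is a top-style table with "%CPU" and "COMMAND" columns; the sample text, the function name and the 90% threshold are hypothetical and are not taken from my actual attachment.

```python
def busiest_processes(top_output, threshold=90.0):
    """Return (command, cpu_percent) pairs at or above threshold, busiest first.

    Assumes a top-style table whose header row contains %CPU and COMMAND.
    """
    lines = top_output.splitlines()
    header_idx = None
    cpu_col = cmd_col = -1

    # Locate the header row and remember where the two columns sit.
    for i, line in enumerate(lines):
        cols = line.split()
        if "%CPU" in cols and "COMMAND" in cols:
            header_idx = i
            cpu_col = cols.index("%CPU")
            cmd_col = cols.index("COMMAND")
            break
    if header_idx is None:
        return []  # not the format this sketch expects

    hits = []
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) <= max(cpu_col, cmd_col):
            continue  # blank or truncated row
        try:
            cpu = float(fields[cpu_col])
        except ValueError:
            continue  # non-numeric value in the %CPU column
        if cpu >= threshold:
            hits.append((fields[cmd_col], cpu))

    hits.sort(key=lambda h: h[1], reverse=True)
    return hits


# Hypothetical sample, made up for illustration only.
SAMPLE = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3456 root      20   0  210m  95m  12m R 99.8  2.4 123:45.67 stm
 3301 root      20   0   80m  20m 8000 S  1.2  0.5   1:02.03 authmgr
"""

if __name__ == "__main__":
    print(busiest_processes(SAMPLE))
```

If the real output uses different column names, only the header lookup needs to change.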
CWNA | ACMP | ACCP
05-24-2017 04:13 AM
That could be a symptom, but you need to open a TAC case so that they can collect and analyze your logs and a possible crash.tar. It is hard to know what is wrong from the limited information that you can post here.
Aruba Customer Engineering
05-24-2017 10:20 AM
We had a similar issue and had a TAC case open for many months while things were being analyzed. In a nutshell, our 7240 controllers weren't cleaning up the data sessions and would crash, very similar to what you're reporting. We would hard boot the controllers and things would work okay for a while, but then they would lock up again. TAC first wrote a customer-specific AOS build for us addressing the issue, then included the fix starting in code 188.8.131.52
Not sure if it is exactly the same thing you are reporting, but I agree, a TAC case would be best.
a month ago
I opened a TAC case. There is a bug in 6.5.x in the communication between the controller and AirWave using AMON. We detected packets looping between them, which overloads the stm process on the controller.
If you are experiencing this issue, the quick fix is to disable AMON in the controller config for AirWave. There is currently no fix for this issue; the latest AOS at the time I tried this was 184.108.40.206.