Resource 'Controlpath CPU' has exceeded 30% threshold (actual: X%)

‎07-02-2014 03:47 PM
Question: What do log messages such as “Resource 'Controlpath CPU' has exceeded 30% threshold (actual: X%)” mean on a controller running 6.2.x code?
Does it mean the controller is running into high CPU usage?


This article applies to ArubaOS 6.2.x

process monitor log

Aug 21 12:09:40 fpapps[1823]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Datapath CPU 9' has exceeded  30%  threshold (actual:49%).
Aug 21 12:10:42 fpapps[1823]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Datapath CPU 9' has dropped below  30%  threshold (actual:0%).
Aug 21 12:17:53 nanny[1753]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Controlpath CPU' has exceeded  30%  threshold (actual:32%).
Aug 21 12:18:53 nanny[1753]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Controlpath CPU' has dropped below  30%  threshold (actual:11%).
Aug 21 12:22:53 nanny[1753]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Controlpath CPU' has exceeded  30%  threshold (actual:34%).
These log messages were introduced in the 6.2.x code stream to give visibility into controller resource thresholds. The message indicates that CPU utilization has crossed the configured threshold; however, high CPU utilization at times, such as during peak hours, is acceptable.
For example, in the morning, when many users come online or conference rooms fill up, high CPU utilization is expected.
Only if CPU utilization stays high for a longer period do we need to look at which processes are consuming CPU cycles, by collecting the following CPU-related outputs:
#show cpuload
#show cpuload current
#show processes sort-by CPU
#show process monitor statistics
#show memory
#show storage
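
To judge whether the CPU stayed over the threshold "for a longer duration", the exceeded/dropped-below pairs in the log can be matched up. The following is a minimal Python sketch, not an Aruba tool: it assumes the syslog format shown in the sample above, and the names `find_excursions` and `ASSUMED_YEAR` are made up for illustration. Syslog lines carry no year, so a placeholder year is used purely so the timestamps can be subtracted.

```python
import re
from datetime import datetime

# Matches lines like the sample log above (format assumed from that sample).
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s.*?"
    r"Resource '(?P<res>[^']+)' has (?P<event>exceeded|dropped below)\s+"
    r"(?P<thr>\d+)%\s+threshold \(actual:\s*(?P<actual>\d+)%\)"
)

ASSUMED_YEAR = 2000  # placeholder only; the log lines contain no year


def find_excursions(lines):
    """Pair 'exceeded' with the next 'dropped below' for each resource.

    Returns (closed, still_open):
      closed     -> list of (resource, seconds_above, value_reported_at_start)
      still_open -> dict of resource -> (start_time, value_reported_at_start)
    """
    still_open = {}
    closed = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S")
        res = m["res"]
        if m["event"] == "exceeded":
            still_open[res] = (ts, int(m["actual"]))
        elif res in still_open:
            start, reported = still_open.pop(res)
            closed.append((res, (ts - start).total_seconds(), reported))
    return closed, still_open


SAMPLE = """\
Aug 21 12:09:40 fpapps[1823]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Datapath CPU 9' has exceeded  30%  threshold (actual:49%).
Aug 21 12:10:42 fpapps[1823]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Datapath CPU 9' has dropped below  30%  threshold (actual:0%).
Aug 21 12:17:53 nanny[1753]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Controlpath CPU' has exceeded  30%  threshold (actual:32%).
Aug 21 12:18:53 nanny[1753]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Controlpath CPU' has dropped below  30%  threshold (actual:11%).
Aug 21 12:22:53 nanny[1753]: <399838> <WARN> <CSU-Aruba-3>  Resource 'Controlpath CPU' has exceeded  30%  threshold (actual:34%).
""".splitlines()

if __name__ == "__main__":
    closed, still_open = find_excursions(SAMPLE)
    for res, seconds, reported in closed:
        print(f"{res}: above threshold for {seconds:.0f}s (reported {reported}%)")
    for res, (start, reported) in still_open.items():
        print(f"{res}: still above threshold since {start:%b %d %H:%M:%S} ({reported}%)")
```

On the sample log, both completed excursions last about a minute, which is the short-lived pattern described above. Excursions that stay open or last many minutes are the ones worth following up with the show commands.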


I am seeing the same SNMP message coming from our monitoring:

Is there any way to determine why these messages are received when they just appear intermittently and perhaps during short time spans?

These alarms appear intermittently and the CPU does not appear to stay over the threshold for longer periods. 

The issue tends to happen during the morning hours and in some cases afternoons.

SNMPTRAP-aruba-WLSX-TRAP-wlsxThresholdExceededAruba: Particular resource under monitoring has gone above the threshold specified - (0) - (Datapath CPU 10) - (30) - (58).; Object=SomeController; IP=xxxxxx; Subobject=0Datapath CPU 103058; Class=BreakFix

It can depend on many factors, such as which ArubaOS version you are running, which controller model is in place and how much load it carries, how many users are connected to the network at peak times, and so on.

I would recommend opening a TAC case. In the past I have seen that such issues are usually due to an OS bug, and TAC has better resources and information to give you a solution.


Thank you for the reply.

I have already checked all the factors you mentioned above, and there is no apparent reason why we should receive a threshold exceeded message. That being said, I do believe this could be a bug-related issue. The version of this one is: (MODEL: Aruba7210), Version and I am working on opening a TAC case to see what the support team has to say about this. But meanwhile I wanted to see if there is anything I can do to speed up the troubleshooting :).

If anything, I wonder why these values are so low (or maybe 30% and 45% respectively is how it should be configured). Anyway, all the controllers under our support are configured with the same percentages. Looking at these values makes me conclude we are well below the thresholds.

--------            ------------
Datapath CPU        30 %
Controlpath CPU     45 %


Threshold Values for Number of APs
Default(%)  Current(%)  Max APs  Avail. CAPS  Avail. RAPS  Current Tot Aps  Current CAPS  Current RAPs  Current VAPs
----------  ----------  -------  -----------  -----------  ---------------  ------------  ------------  ------------
80          80          293      75           293          218              218           0             1386


Threshold Values For No of Users
Default(%)  Current(%)  MAX Users  Current Users
----------  ----------  ---------  -------------
80          80          16384      1001


This could exceed 30% under high load, as you suggested, but it would be intermittent and momentary, and therefore very hard to catch in a debug.

Datapath Network Processor Utilization
      | Cpu utilization during past  |
  Cpu |  1 Sec     4 Secs    64 Secs |
    8 |      0% |      0% |       0% |
    9 |      3% |      3% |       3% |
   10 |      6% |      7% |       9% |
   11 |      8% |      5% |       6% |
   12 |      6% |      5% |       5% |
   13 |      7% |     11% |      15% | 
   14 |      5% |      5% |       6% |
   15 |      4% |      5% |       7% |
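
Because a single snapshot like this rarely lands on a spike, it can help to capture the output repeatedly and check each capture against the threshold. Below is a small Python sketch under the assumption that the text has the same layout as the table above; `parse_datapath_utilization` and `cpus_over` are hypothetical helper names, and how the output is collected from the controller is left out.

```python
import re

# One data row: "   10 |      6% |      7% |       9% |" (layout assumed from the table above).
ROW_RE = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)%\s*\|\s*(\d+)%\s*\|\s*(\d+)%\s*\|")


def parse_datapath_utilization(text):
    """Return {cpu_id: (util_1s, util_4s, util_64s)} for every data row found."""
    rows = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            cpu, s1, s4, s64 = map(int, m.groups())
            rows[cpu] = (s1, s4, s64)
    return rows


def cpus_over(rows, threshold):
    """List CPU ids whose utilization in any of the three windows exceeds threshold."""
    return sorted(cpu for cpu, vals in rows.items() if max(vals) > threshold)


SNAPSHOT = """\
Datapath Network Processor Utilization
      | Cpu utilization during past  |
  Cpu |  1 Sec     4 Secs    64 Secs |
    8 |      0% |      0% |       0% |
    9 |      3% |      3% |       3% |
   10 |      6% |      7% |       9% |
   11 |      8% |      5% |       6% |
   12 |      6% |      5% |       5% |
   13 |      7% |     11% |      15% |
   14 |      5% |      5% |       6% |
   15 |      4% |      5% |       7% |
"""

if __name__ == "__main__":
    rows = parse_datapath_utilization(SNAPSHOT)
    print("CPUs over 30%:", cpus_over(rows, 30))
```

For the snapshot above, no CPU is over 30%, which matches the point that the spikes are momentary and easy to miss.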

Is this a frequent problem?

