hi Koen
Debugging a single core is hard - high CPU on a single core (anything above core 7 is not related to the CLI/ArubaOS itself) generally comes down to what a single user is up to, or to a single user being the recipient of something coming in from the network. Examples that can cause this: a single user flooding broadcast packets out, heavy subnet scanning, or a user who has ARP-spoofed the default gateway and is now sinking all the traffic.
Having a core at 100% for any sustained amount of time will usually impact the other users on that core (latency, AAA delays, packet loss). There is no easy way to determine which tunnels/users are on that core, so you need to make some deductions from various datapath debug commands to try and narrow down the type of traffic that may be the cause (and then hopefully deduce where it's coming from).
Some example commands to use during the issue, which can also be used outside the issue to baseline. Note that most of these are in a tech-support log, but it's also important to see the value deltas when running them during the issue. Please gauge the risk/output volume before running these (e.g. 'show datapath user' is voluminous).
> find the CPU core - is it holding 99% utilization across all three time buckets? Are any of the CPUs in the range 8-11 also showing an uptick during the issue?
show datapath utilization
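If you're working from a saved capture, the "pegged across all buckets" check can be scripted. This is a minimal sketch only - the line format in the regex is an assumption about how the utilization table looks on your build, so adjust it against a real capture first:

```python
import re

def pegged_cores(util_text, threshold=99):
    """Return datapath CPU ids at or above `threshold` in every time
    bucket.  Assumed line shape (adjust regex to your controller's
    actual output):  "Cpu 27  |  99%   99%   99%" """
    hot = []
    pat = re.compile(r"Cpu\s+(\d+)\s*\|\s*(\d+)%\s+(\d+)%\s+(\d+)%")
    for line in util_text.splitlines():
        m = pat.search(line)
        if m:
            cpu = int(m.group(1))
            buckets = [int(m.group(2)), int(m.group(3)), int(m.group(4))]
            if all(b >= threshold for b in buckets):
                hot.append(cpu)
    return hot
```

Run it against two captures (during and outside the issue) to see which cores are newly pegged.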
> run twice with 5 seconds in between to collect a delta. Look for excessive drops, flood frames, anything that looks bad.
show datapath frame verbose
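The "run twice, 5 seconds apart, compare" step recurs for several of the counter commands below, so it's worth scripting once. A sketch, assuming the captures contain simple 'counter-name   value' lines (verify against a real capture - column layouts differ between commands):

```python
def parse_counters(text):
    """Parse 'counter-name   value' lines into a dict.  The
    two-column layout is an assumption about the output format;
    lines that don't end in a bare integer are skipped."""
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0].strip()] = int(parts[1])
    return counters

def counter_delta(before, after):
    """Return counters that increased between two captures,
    largest increase first."""
    a, b = parse_counters(before), parse_counters(after)
    deltas = {k: b[k] - a[k] for k in b if k in a and b[k] > a[k]}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```

The same helper works for the frame, error, message-queue and maintenance counter outputs - anything that is name/value pairs.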
> collect some stats from the affected CPU (e.g. 27); run twice (at least) with 5 seconds in between. Compare 'Allocated Frames' vs. 'max Allocated Frames'. Ignore the 'discard' stats, and check for high rates of flood frames during the issue.
show datapath frame 27
> collect the errors (mostly a subset of the previous output)
show datapath error counters
> if there is an uptick on CPUs 8-11, see if any of the opcodes in the following output are increasing more rapidly than usual. Collect this twice, 5 seconds apart - it can help identify a particular type of traffic that may be causing the high CPU.
show datapath message-queue counters
> depending on whether you have BWM contracts (which in themselves shouldn't cause high CPU), it's always a good idea to collect this twice, 5 seconds apart, to ascertain how BWM is holding up
show datapath maintenance counters
> look for any users consuming a massive number of session counters (indicative of port scanning etc.). Warning - very voluminous output
show datapath user
show datapath user counters
show datapath user 27
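Eyeballing a voluminous user table for session hogs is painful, so sorting a saved capture by the session-count column helps. A sketch under a big assumption: the column index of the session count varies by build, so check a real capture and set `sess_col` accordingly:

```python
def top_session_users(user_text, sess_col=4, top=10):
    """Sort user-table lines by a numeric session-count column,
    highest first.  `sess_col` is a guess at the column position -
    verify it against a real capture.  Lines that don't parse
    (headers, separators) are skipped."""
    rows = []
    for line in user_text.splitlines():
        parts = line.split()
        if len(parts) > sess_col and parts[sess_col].isdigit():
            rows.append((int(parts[sess_col]), line))
    rows.sort(reverse=True)
    return [line for _, line in rows[:top]]
```

A user at the top with thousands of sessions is the port-scanner signature described above.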
> also unfortunately a lengthy output - if all of the above fail to give a clue, you may need to try to narrow down the AP that hosts the user causing the high CPU. It's a bit manual: run the following command a few times and try to visually identify which tunnel is carrying a large amount of traffic. In a high-CPU case, one of them usually sticks out.
show datapath tunnel verbose
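The "run it a few times and see which tunnel sticks out" step can be semi-automated by diffing two captures. Another hedged sketch: it assumes each tunnel line starts with a numeric tunnel id and has a byte counter in a fixed column, so confirm the column index (`byte_col`) against your actual output before trusting it:

```python
def busiest_tunnels(before, after, byte_col=3, top=5):
    """Compare two tunnel-table captures and return the tunnel ids
    with the largest byte-count increase.  Line/column layout is an
    assumption - adjust `byte_col` for your controller's output."""
    def snapshot(text):
        stats = {}
        for line in text.splitlines():
            parts = line.split()
            if len(parts) > byte_col and parts[0].isdigit() \
                    and parts[byte_col].isdigit():
                stats[parts[0]] = int(parts[byte_col])
        return stats
    a, b = snapshot(before), snapshot(after)
    deltas = {t: b[t] - a[t] for t in b if t in a}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

A tunnel whose delta dwarfs the rest over the same interval is the one to chase.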
If you find one that sticks out, you can find the BSSID, then dump the user table and 'show datapath user' output and filter against that BSSID to narrow down who is on the AP. Later, when the issue happens again, you can repeat this and compare whether the same user(s) are back. You can also start to inspect 'show datapath session table <ip of user>' to see what they are doing, or dump the whole 'show datapath session table' and filter it against the tunnel id.
hope that's useful.