I have been running into an intermittent issue at our remote sites where the APs lock up from high CPU utilization, and I was hoping for some advice. The issue is slowly creeping across our network, hitting only one or two sites at a time. We run one or two RAP-109s at each site, and each site has its own virtual controller. The RAPs are managed via Airwave.


The issue begins with one of the APs reporting as "Down" in Airwave and the event log reporting "AP is not associated to Virtual Controller". Troubleshooting reveals that the AP(s) are running at 100% CPU utilization, with the "top" and "mini_httpd" processes consuming most of the resources. Running "show tech-support" and similar commands returns "Module AM is busy. Please try later." Client IP addresses are and noise level on the AP is listed as high.


This is the part that doesn't make sense to me. After 30-45 minutes, the CPU utilization normalizes, both APs join the virtual controller, and clients connect and pass traffic. Power cycling the APs does not immediately correct the issue, as they appear to boot back up to 100% CPU until this "process" completes.


Are they also having this issue outside business hours?


It could be something like antivirus software on a client PC acting up. I have had that issue a few times before, and it made the APs unstable.

Yes, it does seem to happen after business hours, based on the event log in Airwave. Excellent idea! I will definitely look into the antivirus. It is frustrating that some basic show commands don't work during this period, which makes it difficult to troubleshoot properly.


Just to confirm something I mentioned in my original post, I bounced the port providing PoE to both of the APs at one site during one of these events and they booted right back up to 100% CPU for ~20 minutes. It definitely doesn't seem like a bug on the IAP could cause that type of behavior...

I thought I should follow up and let everyone know that I believe I have found the cause. Luckily, the "show datapath sessions" command still works while the CPU is spiked, and it revealed that a port scan utility was opening over 200 sessions on port 443, which caused the AP to lock up. This explains why the issue affected certain regions at a time and why a reboot did not resolve it: the scan started right back up as soon as the device was reachable again.
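
For anyone who wants to check for the same pattern, here is a minimal sketch in Python of how session entries could be tallied per source once they have been pulled off the AP. It is not a parser for the real command output: the (source IP, destination IP, destination port) tuple layout, the function name find_heavy_sources, and the sample addresses are all assumptions made up for illustration. The threshold of 200 simply mirrors the session count I saw.

```python
from collections import Counter


def find_heavy_sources(sessions, dest_port=443, threshold=200):
    """Return {source_ip: session_count} for every source that has more
    than `threshold` sessions to `dest_port`.

    `sessions` is an iterable of (src_ip, dst_ip, dst_port) tuples. This
    layout is an assumption for the sketch, not the AP's real output format.
    """
    counts = Counter(
        src for src, _dst, dport in sessions if dport == dest_port
    )
    return {src: n for src, n in counts.items() if n > threshold}


# Hypothetical sample data: one noisy host and one normal client.
sessions = (
    [("10.1.1.50", "10.1.1.2", 443)] * 250   # scanner-like source
    + [("10.1.1.20", "10.1.1.2", 443)] * 3   # ordinary client
    + [("10.1.1.50", "10.1.1.2", 80)] * 10   # other port, ignored
)

print(find_heavy_sources(sessions))
```

With the made-up sample data above, only 10.1.1.50 is reported, since it is the only source with more than 200 sessions to port 443.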

