On Friday afternoon we started having wireless connectivity issues out of nowhere. After a lot of investigation, we found that all of our IAPs (five in the cluster: four active, one in monitor mode) were crashing with kernel page fault errors. At its worst, the IAPs were not staying online for more than five minutes. As people left on Friday, we saw the reboots stop.
Over the weekend, with only a few people in the office, the cluster stayed stable. Monday morning rolled around, and as more people connected, it started happening again.
We got back on the phone with Aruba support, and their back-end team identified the problem: 802.11k fast roaming was enabled and was causing the crash. We disabled 802.11k and rebooted the cluster. Once it came back, I had people reconnect. Disabling that feature appears to have stopped the reboot loops! So if anyone else is seeing kernel page faults as the reboot reason (you can check this by running "show version" on the CLI, or by going to More>Support and running "AP Version" on one or all of the IAPs), try disabling 802.11k under each of your SSID security settings and rebooting your cluster. You may see increased stability.
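If you have several IAPs to check, one option is to save the "show version" output from each one to a text file and scan it with a small script. The sketch below is only an illustration: the function name is my own, and the sample text is a made-up placeholder, not real IAP output. All it assumes is that the reboot reason line contains the phrase "kernel page fault" somewhere.

```python
from __future__ import annotations


def find_kernel_page_fault_lines(show_version_output: str) -> list[str]:
    """Return every line of saved CLI output that mentions a kernel page fault.

    The match is a case-insensitive substring search, so it makes no
    assumption about the exact layout of the "show version" output.
    """
    needle = "kernel page fault"
    return [
        line.strip()
        for line in show_version_output.splitlines()
        if needle in line.lower()
    ]


if __name__ == "__main__":
    # Hypothetical placeholder text, NOT copied from a real IAP.
    sample = "some version info\nreboot reason: Kernel Page Fault (placeholder)\n"
    for hit in find_kernel_page_fault_lines(sample):
        print(hit)
```

An empty result for an IAP just means the phrase was not found in that saved output; it is worth eyeballing the reboot reason yourself before ruling anything out.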
The *weird* thing is, we had been running 802.11k for about 2-3 years before this started happening. We had upgraded the firmware on the cluster to 6.5.2.0 a month or two earlier and had been fine up until Friday. So it just started happening out of nowhere, with no changes made on our end.