Hello,
I'm writing to try and get some additional ideas from the community.
We have (2) 7210 controllers (primary and master) and (3) 7240 local controllers where clients and WAPs (225, 135, 105) terminate.
We got alerts from AirWave that masses of WAPs at different locations were going down. I *think* this may have quietly been going on since December and has progressively gotten worse; I don't have solid evidence, just a hunch.
I opened a case with TAC in July. They noticed various things: one person saw high CPU on the STM process (104%) and on the HTTPD process too. Another said he saw WAPs that had been up 44 days, yet their radios had rebooted recently. I would monitor cpuload and it seemed very high.
We were on version 6.4.4.4 and they recommended going to 6.4.4.9. We completed that upgrade. The only issue during the upgrade was that it took a couple of hours for about the last 100 WAPs to fully register (they kept bootstrapping, but eventually everything upgraded and seemed good).
Now, a few days later, we are losing console access even to the local controllers. One is down completely (no IP connectivity, no console). Another pings but has no console or SSH (after the login and password it just sits and never gets you to a prompt). The last one pings but has no console or SSH either.
TAC had me pull the uplink to the network to see if that was impacting the console. There was no change with the uplink removed: still no console. I ended up hard booting all 3 controllers to restore services.
And here we are 12 hours later: I can't console into one of the controllers, and I can't SSH all the way in to 2 of them. The HTTPD process CPU seems high and I'm worried.
Examples:
Tasks: 179 total, 2 running, 177 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.9%us, 19.0%sy, 0.0%ni, 52.8%id, 0.0%wa, 0.0%hi, 2.3%si, 0.0%st
Mem: 5172096k total, 3087680k used, 2084416k free, 12928k buffers
Swap: 0k total, 0k used, 0k free, 905088k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19017 nobody 20 0 292m 24m 7232 S 122 0.5 0:05.00 httpd (122% CPU??)
3950 root 20 0 711m 479m 77m S 15 9.5 277:45.38 stm
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3222 nobody 20 0 0 0 0 Z 9999 0.0 0:08.36 httpd <defunct> (9999% cpu and defunct??)
3824 nobody 20 0 310m 24m 7168 S 58 0.5 0:03.06 httpd
3747 root 20 0 356m 284m 281m S 34 5.6 2831:21 gsmmgr
3950 root 20 0 727m 516m 98m S 14 10.2 1543:38 stm
4111 root 20 0 271m 134m 48m S 11 2.7 102:13.90 arm
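For anyone comparing snapshots like these, here is a minimal sketch of how saved top-style output could be scanned off-box for zombie or high-CPU processes. It is plain Python, not an ArubaOS feature; the function name flag_processes, the 50% threshold, and the column positions are assumptions based on the output pasted above.

```python
# Sample rows copied from the second snapshot above.
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3222 nobody 20 0 0 0 0 Z 9999 0.0 0:08.36 httpd <defunct>
3824 nobody 20 0 310m 24m 7168 S 58 0.5 0:03.06 httpd
3747 root 20 0 356m 284m 281m S 34 5.6 2831:21 gsmmgr
3950 root 20 0 727m 516m 98m S 14 10.2 1543:38 stm
4111 root 20 0 271m 134m 48m S 11 2.7 102:13.90 arm
"""


def flag_processes(top_text, cpu_threshold=50.0):
    """Return (pid, command, cpu, state) for zombie or high-CPU rows.

    Assumes the 12-column layout shown above:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    flagged = []
    for line in top_text.splitlines():
        fields = line.split()
        # Skip summary lines, headers, and anything that is not a process row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        state = fields[7]
        try:
            cpu = float(fields[8])
        except ValueError:
            continue
        # Flag zombies (state Z) and anything above the assumed threshold.
        if state == "Z" or cpu > cpu_threshold:
            flagged.append((int(fields[0]), fields[11], cpu, state))
    return flagged


if __name__ == "__main__":
    for pid, command, cpu, state in flag_processes(SAMPLE):
        print(f"PID {pid} ({command}): state={state} cpu={cpu}%")
```

Running this over several snapshots taken a few minutes apart would show whether the same httpd PIDs stay high or whether new ones keep appearing, which may be useful detail to pass along to TAC.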
Thanks for any pointers,
Sarah