tl;dr - Our virtual Mobility Master (VMM) becomes unresponsive after a short period and has to be rebooted periodically.
The MM is running on Proxmox VE 6.1 (KVM). We've allocated 3 vCPU cores, 6 GB of RAM, and two virtual disks of 6 GB and 8 GB.
We have around 3 controllers (MDs), all Aruba 7030 hardware appliances.
The Wi-Fi setup isn't complex - just some WLANs on WPA2-Personal, and one WLAN on WPA2-Enterprise using the internal auth server on the MM. We have enabled DPI as well as WebCC (although I believe the issue reproduces without WebCC as well).
The issue seems to reproduce on both AOS 8.6.0.0 and AOS 8.5.0.5.
After a while (a few hours to a day or so), the MM becomes unresponsive - the Web UI simply hangs when you try to access it, or the CLI hangs on commands. Several processes end up in state PROCESS_NOT_RESPONDING (cpsec) or PROCESS_NOT_RESPONDING_CRITICAL (topology, routing_switch, flow_manager, packetin_dispatcher, event_dispatcher and switch_manager):
(ArubaMM-VA_EA_F5_7F) [mynode] #show process monitor statistics
Process Monitoring Action:Log Message
Process Monitor Statistics
--------------------------
Name State Restarts Allowed Restarts Timeout Value Timeout Chances Time Started
---- ----- ---------------- -------- ------------- --------------- ------------
/mswitch/bin/cmdexec PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:12 2020
/mswitch/bin/dbstart_pgsql PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:12 2020
/mswitch/bin/dbstart_mongo PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:13 2020
/mswitch/bin/ctrlmgmt PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:13 2020
/mswitch/bin/packet_filter PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:14 2020
/mswitch/bin/cryptoPOST PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:14 2020
/mswitch/bin/gsmmgr PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:14 2020
/mswitch/bin/pubsub PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:14 2020
/mswitch/bin/cfgm PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:14 2020
/mswitch/bin/rng-mgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:15 2020
/mswitch/bin/certmgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:15 2020
/mswitch/bin/cfgdist PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:15 2020
/mswitch/bin/syslogdwrap PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:15 2020
/mswitch/bin/aaa PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:15 2020
/mswitch/bin/fpapps PROCESS_RUNNING 0 0 240 3 Sun Jan 12 23:09:16 2020
/mswitch/bin/lagm PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:16 2020
/mswitch/bin/pim PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:16 2020
/mswitch/bin/licensemgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:16 2020
/mswitch/bin/isakmpd PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:16 2020
/mswitch/bin/profmgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:17 2020
/mswitch/bin/msghh PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:17 2020
/mswitch/bin/auth PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:17 2020
/mswitch/bin/appRF PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:17 2020
/mswitch/bin/stm PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:18 2020
/mswitch/bin/amon_sender PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:18 2020
/mswitch/bin/amon_recvr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:18 2020
/mswitch/bin/wms PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:18 2020
/mswitch/bin/udbserver PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:18 2020
/mswitch/bin/dhcpdwrap PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:19 2020
/mswitch/bin/rsyncwrap PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:19 2020
/mswitch/bin/radvdwrap PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:19 2020
/mswitch/bin/mobileip PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:19 2020
/mswitch/bin/phonehome PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:19 2020
/mswitch/bin/hwMon PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:20 2020
/mswitch/bin/snmpd PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:20 2020
/mswitch/bin/trapd PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:20 2020
/mswitch/bin/ntpwrap PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:20 2020
/mswitch/bin/dbsync PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:21 2020
/mswitch/bin/slb PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:21 2020
/mswitch/bin/resolvwrap PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:21 2020
/mswitch/bin/resolv_hostname PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:21 2020
/mswitch/bin/cts PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:22 2020
/mswitch/bin/httpd_wrap PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:22 2020
/mswitch/bin/fw_visibility PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:23 2020
/mswitch/bin/ctamon PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:23 2020
/mswitch/bin/utild PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:23 2020
/mswitch/bin/ospf PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:24 2020
/mswitch/bin/lldpd PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:24 2020
/mswitch/bin/util_proc PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:24 2020
/mswitch/bin/cpsec PROCESS_NOT_RESPONDING 8 0 240 3 Sun Jan 12 23:09:24 2020
/mswitch/bin/spectrum PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:25 2020
/mswitch/bin/iapmgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:25 2020
/mswitch/bin/mdns PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:25 2020
/mswitch/bin/arm PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:25 2020
/mswitch/bin/mcell PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:26 2020
/mswitch/bin/dds PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:26 2020
/mswitch/bin/ipstm PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:26 2020
/mswitch/bin/ha_mgr PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:26 2020
/mswitch/bin/ucm PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:26 2020
/mswitch/bin/web_cc PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:27 2020
/mswitch/bin/cert_dwnld PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:27 2020
/mswitch/bin/sc_rep_mgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:27 2020
/mswitch/bin/redisdbstart PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:27 2020
/mswitch/bin/nbapistart PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:27 2020
/mswitch/bin/nbapi_helper PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:28 2020
/mswitch/bin/mon_serv PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:28 2020
/mswitch/bin/mon_serv_fwv PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:28 2020
/mswitch/bin/cluster_upg_mgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:28 2020
/mswitch/bin/ofc_cli_agent PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:28 2020
/mswitch/bin/splunk_zmq PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:29 2020
/mswitch/bin/topology PROCESS_NOT_RESPONDING_CRITICAL - 0 240 3 Sun Jan 12 23:09:29 2020
/mswitch/bin/topology_discovery PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:29 2020
/mswitch/bin/routing_switch PROCESS_NOT_RESPONDING_CRITICAL - 0 240 3 Sun Jan 12 23:09:29 2020
/mswitch/bin/flow_manager PROCESS_NOT_RESPONDING_CRITICAL - 0 240 3 Sun Jan 12 23:09:29 2020
/mswitch/bin/packetin_dispatcher PROCESS_NOT_RESPONDING_CRITICAL - 0 240 3 Sun Jan 12 23:09:30 2020
/mswitch/bin/event_dispatcher PROCESS_NOT_RESPONDING_CRITICAL - 0 240 3 Sun Jan 12 23:09:30 2020
/mswitch/bin/switch_manager PROCESS_NOT_RESPONDING_CRITICAL - 0 240 3 Sun Jan 12 23:09:30 2020
/mswitch/bin/upgrademgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:30 2020
/mswitch/bin/dpagent PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:30 2020
/mswitch/bin/mcellsolverstart PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:31 2020
/mswitch/bin/airmatch_recv PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:31 2020
/mswitch/bin/bocmgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:31 2020
/mswitch/bin/hcm PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:31 2020
/mswitch/bin/upl_sync_mgr PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:31 2020
/mswitch/bin/vrrp PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:32 2020
/mswitch/bin/ble_relay PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:32 2020
/mswitch/bin/dot1x1 PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:32 2020
/mswitch/bin/im_helper PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:32 2020
/mswitch/bin/user_visibility PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:32 2020
/mswitch/bin/impystart PROCESS_RUNNING - 0 240 3 Sun Jan 12 23:09:33 2020
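In case it helps anyone comparing captures of this output over time, here is a rough Python sketch that pulls out the rows that are not in PROCESS_RUNNING. It assumes the output has been saved as plain text with whitespace-separated columns, as pasted above; the names `ProcRow`, `parse_process_monitor` and `not_running` are made up for illustration and are not part of any Aruba tooling.

```python
from collections import namedtuple

# One row of "show process monitor statistics" (field names are illustrative).
ProcRow = namedtuple(
    "ProcRow",
    "name state restarts_allowed restarts timeout_value timeout_chances time_started",
)


def parse_process_monitor(text):
    """Parse the whitespace-separated data rows of the pasted CLI output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an absolute path; headers and separators do not.
        if len(fields) < 7 or not fields[0].startswith("/"):
            continue
        rows.append(
            ProcRow(
                name=fields[0],
                state=fields[1],
                restarts_allowed=fields[2],  # may be "-" so keep it as text
                restarts=int(fields[3]),
                timeout_value=int(fields[4]),
                timeout_chances=int(fields[5]),
                time_started=" ".join(fields[6:]),
            )
        )
    return rows


def not_running(rows):
    """Return every row whose state is anything other than PROCESS_RUNNING."""
    return [row for row in rows if row.state != "PROCESS_RUNNING"]


# A few lines copied from the output above, used as sample input.
SAMPLE = """\
Name State Restarts Allowed Restarts Timeout Value Timeout Chances Time Started
---- ----- ---------------- -------- ------------- --------------- ------------
/mswitch/bin/util_proc PROCESS_RUNNING 8 0 240 3 Sun Jan 12 23:09:24 2020
/mswitch/bin/cpsec PROCESS_NOT_RESPONDING 8 0 240 3 Sun Jan 12 23:09:24 2020
/mswitch/bin/topology PROCESS_NOT_RESPONDING_CRITICAL - 0 240 3 Sun Jan 12 23:09:29 2020
"""

if __name__ == "__main__":
    for row in not_running(parse_process_monitor(SAMPLE)):
        print(row.name, row.state)
```

Running the same filter over captures taken at intervals should show which process stops responding first, which may be more useful than a single snapshot taken after everything has already hung.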
We have collected tech support logs before it gets to the hanging state, but we can't collect them during the hang: the Web UI is unresponsive, and running "show tech-support" from the CLI also hangs after a certain point.
Aruba TAC seems stumped, and it's been a couple of weeks - it's driving us bonkers.