I was only able to pull the tail end of the log, of course, from when power was restored to the host. Prior to that, the log is basically just the same cfgm, snmp and ntpwrap warnings and errors on repeat.
Aug 2 08:36:17 cfgm[3531]: <399838> <3531> <WARN> |cfgm| LmsHeartBeatResultAction: State(READY:LAST SNAPSHOT:CFGID-45:PEND-0:INITCFGID:0) FD=33:Cannot heartbeat with the master.
Aug 2 08:36:17 cfgm[3531]: <399816> <3531> <ERRS> |cfgm| handle_read: State(READY:LAST SNAPSHOT:CFGID-45:PEND-0:INITCFGID:0) FD=33:Failure receiving heartbeat response header information Result=-1 Err=Connection timed out
Aug 2 08:36:19 ntpwrap[3806]: <399816> <3806> <ERRS> |ntpwrap| ntpdPollingTimer:926 Listen address unavailable,restart Ntpd Daemon.
Aug 2 08:36:26 cfgm[3531]: <399838> <3531> <WARN> |cfgm| LmsHeartBeatResultAction: State(CONNECTINPROGRESS:LAST SNAPSHOT:CFGID-45:PEND-0:INITCFGID:0) FD=33:Cannot heartbeat with the master.
Aug 2 08:36:33 snmp[3803]: <301250> <3803> <ERRS> |snmp| Sendto failed, unable to send trap to manager 10.85.18.75:162.
Aug 2 08:36:34 ntpwrap[3806]: <399816> <3806> <ERRS> |ntpwrap| getNtpSrvRouteAddr:447:connect() failed
Aug 2 08:36:35 cfgm[3531]: <399838> <3531> <WARN> |cfgm| LmsHeartBeatResultAction: State(READY:LAST SNAPSHOT:CFGID-45:PEND-0:INITCFGID:0) FD=33:Cannot heartbeat with the master.
Aug 2 08:36:35 fpapps[3564]: <399838> <4287> <WARN> |fpapps| updateUplinkReachState: Wired uplink vlan 4 (bkp NO) reachability (to 10.85.18.150) changed from 2(DOWN) to 1(UP). mode 1. lb-state: ENABLED
Aug 2 08:36:35 fpapps[3564]: <399838> <4287> <WARN> |fpapps| ipMapAddUplinkDefaultGateway: Adding static vlan 4 gateway 10.85.4.1
Aug 2 08:36:39 cluster_mgr[4237]: <352302> <5592> <ERRS> |cluster_mgr| cluster_proc_dds_peer_channel_add_event, peer_index for peer 10.85.18.150 is 0!
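For a longer capture, a rough sketch like the one below could tally how often each process logs at each severity, which makes the "same messages on repeat" pattern easier to see. This is plain Python, not an Aruba tool. The regex is only inferred from the lines pasted above, and the field names (msgid, tid, module) are guesses at what the bracketed values mean.

```python
import re
from collections import Counter

# Pattern inferred from the pasted lines only; field names are assumptions.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<proc>[\w-]+)\[(?P<pid>\d+)\]:\s+"
    r"<(?P<msgid>\d+)>\s+<(?P<tid>\d+)>\s+<(?P<sev>\w+)>\s+"
    r"\|(?P<module>[\w-]+)\|\s+(?P<msg>.*)$"
)

def summarize(lines):
    """Count log lines per (process, severity); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m["proc"], m["sev"])] += 1
    return counts

# A few lines copied verbatim from the excerpt above.
SAMPLE = [
    "Aug 2 08:36:17 cfgm[3531]: <399838> <3531> <WARN> |cfgm| LmsHeartBeatResultAction: State(READY:LAST SNAPSHOT:CFGID-45:PEND-0:INITCFGID:0) FD=33:Cannot heartbeat with the master.",
    "Aug 2 08:36:17 cfgm[3531]: <399816> <3531> <ERRS> |cfgm| handle_read: State(READY:LAST SNAPSHOT:CFGID-45:PEND-0:INITCFGID:0) FD=33:Failure receiving heartbeat response header information Result=-1 Err=Connection timed out",
    "Aug 2 08:36:19 ntpwrap[3806]: <399816> <3806> <ERRS> |ntpwrap| ntpdPollingTimer:926 Listen address unavailable,restart Ntpd Daemon.",
    "Aug 2 08:36:26 cfgm[3531]: <399838> <3531> <WARN> |cfgm| LmsHeartBeatResultAction: State(CONNECTINPROGRESS:LAST SNAPSHOT:CFGID-45:PEND-0:INITCFGID:0) FD=33:Cannot heartbeat with the master.",
]

if __name__ == "__main__":
    # Print the most frequent (process, severity) pairs first.
    for (proc, sev), n in summarize(SAMPLE).most_common():
        print(f"{proc:12} {sev:5} {n}")
```

To run it against a saved copy of the "show log system all" output, pass the open file object to summarize() in place of SAMPLE.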
------------------------------
Matthew Waite
------------------------------
Original Message:
Sent: Aug 03, 2021 12:12 PM
From: Colin Joseph
Subject: AOS 8.7 - MDs refusing communication when MM offline?
Type "show log system all" on the MD to get a clue as to why.
------------------------------
Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.
Original Message:
Sent: Aug 02, 2021 03:19 PM
From: Matthew Waite
Subject: AOS 8.7 - MDs refusing communication when MM offline?
Good afternoon,
One of our remote sites has two 7210 controllers, set up as MDs in a cluster under a single locally hosted MM (all running 8.7). This has been running well for months with no issues.
Recently the ESXi host that the MM runs on had a power failure and shut down. The controllers remained powered up, but when the MM lost power, the MDs stopped responding to pings, SSH, HTTPS, etc. Looking in the local router's ARP table, I could still see their entries, however. No one at the site complained of the wireless being down, so perhaps the APs and controllers were still communicating.
Once I powered the ESXi host back on and the MM came back up, the MDs immediately became reachable again and everything appears to be working as normal.
I've never run into this situation before - is it normal for the MDs to refuse communications when the MM is unreachable? Is there something I can change in the configuration to prevent this from happening, so we can at least log into the MDs to check status? We are running a fairly vanilla setup.
Thanks in advance for any advice.
------------------------------
Matthew Waite
------------------------------