Read my lips: do not deploy VSF/IRF without configuring MAD.
I rolled out several dozen 5500 HI switches across 5,000 square miles, thinking MAD was overkill. And it was, right up to the first time I needed it: during an ISSU firmware update. OSPF failed miserably because of duplicate router IDs, and split brains were everywhere. It took a site visit to every location to manually reboot each switch in the stack. Never again...
Right now I am testing 5406Rs stacked with VSF. We have ten 5406Rs, two in each closet, that I want to collapse using VSF. I don't want the nightmare to repeat, so I am labbing this configuration and testing it.
I'm going to answer one of my questions above:
I cannot find a show command that reports whether MAD is working when oobm-mad is configured. The idea behind MAD is to disable the ports on the orphaned switch when a VSF link failure splits the stack (this is good). My test was as follows:
1) Configure oobm-mad
2) Physically fault (unplug) all configured VSF links
3) Verify that MAD disables all ports on the orphaned switch
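Step 3 can be checked against the log instead of by eye. Here is a minimal Python sketch, assuming the "port X is now off-line" wording seen in the logs in this post; `ports_still_up` is a hypothetical helper, not an HPE tool, and the list of ports that were up on the orphaned member is something you supply, since nothing here reads it from the switch.

```python
import re

# Wording of the off-line message is taken from the logs in this post
# (e.g. "port 2/B5 is now off-line"); other releases may word it differently.
OFFLINE = re.compile(r"\bport (\d+/[A-Z]+\d+) is now off-line")


def ports_still_up(log_text, expected_ports):
    """Return the expected ports that the log never reports as off-line.

    `expected_ports` lists the ports that had links up on the orphaned
    member before the VSF links were pulled. An empty result means every
    one of them was taken off-line.
    """
    # Collapse whitespace so a message that wrapped onto two console
    # lines is still matched as one string.
    flat = " ".join(log_text.split())
    offline = set(OFFLINE.findall(flat))
    return sorted(set(expected_ports) - offline)


if __name__ == "__main__":
    standby_log = """\
I 05/10/17 00:42:08 00077 ports: port 2/B5 is now off-line
I 05/10/17 00:42:08 00077 ports: port 2/B9 is now off-line
"""
    # 2/B12 is a made-up port standing in for one that stayed up.
    print(ports_still_up(standby_log, ["2/B5", "2/B9", "2/B12"]))  # ['2/B12']
```

Passing in the port list keeps the check honest: a port that never had a link up will not appear in the log at all, so it has to be excluded up front rather than inferred.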
Here's what happened in the log of the ACTIVE switch when I unplugged the VSF links (logs obtained via the console port):
HP-VSF-Switch# sho log
Keys: W=Warning I=Information
M=Major D=Debug E=Error
---- Event Log listing: Events Since Boot ----
I 05/10/17 00:41:48 00184 mgr: ST1-CMDR: Log cleared as a result of 'clear
logging' command
I 05/10/17 00:42:06 04992 vsf: ST1-CMDR: VSF port 1/F24 is in error state
I 05/10/17 00:42:06 04992 vsf: ST1-CMDR: VSF link 1 is down
W 05/10/17 00:42:06 03258 stacking: ST1-CMDR: Standby switch with Member ID 2
removed due to loss of communication
I 05/10/17 00:42:06 03272 stacking: ST1-CMDR: Stack fragment active
I 05/10/17 00:42:06 03271 stacking: ST1-CMDR: Topology is a Standalone
I 05/10/17 00:42:06 00077 ports: ST1-CMDR: port 2/B5 is now off-line
I 05/10/17 00:42:06 00077 ports: ST1-CMDR: port 2/B9 is now off-line
I 05/10/17 00:42:06 04992 vsf: ST1-CMDR: VSF port 1/F24 is down
I 05/10/17 00:42:06 00406 ports: ST1-CMDR: port 1/F24 xcvr hot-swap remove.
---- Bottom of Log : Events Listed = 10 ----
The log file of the STANDBY switch:
HP-VSF-Switch# sho log
Keys: W=Warning I=Information
M=Major D=Debug E=Error
---- Event Log listing: Events Since Boot ----
I 05/10/17 00:41:48 00184 mgr: Log cleared as a result of 'clear logging'
command
I 05/10/17 00:41:48 00184 mgr: Log cleared as a result of 'clear logging'
command
I 05/10/17 00:42:06 04992 vsf: VSF port 2/A8 is in error state
I 05/10/17 00:42:06 04992 vsf: VSF link 1 is down
W 05/10/17 00:42:06 03258 stacking: Commander switch with Member ID 1 removed
due to loss of communication
I 05/10/17 00:42:06 03278 stacking: Member 2 (00fd45-000000) elected as
commander. Reason: Standby takeover
W 05/10/17 00:42:06 03270 stacking: Topology is a Chain
I 05/10/17 00:42:06 03272 stacking: Stack fragment active
I 05/10/17 00:42:06 03271 stacking: Topology is a Standalone
I 05/10/17 00:42:06 04992 vsf: VSF port 2/A8 is down
I 05/10/17 00:42:06 03267 stacking: Failover occurred
I 05/10/17 00:42:06 00061 system: -----------------------------------------
I 05/10/17 00:42:06 02712 console: USB console cable disconnected
I 05/10/17 00:42:07 03272 stacking: Stack fragment inactive
I 05/10/17 00:42:07 02682 OOBM: OOBM - Enabled globally.
I 05/10/17 00:42:07 00110 telnet: telnetd service enabled
I 05/10/17 00:42:07 00110 telnet: telnetd service enabled
I 05/10/17 00:42:08 03125 mgr: Startup configuration changed by SNMP. New seq.
number 20
I 05/10/17 00:42:08 00803 usb: port enabled.
I 05/10/17 00:42:08 03401 crypto: Function POWER UP passed selftest.
I 05/10/17 00:42:08 03261 stacking: Member active
I 05/10/17 00:42:08 03260 stacking: Member booted
I 05/10/17 00:42:08 00260 system: Mgmt Module 1 Active
I 05/10/17 00:42:08 00077 ports: port 1/A1 is now off-line
I 05/10/17 00:42:08 00077 ports: port 1/B2 is now off-line
I 05/10/17 00:42:08 00077 ports: port 1/F6 is now off-line
I 05/10/17 00:42:08 00077 ports: port 2/B5 is now off-line
I 05/10/17 00:42:08 00077 ports: port 2/B9 is now off-line
I 05/10/17 00:43:16 00179 mgr: SME CONSOLE Session - MANAGER Mode
What is ambiguous is that the STANDBY switch's log states:
Member 2 (00fd45-000000) elected as commander. Reason: Standby takeover
Both the ACTIVE and the STANDBY logs are telling me that I have two ACTIVE switches. This is not good.
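Reading two console logs side by side makes this easy to misjudge. As a rough aid, here is a Python sketch (hypothetical helpers `parse_log` and `role_summary`, not an HPE tool) that parses pasted `show log` text, rejoins messages that wrapped at the console width, and lists any commander elections together with every "Stack fragment active/inactive" transition in order, so the last transition on each switch can be read next to the election message. The entry layout is assumed from the output above and may differ between firmware releases.

```python
import re

# Entry layout assumed from the "show log" output pasted above:
# <severity> <MM/DD/YY> <HH:MM:SS> <event-id> <subsystem>: <message>
ENTRY = re.compile(
    r"^[WIMDE] \d\d/\d\d/\d\d (?P<time>\d\d:\d\d:\d\d) \d{5} "
    r"(?P<subsys>[\w-]+): (?P<msg>.*)$"
)


def parse_log(text):
    """Split 'show log' output into [time, subsystem, message] entries.

    Any non-empty line that does not start a new entry is treated as the
    wrapped tail of the previous message; banner lines before the first
    entry and the '----' footer are skipped.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        m = ENTRY.match(line)
        if m:
            entries.append([m["time"], m["subsys"], m["msg"]])
        elif entries and line and not line.startswith("----"):
            entries[-1][2] += " " + line
    return entries


def role_summary(text):
    """Report commander elections and every 'Stack fragment' transition."""
    entries = parse_log(text)
    elections = [msg for _, _, msg in entries if "elected as commander" in msg]
    fragment_states = [
        (time, msg.split()[-1])  # last word of the message: active/inactive
        for time, subsys, msg in entries
        if subsys == "stacking" and "Stack fragment" in msg
    ]
    return {"elections": elections, "fragment_states": fragment_states}


if __name__ == "__main__":
    standby_log = """\
I 05/10/17 00:42:06 03278 stacking: Member 2 (00fd45-000000) elected as
commander. Reason: Standby takeover
I 05/10/17 00:42:06 03272 stacking: Stack fragment active
I 05/10/17 00:42:07 03272 stacking: Stack fragment inactive
"""
    print(role_summary(standby_log))
```

Running it against each switch's log separately keeps the two views apart; the election message alone says which member took the commander role, while the fragment transitions show what that fragment reported afterwards.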
Here is what was wrong with my test: I didn't have any edge devices linked up. After bringing up some L2 devices and staging the VSF failure again, the ports on the STANDBY switch shut down. When I cleared the VSF fault, the STANDBY rebooted and all was good.
So it works; however, I still have no show command that reports the status of oobm-mad. There are status show commands for lacp-mad and lldp-mad, but nothing equivalent for oobm-mad.