5406R in VSF crashes with spanning tree ring connected.

  • 1.  5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 02, 2020 03:35 AM

    Hi

     

    Had a nasty surprise yesterday. We have two 5406Rs connected to each other in VSF. Then we have a string of old Cisco switches daisy-chained to each other.

     

    We connected it like this:

     

    [Attached image: vsf stp.jpg]

    (sorry for the rotation, it looked okay on my pc before upload)

     

    When I completed the ring, one of the 5406Rs rebooted and came back up with a split brain. We had a hell of a time bringing VSF back to normal.

     

    My guess is that some looping occurred, even though spanning tree should have taken care of that. The VSF links are on a different line card. The uplinks to the Cisco switches are 1 gig; the VSF link is 10 gig.

     

    Both the Aruba and Cisco switches are running Rapid PVST. The Arubas are running software version 16.09.0009.

     

    Is this an unsupported configuration, or did I do something wrong?

     

     

    Lars



  • 2.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 02, 2020 07:31 PM

    Hi! Technically speaking, a properly configured VSF (with RSTP on all involved switches) shouldn't have suffered a split brain because, at first sight, the loop through the chained Cisco switches should simply have been cut by RSTP. Out of curiosity: how was MAD configured on the VSF?

     

    Can you provide some other information about your VSF and Cisco setup?

     

    It would be interesting to understand whether what caused the VSF split brain (all members of the VSF link down) could be a consequence of the looped chain of Cisco switches.



  • 3.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 02:24 AM

    Hi Parnassus. Thank you for the quick reply.

     

    RPVST is configured on both 5406Rs (they share a management plane, so it's the same configuration). We don't run MAD.

     

    Funny thing, it worked when we connected both ends of the ring to the same 5406R.



  • 4.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 02:50 AM

    Not quite funny... VSF is a virtualization technology, so you should have a single logical entity presenting itself to third-party peers (and that holds for spanning tree too). You should therefore see no difference in behavior whether you connect your loop back to VSF Member 1 or VSF Member 2, since both physical members are, de facto, acting as a single logical switch.

     

    It would be interesting to see exactly how RSTP was configured (including spanning-tree root-guard) on all involved switches.

     

    VSF MAD: deploying a properly configured VSF MAD mechanism is quite important, if not essential. Please use, at least, MAD over the OoBM port if you don't want to give up one port per VSF member for VLAN/LLDP MAD. MAD over OoBM can be deployed back-to-back (directly patching both OoBM ports together) or indirectly, through a third-party switch that should be specifically dedicated to the OoBM network.



  • 5.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 03:18 AM

    Hi Parnassus. You're right - it wasn't funny. This was a production site and because of the flood of traffic resulting from the loop, it threw a lot of equipment off the network. We scrambled for the next five hours to get it back up and save the running production.

     

    Spanning tree config on the 5400s:

     

    spanning-tree
    spanning-tree mode rapid-pvst
    spanning-tree vlan 1 root primary
    spanning-tree vlan 2 root primary
    spanning-tree vlan 4 root primary

     

    There are a bunch more VLANs. The 5400s are root for all of them.

     

    Spanning tree on the Ciscos:

    spanning-tree mode pvst
    no spanning-tree optimize bpdu transmission
    spanning-tree extend system-id


  • 6.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 06:14 AM

    Why use rapid-pvst instead of rstp, considering that, from the information provided, the VSF appears to be the STP root for all of your VLANs?



  • 7.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 07:16 AM

    We can't change anything on the Cisco side, and they run Rapid PVST.

     

    Anyway, that doesn't explain why it works when the loop is connected to the same VSF member.

     

    Looking at the logs from the crash, it seems we were hit by this bug: https://support.hpe.com/hpesc/public/docDisplay?docId=mmr_kc-0133742



  • 8.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 07:55 AM

    Wait... what software version is your VSF currently running? I assumed it was being kept up to date. Isn't it?



  • 9.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 08:00 AM

    Hi Parnassus

     

    We're on 16.09.0009. Not brand spanking new, but this is the one we're running everywhere and it has been stable for us.



  • 10.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 03, 2020 08:14 AM

    In that case the KB article you referenced is not relevant, since it refers to much older software versions (some of which didn't support VSF at all). If the logged message really is the same, its root cause could well be totally different.



  • 11.  RE: 5406R in VSF crashes with spanning tree ring connected.

    Posted Aug 12, 2020 10:09 AM

    I have dug out the logs from the crash. They are in reverse order (for some reason the reverse-order logs go back much further than the chronological ones), so you need to start reading from the bottom and work your way up.

     

    The first line at the bottom is where we plugged in the other end of a ring of switches.

     

     

    I 08/01/20 17:41:36 04992 vsf: ST1-CMDR: VSF port 1/E8 is down
    I 08/01/20 17:41:36 04992 vsf: ST1-CMDR: VSF link 1 is down
    I 08/01/20 17:41:36 04992 vsf: ST1-CMDR: VSF port 1/E8 is in error state
    W 08/01/20 17:41:36 00374 chassis: ST1-CMDR: Slot 2/E: Out of Order Msg loss -
                exp seq # 0
    
    ## Again, tons of these.
    
                exp seq # 0
    W 08/01/20 17:41:30 00374 chassis: ST1-CMDR: Slot 2/E: Out of Order Msg loss -
                exp seq # 0
    I 08/01/20 17:41:30 00077 ports: ST1-CMDR: port 2/F8 is now off-line
    I 08/01/20 17:41:30 00077 ports: ST1-CMDR: port 2/F4 is now off-line
    I 08/01/20 17:41:30 00077 ports: ST1-CMDR: port 2/F3 is now off-line
    I 08/01/20 17:41:30 00077 ports: ST1-CMDR: port 2/F2 is now off-line
    I 08/01/20 17:41:30 00077 ports: ST1-CMDR: port 2/F1 is now off-line
    I 08/01/20 17:41:30 00077 ports: ST1-CMDR: port 2/A2 is now off-line
    I 08/01/20 17:41:30 00077 ports: ST1-CMDR: port 2/A1 in Trk101 is now off-line
    I 08/01/20 17:41:30 03271 stacking: ST1-CMDR: Topology is a Standalone
    I 08/01/20 17:41:30 03272 stacking: ST1-CMDR: Stack fragment active
    W 08/01/20 17:41:30 03258 stacking: ST1-CMDR: Standby switch with Member ID 2
                removed due to loss of communication
    W 08/01/20 17:41:29 00374 chassis: ST1-CMDR: Slot 2/E: Out of Order Msg loss -
                exp seq # 0
    W 08/01/20 17:41:29 00374 chassis: ST1-CMDR: Slot 2/E: Out of Order Msg loss 
    
    ## Tons of these repeated.
    
                exp seq # 0
    W 08/01/20 17:41:11 00374 chassis: ST1-CMDR: Slot 2/E: Out of Order Msg loss -
                exp seq # 0
    W 08/01/20 17:41:11 00374 chassis: ST1-CMDR: Slot 2/E: Out of Order Msg loss -
                exp seq # 28836
    I 08/01/20 17:41:11 00077 ports: ST1-CMDR: port 2/E7 is now off-line
    I 08/01/20 17:41:11 00077 ports: ST1-CMDR: port 2/E6 is now off-line
    I 08/01/20 17:41:11 00077 ports: ST1-CMDR: port 2/E5 is now off-line
    I 08/01/20 17:41:11 00077 ports: ST1-CMDR: port 2/E4 is now off-line
    I 08/01/20 17:41:11 00077 ports: ST1-CMDR: port 2/E3 is now off-line
    I 08/01/20 17:41:11 00077 ports: ST1-CMDR: port 2/E2 is now off-line
    I 08/01/20 17:41:11 00077 ports: ST1-CMDR: port 2/E1 is now off-line
    W 08/01/20 17:41:11 03839 chassis: ST1-CMDR: Slot 2/E: Lost Communications
                detected - Source Message System(40)
    W 08/01/20 17:41:08 05737 FFI: ST1-CMDR: port 2/E8-Excessive Multicasts. See
                help.
    W 08/01/20 17:41:08 00332 FFI: ST1-CMDR: port 2/E8-Excessive Broadcasts. See
                help.
    W 08/01/20 17:40:53 00516 IpAddrMgr: ST1-CMDR: IPAM Control task delayed due to
                slave message queues too full
    W 08/01/20 17:40:45 05737 FFI: ST1-CMDR: port 1/E2-Excessive Multicasts. See
                help.
    W 08/01/20 17:40:45 00332 FFI: ST1-CMDR: port 1/E2-Excessive Broadcasts. See
                help.
    W 08/01/20 17:40:41 02581 ip: ST1-CMDR: IPv4: Duplicate IPv4 address 10.13.1.1
                is detected on port 1/E2 in VLAN 1 with a MAC address of
                f860f0-fe743f
    I 08/01/20 17:40:12 03816 stp: ST1-CMDR: VLAN 510 - Root changed from 32768:
                58bc27-178580 to 32768: 1caa07-341480
    I 08/01/20 17:40:07 00076 ports: ST1-CMDR: port 1/E2 is now on-line
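
Since the log above is pasted newest-first, here is a minimal Python sketch for flipping a paste like this into chronological order before reading it. It is not an HPE/Aruba tool. It assumes the entry layout seen in the paste (severity letter, MM/DD/YY date, time, five-digit event ID, message), that wrapped lines belong to the entry above them, and that lines starting with `##` are the poster's own notes marking elided repeats. The function and field names are made up for illustration.

```python
import re
from datetime import datetime

# Start of an entry as it appears in the paste: severity, date, time, event ID, message.
# The MM/DD/YY date order is an assumption based on the thread's posting dates.
ENTRY_RE = re.compile(
    r"^([A-Z]) (\d{2}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\d{5}) (.*)$"
)


def parse_entries(text):
    """Parse the pasted log into a list of entries, in the order they appear."""
    entries = []
    after_elision = False  # True right after a "## ..." note from the poster
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Wrapped fragments after an elision note have lost their entry; skip them.
            after_elision = True
            continue
        match = ENTRY_RE.match(line)
        if match:
            severity, date, time, event_id, message = match.groups()
            stamp = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
            entries.append(
                {"severity": severity, "time": stamp, "id": event_id, "message": message}
            )
            after_elision = False
        elif entries and not after_elision:
            # Continuation of a wrapped message: join it onto the previous entry.
            entries[-1]["message"] += " " + line
    return entries


def chronological(text):
    """Return entries oldest-first by reversing the newest-first paste."""
    # Reversing (rather than sorting) keeps the true order of entries that
    # share the same one-second timestamp.
    return list(reversed(parse_entries(text)))
```

To use it, save the paste to a file (say, a hypothetical `crash.log`), pass its contents to `chronological()`, and print each entry's `time`, `id` and `message`. The port coming on-line at 17:40:07 then shows up first and the VSF link going down at 17:41:36 last.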