Wired Intelligent Edge


VSF - Split Brain Problem

  • 1.  VSF - Split Brain Problem

    Posted Jan 31, 2019 11:26 AM

    Hey all,

     

    We configured VSF on two J9850A (5406, OS 16.05.12) switches, with the VSF link running over two 10GbE SFP+ ports connected via multimode fiber. Both members have the same VSF priority.

    VSF itself is running as we expect.

     

    However, we have a split-brain problem in combination with LLDP-MAD.

     

    We use LLDP-MAD over single-mode fiber to a 2930F (OS 16.05.12).

    If we disconnect the link between the master (commander) and the LLDP-MAD device, the commander stays commander. If we then disconnect both VSF links, we end up with two commanders, i.e. a split brain.

     

    Our expectation of a MAD device is that it prevents two active commanders, and that the standby becomes commander once the "old commander" loses its MAD uplink.

     

    Until today the firmware was 16.05.07. As part of troubleshooting we updated to 16.05.12, because the release notes for 16.05.09 list a fix for this kind of error (CR_0000244268).

     

    I hope someone understands what I mean and has some tips.



  • 2.  RE: VSF - Split Brain Problem

    EMPLOYEE
    Posted Jan 31, 2019 01:11 PM

    Greetings!

     

    I would like to clarify what you're asking — is the VSF split-brain behavior you're seeing the result of disconnecting both the VSF links and the LLDP-MAD link on one of the switches?

     

    If so, then this is not really an unexpected behavior — if MAD on each member is not able to find the second member of the fabric (because the MAD link on one of them has been disconnected), the switch software operates on the assumption that the other member is down entirely, and assuming both members are otherwise operating normally, this leaves you with both members operating as the Commander of their respective fragments. 

     

    The entire purpose of MAD is to provide a means for a VSF member to discover other VSF members in the event of a failure of its VSF links and determine which resulting fragment should remain active. If the MAD link is disconnected on one or both switches, then there is no remaining method for fragment discovery, and so both fragments will assume they are supposed to be active. 
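
    The behavior described above can be modeled as a small decision function. This is a simplified sketch in Python, not the switch firmware's actual logic: the names `LinkState` and `active_commanders` are made up for illustration, and the tie-break (the former Commander's fragment stays active when both MAD links work) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    """Hypothetical model of a two-member VSF fabric (names are illustrative)."""
    vsf_up: bool     # at least one VSF link between member A and member B works
    mad_a_up: bool   # MAD link of member A (Commander before the split)
    mad_b_up: bool   # MAD link of member B (Standby before the split)


def active_commanders(state: LinkState) -> set[str]:
    """Return the members acting as Commander of an active fragment."""
    if state.vsf_up:
        # Fabric is intact: MAD is not consulted, A simply stays Commander.
        return {"A"}
    if state.mad_a_up and state.mad_b_up:
        # Split detected via MAD: one fragment is taken inactive.
        # Assumption: the fragment holding the former Commander stays active.
        return {"A"}
    # Split with a broken MAD path: neither member can discover the other,
    # so each assumes its peer is down and both stay active (split brain).
    return {"A", "B"}


# Sequence from the original post: MAD link on A pulled, then the VSF links.
print(active_commanders(LinkState(vsf_up=True, mad_a_up=False, mad_b_up=True)))
print(active_commanders(LinkState(vsf_up=False, mad_a_up=False, mad_b_up=True)))
```

    In this model, losing only a MAD link changes nothing while the VSF links are up, which matches the observation that the Commander does not change until the VSF links are also disconnected.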

     

    Note that in addition to LLDP-MAD or VLAN-MAD, on the 5400R you have the option of using OoBM-MAD, which utilizes the out-of-band management ports to permit discovery of other fabric members in the event of a VSF link failure. It can be enabled using the following command:

     

    switch(config)# vsf oobm-mad


  • 3.  RE: VSF - Split Brain Problem

    Posted Feb 01, 2019 02:23 AM

    Hey,

     

    first: thanks for your reply.

     

    Yes. We first disconnect the MAD link from switch A (the commander) and wait some time. At that point I would expect switch B, whose MAD link is still active, to become commander, but this does not happen. After some time we also disconnect the VSF link, and both devices become commander.

     

    We would expect this not to happen. I mean, how do we protect against physical damage to the fiber between the two locations?

    Our MAD device is in a third location.

     

    Because our 5406 Switches are in different locations, we are not able to connect the Oobm interfaces directly.

     



  • 4.  RE: VSF - Split Brain Problem

    MVP GURU
    Posted Feb 01, 2019 02:38 AM

    @SWE_IT wrote: Because our 5406 Switches are in different locations, we are not able to connect the Oobm interfaces directly.

    You can also connect the OoBM ports to each other indirectly through a third switch (a dedicated management switch, I suggest).



  • 5.  RE: VSF - Split Brain Problem

    Posted Feb 01, 2019 03:04 AM

    @parnassus wrote:

    @SWE_IT wrote: Because our 5406 Switches are in different locations, we are not able to connect the Oobm interfaces directly.

    You can also connect the OoBM ports to each other indirectly through a third switch (a dedicated management switch, I suggest).


    We use the 5406s as core switches in both of our data centers, so they only have SFP+ modules for 10G and no RJ45. The data centers are connected to each other via multimode fiber.
    And adding two more switches (one per data center) just for OoBM-MAD is too much. :)
    From the core switches there are multiple single-mode cables to the access-layer switches (2930F). One of them is configured as the MAD device.


  • 6.  RE: VSF - Split Brain Problem

    EMPLOYEE
    Posted Feb 01, 2019 01:14 PM

    When a split occurs, both switches will use their MAD link to attempt to discover the other member. If the MAD link is down on one or both members, they will not be able to discover their counterpart, will both assume that they are the only remaining VSF member, and will both become the Commander of their respective fragment.

     

    For this reason, the VSF and MAD links should be run such that any component failure (other than a total failure of the switch itself) will not result in a loss of both links.
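
    One way to check a cabling plan against this rule is to list the physical components each link depends on and look for any single component shared by all VSF links and at least one MAD link. The following Python sketch does that. The function name `split_brain_risks` and all component names (ducts, transceivers) are hypothetical, and the switch chassis itself is deliberately left out of the component sets.

```python
from __future__ import annotations


def split_brain_risks(vsf_links: list[set[str]],
                      mad_links: list[set[str]]) -> set[str]:
    """Return components whose single failure breaks every VSF link
    and at least one MAD link (illustrative model, not a vendor tool)."""
    if not vsf_links or not mad_links:
        return set()
    # A component takes down the whole VSF connection only if every
    # VSF link depends on it.
    kills_all_vsf = set.intersection(*vsf_links)
    # Losing the MAD link of either member is enough to defeat detection.
    kills_a_mad = set().union(*mad_links)
    return kills_all_vsf & kills_a_mad


# Hypothetical layout: two VSF links share one duct between the sites.
vsf = [{"duct-dc1-dc2", "sfp-a1"}, {"duct-dc1-dc2", "sfp-a2"}]

# MAD links routed separately to a third site: no shared failure point.
mad_separate = [{"duct-dc1-site3"}, {"duct-dc2-site3"}]

# One MAD link routed through the same duct as the VSF links.
mad_shared = [{"duct-dc1-dc2", "duct-dc2-site3"}, {"duct-dc2-site3"}]

print(split_brain_risks(vsf, mad_separate))
print(split_brain_risks(vsf, mad_shared))
```

    An empty result means no single listed component takes out both link types at once. It says nothing about two independent failures in sequence, such as a MAD link going down first and the VSF links later.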