11-03-2013 08:32 AM
We're testing out Aruba switches (specifically the 2500 series). We've stacked two switches together and uplinked them to our Cisco Nexus core. One uplink originates from one stack member, and the other uplink is on the second stack member. The uplinks are bound in a port-channel (LAG) group.
We did some failover testing, and failures of the stack cables and uplinks caused no network disruption. However, when we physically failed either member of the stack (primary or secondary), traffic on the switch that remained up experienced a significant (i.e., >40 seconds) disruption. In our current Cisco environment, and with the HP switches we're testing, this is not the case.
Anyone know how to mitigate this?
11-03-2013 12:07 PM - edited 11-03-2013 12:08 PM
Not a clue right now.
But if you purchased the switches with their respective support, go and open a support case. They will answer you faster than people here will.
If madjali is around, he might be able to answer you.
Could be a bug though... did you check the release notes and upgrade the firmware of those switches?
Product Manager - Aruba Networks
11-03-2013 02:56 PM - edited 11-04-2013 06:21 AM
While there is currently support for configuration synchronization between all members of the ArubaStack, we do not currently support protocol synchronization. As such, when the primary fails, the secondary has to re-learn things like Spanning Tree, OSPF routes, etc., which will lead to a temporary impact on forwarding. If this is an L2 configuration, you can currently mitigate this by disabling Spanning Tree and enabling loop-protect to minimize the impact of STP re-sync. With respect to the port-channel, are you using a static LAG or a dynamic LAG using LACP? If dynamic, are your timers set to short or long (default)?
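For reference on the timer question, the standard LACP values are a 1-second LACPDU interval with a 3-second timeout for short timers, and a 30-second interval with a 90-second timeout for long timers. A minimal Python sketch of what that means for failure detection follows; the names `LACP_TIMERS` and `lacp_detection_time` are illustrative only and do not correspond to any switch CLI.

```python
# Standard LACP timer values (IEEE 802.1AX / 802.3ad), in seconds.
LACP_TIMERS = {
    "short": {"pdu_interval": 1, "timeout": 3},
    "long": {"pdu_interval": 30, "timeout": 90},
}


def lacp_detection_time(mode: str) -> int:
    """Worst-case seconds before a partner expires a port that has stopped sending LACPDUs."""
    return LACP_TIMERS[mode]["timeout"]


for mode in ("short", "long"):
    timers = LACP_TIMERS[mode]
    print(
        f"{mode}: LACPDU every {timers['pdu_interval']}s, "
        f"partner times out after {timers['timeout']}s"
    )
```

These timeouts only come into play when a link stays physically up but LACPDUs stop arriving; a link that goes physically down is normally detected right away, independent of the LACP timers.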
In the secondary member failure test case, are you actually seeing the primary continue forwarding? The reason I ask is whether you have split-detection enabled or disabled. If it is enabled (the default), then in that situation, specifically with two-member stacks, the primary believes that it has failed for some reason and transitions to a dormant state. To avoid that, you need to disable split-detection.
In terms of the lack of protocol synchronization (aka Non-Stop Forwarding/Bridging), I highly recommend you submit a feature request through the idea portal.
*** Edited - Should have said loop-protect, not loopguard ***
11-04-2013 01:50 AM
Sure, just showing total selfish interest. ;)
Can you show the CAM table on one of these Aruba switches?
If so, can you post the switching table during normal operation, and then again during one of the stack member failures?
My guess is that the switching table on the Aruba may have stale information in it that has not been flushed out when its partner dies.
Also, have you verified that the Nexus is forwarding frames across the vPC peer link to get to the last surviving member of the stack?
Can you show the LACP neighbor on the Aruba too? Maybe it is getting confused and the failed stack member has to be aged out first? What do you reckon? Maybe compare that with the Cisco - which, I have to say, is going to be good, right?
It'd be great to see this, if you have time. Thanks a million. I appreciate that you are helping me, so many thanks in advance.
11-04-2013 01:52 AM
When you say protocol sync, do you mean something like LACP being unsupported? (I know that with Cisco 3750s you used to be able to do cross-stack EtherChannel only without protocol negotiation; that may have changed.)
Is it the same kind of thing? Can you confirm?
11-04-2013 01:58 AM
On the Cisco, does the LACP neighbor show up as the same partner ID, or does it form something like a secondary aggregator? As has already been mentioned, the LAG ID needs to match; that is, the Cisco needs to see the same LAG ID to form the port channel.
The LAG ID needs to match from both Aruba stack members - otherwise it will confuse the hell out of the Cisco!!!
Maybe your LACP is failing and you are getting a spanning-tree topology change? (As already said, I think, LOL.)
Below I have issued a vPC check for port channel 1001 from my 7k called "whatever" (hostnames changed to protect me) - sorry if I am telling you how to suck eggs :)
whatever# sh vpc consistency-parameters vpc 1001
Type 1 : vPC will be suspended in case of mismatch

Name                        Type  Local Value            Peer Value
-------------               ----  ---------------------- -----------------------
STP Port Type               1     Default                Default
STP Port Guard              1     Default                Default
STP MST Simulate PVST       1     Default                Default
lag-id                      1     [(7f9b,                [(7f9b,
                                  0-23-4-ee-be-65, 83e9, 0-23-4-ee-be-65, 83e9,
                                  0, 0), (7f9b,          0, 0), (7f9b,
                                  0-23-4-ee-be-6f, 8fa1, 0-23-4-ee-be-6f, 8fa1,
                                  0, 0)]                 0, 0)]
mode                        1     active                 active
Speed                       1     10 Gb/s                10 Gb/s
Duplex                      1     full                   full
Port Mode                   1     trunk                  trunk
Native Vlan                 1     257                    257
MTU                         1     1500                   1500
vPC card type               1     Clipper                Clipper
Allowed VLANs               -     1,101-103,107,602      1,101-103,107,602
Local suspended VLANs       -     -                      -
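The idea behind the lag-id comparison above can be sketched in a few lines of Python. Each entry is the (system priority, system MAC, key) that the far end learns over one member link. The port names and values below are hypothetical placeholders (the MACs come from the IANA documentation range), not output from a real stack. The only point is that every member link has to advertise the same system ID and key, otherwise the far end cannot bundle the links into one port channel.

```python
from typing import Dict, NamedTuple


class LacpPartner(NamedTuple):
    """Partner identity as learned via LACP on one member link."""
    system_priority: int
    system_mac: str
    key: int


def can_aggregate(partners: Dict[str, LacpPartner]) -> bool:
    """True if every member link reports the same partner system ID and key."""
    return len(set(partners.values())) <= 1


# Hypothetical example: both stack members advertise one shared system ID.
consistent = {
    "member-0 uplink": LacpPartner(32768, "00:00:5e:00:53:01", 10),
    "member-1 uplink": LacpPartner(32768, "00:00:5e:00:53:01", 10),
}

# Hypothetical example: the second member advertises its own MAC instead.
mismatched = {
    "member-0 uplink": LacpPartner(32768, "00:00:5e:00:53:01", 10),
    "member-1 uplink": LacpPartner(32768, "00:00:5e:00:53:02", 10),
}

print(can_aggregate(consistent))
print(can_aggregate(mismatched))
```

If the surviving stack member started advertising a different system ID after a failover, the far end would see the second case and would have to renegotiate the bundle.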
11-04-2013 04:37 AM - edited 11-04-2013 04:37 AM