Happy that it helped. Don't hesitate to report your failover test results (like failover time). Numbers might be useful for the community.
Thanks.
Original Message:
Sent: Sep 28, 2023 10:14 AM
From: JH37
Subject: EVPN/VXLAN Failover
Ah, I didn't get that from the documentation. But I've just tested it out and it works like a charm. Thanks Vincent!
------------------------------
Jelmer Hartman
Original Message:
Sent: Sep 27, 2023 08:43 AM
From: vincent.giles
Subject: EVPN/VXLAN Failover
With fall-over, the recursive next-hop lookup should not resolve on a non-exact prefix match (it requires the exact /32 of the next-hop), so a default route alone should not mask the VTEP's unreachability.
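For illustration, a minimal sketch of the relevant BGP stanza (AOS-CX style; the AS number and neighbor address are placeholders, so check the exact syntax against your release's command reference):

```
! Sketch only: fall-over tracks the route to the neighbor and drops the
! session when it disappears from the routing table. A default route is
! not an exact match for the neighbor's /32, so it will not keep the
! session alive.
router bgp 65001
    neighbor 10.1.1.2 remote-as 65001
    neighbor 10.1.1.2 fall-over
```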
Original Message:
Sent: Sep 26, 2023 10:29 AM
From: JH37
Subject: EVPN/VXLAN Failover
I do have a default route in my underlay. Will this still work then? I figured that the default route will prevent the system from detecting the unreachability of the VTEP. Is this correct?
------------------------------
Jelmer Hartman
Original Message:
Sent: Sep 26, 2023 02:24 AM
From: vincent.giles
Subject: EVPN/VXLAN Failover
I would use neighbor a.b.c.d fall-over (without the bfd option).
As mentioned in the VXLAN user guide, with this command, when your 6300 reboots, the underlay uplink to the spine should go down, the rebooting VTEP's loopback should be withdrawn once OSPF converges (1-2 s max), and the corresponding EVPN type-2 routes with the rebooting VTEP as next-hop should then disappear.
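For that sequence to work, each VTEP loopback has to be advertised as a /32 by the underlay IGP so it is actually withdrawn on reboot. A hedged sketch (addresses, OSPF process ID, and area are placeholders; verify against your AOS-CX version):

```
! Sketch only. On each VTEP: advertise the local loopback as a /32
! into OSPF so the underlay withdraws it when the switch goes down...
interface loopback 0
    ip address 10.1.1.1/32
    ip ospf 1 area 0.0.0.0
! ...and let fall-over tear down the overlay session to the peer VTEP
! as soon as its /32 vanishes from the underlay routing table.
router bgp 65001
    neighbor 10.1.1.2 fall-over
```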
Original Message:
Sent: Sep 25, 2023 10:49 AM
From: JH37
Subject: EVPN/VXLAN Failover
Hi,
We have an EVPN/VXLAN setup using spine/leaf. Most of the leafs until now have been using VSX. They perform really well, but for a new rack we wanted something cheaper and opted for the 6300 (24x SFP+). The connected servers are all VMware, and they do not use LACP LAGs; they just move the MAC over to the other NIC when the link goes down. The gateway is an active gateway.
I have opted to configure the two 6300 switches independently for the following reasons:
- We do not need multichassis lags
- Replacing a standalone switch is much easier than replacing a VSF member (especially in Central)
- Upgrading VSF reboots both members at the same time and causes downtime. (I have seen a roadmap item changing this in a future version, but upgrading both independently still seems easier.)
The setup with two independent switches works well under normal circumstances. If I disconnect the cables from a server on one switch, all MAC addresses move instantly to the other switch and everything stays reachable. So far so good.
However, when I reboot a switch, the failover does not work. I see the server's MAC addresses, and also the ARP entry, pointing to the second switch, but I still can't ping servers that used to reside on switch 1.
If I look at the BGP paths, I see that MAC and IP entries are present for both switches. So maybe the traffic is still being sent to the old switch that is down. I can think of a couple of solutions, but I'd like to hear some thoughts on this.
- Make them into a VSF stack and accept the hassle that comes with that.
- Use neighbor x.x.x.x fall-over bfd. Without BFD it would not work, because the switches have a default route in the underlay. (Maybe I could get rid of those, though.)
What would be the cleanest solution for this according to you guys? Or is there another approach that I could take here?
------------------------------
Jelmer Hartman
------------------------------