AOS-CX breaking changes between 10.11 and 10.13

  • 1.  AOS-CX breaking changes between 10.11 and 10.13

    Posted Sep 27, 2024 08:28 AM

    I have two VSX pairs of 8325 switches running in two datacenters on OS version 10.11.0001. They run BGP EVPN, with several VLANs stretched between the DCs, some servers (including ESXi hosts), and a number of external connections to WAN routers. I recently tried to upgrade to 10.13.1040, and it failed in some interesting ways.

    After the upgrade, seemingly random parts of the network lose communication. All ARP entries, MAC addresses, and required routes appear to be present, both in the l2vpn evpn address family in the underlay and in the overlay IPv4 tables, yet some parts of the network cannot reach each other. In one case I could not even ping the switch SVI from a VM, even though the MAC and ARP entries were present on the switch. Rebooting the switches back into 10.11 restores everything.

    Could you suggest some troubleshooting steps, and ideas for what to change in the config for 10.13? I only have a couple of hours late at night every week or two to try things out.

    Below is a simplified diagram showing the switch connections and most of the types of external connections.

    [Image: simplified diagram]


    ------------------------------
    -- tommyd
    ------------------------------


  • 2.  RE: AOS-CX breaking changes between 10.11 and 10.13

    Posted Sep 27, 2024 10:13 AM

    How was the upgrade performed?
    Did you use the "vsx upgrade-software" command? Did the upgrades of the VSX clusters (or single switches) in LAB1 and LAB2 overlap in time?

     

    The reason I ask:

    • The 10.13.1050 release notes list the following under 'resolved issues':
      • Symptom: Traffic loss is observed in VXLAN tunnels.
      • Scenario: This issue is observed when the VSX partner and its directly connected VXLAN VTEP peer are rebooted simultaneously.

    I don't have the information asked for above yet, but judging by your drawing, this might be what happened.
    If so, I'd recommend re-running the upgrade (possibly to 10.13.1050) using "vsx upgrade-software", one VSX cluster at a time.
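
    To check whether the earlier attempt matched that scenario, the reboot times of the four switches can be compared pairwise. The following is only a minimal sketch: the switch names, timestamps, VTEP adjacency list, and the 15-minute window are all made-up assumptions (the release note does not define "simultaneously"), so substitute values taken from your own logs and topology.

```python
from datetime import datetime, timedelta

# Hypothetical reboot times; replace with the values from each switch's logs.
REBOOTS = {
    "dc1-sw1": datetime(2024, 9, 20, 23, 0),
    "dc1-sw2": datetime(2024, 9, 20, 23, 40),
    "dc2-sw1": datetime(2024, 9, 20, 23, 5),
    "dc2-sw2": datetime(2024, 9, 21, 0, 30),
}

# Hypothetical pairs of directly connected VTEP peers across the two DCs.
VTEP_LINKS = [("dc1-sw1", "dc2-sw1"), ("dc1-sw2", "dc2-sw2")]


def simultaneous_reboots(reboots, links, window=timedelta(minutes=15)):
    """Return the linked pairs whose reboot times differ by at most `window`.

    The window is an arbitrary assumption, not a value from the release notes.
    """
    hits = []
    for a, b in links:
        if abs(reboots[a] - reboots[b]) <= window:
            hits.append((a, b))
    return hits


if __name__ == "__main__":
    for a, b in simultaneous_reboots(REBOOTS, VTEP_LINKS):
        print(f"{a} and {b} rebooted within the same window")
```

    Any pair it reports would be a reason to redo the upgrade with the clusters clearly separated in time.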

    (It may be worth checking the VSX config beforehand: "vsx-config mate", "show vsx brief", "show vsx status" ...)

     

     

    Just in case, be prepared to capture the support files from all four 8325 switches while the error condition is present.

    That way you will have all the data TAC might ask for when you open a ticket ...
    And given the complexity (yes, EVPN-VXLAN with VSX is a complex setup, even with only four switches), I'd suggest contacting TAC if the problem persists.
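
    Alongside the support files, it may also help to save the text output of the same show commands once on 10.11 and once on 10.13 and compare the two. The sketch below assumes you have both captures as plain text. The sample lines are placeholders, not real switch output, and entries that carry timers or counters will show up as differences and need to be filtered by hand.

```python
def diff_captures(before: str, after: str) -> tuple[list[str], list[str]]:
    """Compare two text captures line by line, ignoring order and spacing.

    Returns (lines only in `before`, lines only in `after`), each sorted.
    """
    def normalise(text: str) -> set[str]:
        # Collapse runs of whitespace and drop empty lines.
        return {" ".join(line.split()) for line in text.splitlines() if line.strip()}

    old, new = normalise(before), normalise(after)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Placeholder captures; in practice read them from files saved per OS version.
    capture_10_11 = "entry-a vlan10\nentry-b   vlan20\n"
    capture_10_13 = "entry-a vlan10\nentry-c vlan30\n"
    missing, added = diff_captures(capture_10_11, capture_10_13)
    print("only on 10.11:", missing)
    print("only on 10.13:", added)
```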