Wired Intelligent Edge



6300M (JL762A) arp-suppression broken

  • 1.  6300M (JL762A) arp-suppression broken

    Posted 11 days ago

    Hi,

    Moving HPE Alletra storage network ports to a pair of new 6300M leaves failed unexpectedly. The storage systems run a pre-check based on arping, and that check got no answer for a target IP connected to the very same switch on another port, with both ports being simple access ports in the same VLAN.

    Environment:

    • AOS-CX 10.13.1000
    • EVPN/VXLAN Spine/Leaf with IPv4 OSPF/iBGP underlay
    • VLAN is member of the L2 Overlay (has a VNI)
    • VLAN is member of the L3 Overlay (has an SVI in the L3 Overlay VRF, uses redistribute host-route)
    • L3 Overlay is Symmetric IRB and using Distributed IP Gateways (SVI on each (relevant) leaf, has an AG MAC/IP)

    So far, that's all by the book. Diverging from the defaults, the arp-suppression feature (and nd-suppression, but ignore that, as no IPv6 is in use) is enabled in the evpn context.

    Observations:

    • As long as the target IP is not learned (not visible in show evpn mac-ip), ARP who-has broadcasts are flooded to the other port in the VLAN as expected, the host there answers, and the is-at reply is unicast back to the querier. It makes no difference whether the source IP in the ARP who-has (the "tell" address in tcpdump) is a valid IP for the VLAN/SVI in question, 0.0.0.0, or some arbitrary APIPA address.
    • As soon as the target IP is learned, though, the ARP who-has broadcasts are filtered: not only from leaving the switch towards other VTEPs (the expected and documented behavior), but also from flooding to the other local port in the same VLAN. That makes some sense if the switch intends to answer the request itself via proxy ARP, since it prevents duplicate answers. But it's still a bit cringe, and for it to work, the switch then actually has to answer the request.
    • As it happens, under certain conditions the JL762A does not answer requests it has just filtered:
      • ARP with source IP 0.0.0.0 is never answered
      • ARP with a legitimate source IP is only answered when that source IP is also learned (present in the control plane)
    • Cross-testing this with an 8360-32Y4C shows these issues are specific to the 6300M: the 8360 does the exact same filtering, but it also answers in every case where the 6300M fails to answer.
    • Even on the 6300M, this only applies to locally-switched cases. If the target IP is cross-fabric (on another VTEP), we get answers.
    • Disabling arp-suppression (as expected at this point) ends the whole malaise. Every broadcast floods (including locally) and every way of querying gets the answer directly from the target.
    • Disabling L3 in the VLAN (shutting the SVI) also leads to always-working ARP, which is due to MAC-IP mappings only being learned when L3 is active (or in other words, no ARP suppression taking place in a pure L2 overlay).
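
To make the observations above easier to compare, here is a rough sketch that encodes them as a small decision function. It only models what was seen from the outside in these tests, not how the firmware actually works. All names (ArpRequest, observed_outcome, the outcome strings) are made up for illustration, the addresses are documentation-range placeholders, and the cross-fabric result for the 8360 is an assumption, since only the locally-switched cases were described for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpRequest:
    sender_ip: str         # the "tell" address; "0.0.0.0" for a probe
    target_ip: str         # the address being asked for
    target_is_local: bool  # True if the target sits on the same switch


def observed_outcome(model, req, learned, suppression=True):
    """Return the outcome seen in the tests for a given switch model.

    `learned` is the set of IPs present in the EVPN MAC-IP table.
    This is a description of observations, not of the implementation.
    """
    # Without suppression, or while the target is still unknown,
    # the broadcast floods and the target host answers itself.
    if not suppression or req.target_ip not in learned:
        return "flooded, target answers"
    # From here on the broadcast is filtered, also towards local ports.
    if not req.target_is_local:
        return "suppressed, answered"  # cross-fabric target: answers arrive
    if model != "6300M":
        return "suppressed, answered"  # the 8360 answered in every local case
    # 6300M with a local target: the two failing conditions.
    if req.sender_ip == "0.0.0.0" or req.sender_ip not in learned:
        return "suppressed, no answer"
    return "suppressed, answered"


if __name__ == "__main__":
    learned = {"192.0.2.10", "192.0.2.20"}
    probe = ArpRequest("0.0.0.0", "192.0.2.20", target_is_local=True)
    for model in ("6300M", "8360"):
        print(model, "->", observed_outcome(model, probe, learned))
```

The only branch that yields "suppressed, no answer" is the 6300M with a learned local target and a sender IP that is either 0.0.0.0 or not itself learned, which matches the failing storage pre-check.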

    Edit: Forgot to say that replicating the issue doesn't need any fancy additional chassis (spines or other leaves), it happens on a single 6300M as long as the relevant configuration is present.
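
For anyone who wants to replicate this without a storage array, a minimal sketch of the kind of request involved follows. It builds a broadcast ARP request whose sender IP can be set freely (for example 0.0.0.0), following the standard ARP packet layout. This is not the tool used in the tests above (those used arping); the MAC address, IP addresses and interface name are placeholders, and send_frame is a hypothetical helper that assumes Linux with raw-socket privileges.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request (who-has)."""
    ethernet = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += sender_mac + socket.inet_aton(sender_ip)  # sender MAC and "tell" IP
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)  # unknown target MAC, target IP
    return ethernet + arp


def send_frame(ifname: str, frame: bytes) -> None:
    """Send a raw frame on a Linux interface (needs root / CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(frame)


if __name__ == "__main__":
    mac = bytes.fromhex("020000000001")  # placeholder, locally administered
    frame = build_arp_request(mac, "0.0.0.0", "192.0.2.20")
    print(len(frame), frame.hex())
    # send_frame("eth0", frame)  # uncomment on a test host; "eth0" is a placeholder
```

Watching both access ports with tcpdump while varying the sender IP (0.0.0.0, an unlearned address, a learned address) should show whether the request is flooded and who, if anyone, answers.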

    Anyone seen this, or something similar? Reports for other switch models (is the whole 6300M line affected, or just a part of it including the JL762A)? Is anybody actually using arp-suppression or just following defaults? Or is everybody disabling it quickly after tripping over the potential problems (there are more, but the above is a killer bug, not just something you would expect due to the nature of proxy ARP with potentially stale mappings)?

    Regards & TIA,
    Andre.



  • 2.  RE: 6300M (JL762A) arp-suppression broken

    EMPLOYEE
    Posted 11 days ago

    Reading this, it looks quite specific. You have already done a good amount of well-documented research, and there may not be many people running into exactly this issue and noticing it.

    My guess would be that opening a TAC case is the best way to address this.



    ------------------------------
    Herman Robers
    ------------------------
    If you have urgent issues, always contact your Aruba partner, distributor, or Aruba TAC Support. Check https://www.arubanetworks.com/support-services/contact-support/ for how to contact Aruba TAC. Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.

    In case your problem is solved, please invest the time to post a follow-up with the information on how you solved it. Others can benefit from that.
    ------------------------------