Wireless Access


Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

Aruba controllers to Arista routers

  • 1.  Aruba controllers to Arista routers

    Posted Oct 04, 2021 08:25 AM
    We have moved our WiFi environment from Juniper routers to Arista routers. Our WiFi had been on the Juniper routers for 6 years. We run ArubaOS 8. We have 8 controllers: 4 in one building and 4 in another. The night we cut over to the Arista routers, the building 1 side worked great, but building 2 had issues that caused slowness and random drops with loss of connectivity across the entire network. If we shut off building 2, the network was stable; when we reintroduced building 2, the network would break. We had an Aruba engineer (Subramanian, Vasudeva) and Arista engineers working seamlessly together to resolve this issue. We appreciate all the hard work both parties put in.
    Below are the notes from Aruba and Arista on what the issue was: 

    Arista notes:
    Flow for forward direction : Client (AP) [GRE encap] --> Alden/CSC leafs (building aggregation pod) --> core --> Alden/CSC wireless (7280SRs) --> Aruba Alden/CSC controllers [GRE decap] --> Alden/CSC wireless --> core (internet)

    Problem Description : Shortly after replacing Junipers as connection switches to the Aruba WAP Controller Clusters, large ping loss and wireless performance issues were noted. This broken state started around 3:00 PM EST on 9/26. Performance issues and packet loss ended when the controllers were no longer acting as a cluster.

    Troubleshooting performed today : 

    --> We took a couple of packet captures to understand what kind of traffic was hitting the controller that was causing their CPU utilization to spike - A TX capture on Po101 of the CSC-wireless router (connected to the .201 controller), capture on SVI 2700/2710 of both CSC/Alden wireless switches. We filtered out ARP, IGMP, DHCP and MDNS - We did not find any significantly heavy broadcast/multicast flows.

    --> Aruba enabled an optimization on the controllers that would discard broadcasts/multicasts besides ARP, DHCP and MDNS. This indeed brought down the CPU utilization on the controllers. This optimization was enabled for client VLANs 2700/2710, as traffic on both these VLANs seemed to be causing the high CPU utilization on the controller.

    --> Aruba identified that some client flows that were to be seen only on the Alden controllers of the cluster were visible on the CSC controllers. 

    --> We suspected that the spine devices acting as the core in the fabric were doing ECMP. Suppose a client is connected behind a controller in Alden. Forward traffic for that user travels through a GRE tunnel that terminates at the controller. The Alden controller decapsulates the traffic and bridges it to the Alden-wireless router, which then routes it to the core. For reverse traffic, the core can choose to send packets to either Alden-wireless or CSC-wireless. If the packets hit CSC-wireless, CSC-wireless will not have the MAC learnt for that client (when the ARP/MAC ageing timers are in the default state), so it floods the return flow as unknown unicast. One copy reaches the Alden controller, which GRE-encapsulates it and sends it on its normal path, but each CSC controller (directly connected to the CSC-wireless router) also receives these packets.

    --> We tested this by modifying the ARP/MAC aging timers (MAC > ARP) for the two main client VLANs (2700 and 2710) on the CSC-wireless router. We immediately saw datapath utilization go down on the CSC controllers.

    Changes made (On VLAN 2700/2710 of both routers) via CVP CC - 
    ARP timer - 600 seconds
    MAC timer - 900 seconds
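
    Why keeping the MAC timer above the ARP timer helps can be sketched with a toy model. It assumes the router only sees the client's source MAC when an ARP refresh draws a reply (the client's own forward traffic leaves through the other site's router), so the MAC entry is relearnt once per ARP timeout and ages out after the MAC timer. The function name and the "default-like" values of 14400 s ARP and 300 s MAC are illustrative assumptions, not figures from the notes above.

```python
def flooded_fraction(arp_timeout: int, mac_aging: int, horizon: int) -> float:
    """Fraction of seconds in [0, horizon) during which return traffic to one
    client would be flooded as unknown unicast (toy model, see assumptions)."""
    last_seen = 0  # MAC learnt at t=0 from the first ARP reply
    flooded_seconds = 0
    for t in range(horizon):
        # Each ARP refresh draws a reply from the client, relearning its MAC.
        if t > 0 and t % arp_timeout == 0:
            last_seen = t
        # Once the MAC entry has aged out, frames to the client are flooded.
        if t - last_seen >= mac_aging:
            flooded_seconds += 1
    return flooded_seconds / horizon


if __name__ == "__main__":
    eight_hours = 8 * 3600
    # Assumed default-like timers: ARP far longer than MAC aging.
    print("ARP 14400 / MAC 300:", flooded_fraction(14400, 300, eight_hours))
    # Timers from the change above: MAC aging longer than ARP timeout.
    print("ARP 600 / MAC 900:  ", flooded_fraction(600, 900, eight_hours))
```

    In this model, any setting with the MAC timer longer than the ARP timer never lets the MAC entry expire between refreshes, which matches the MAC > ARP rule used in the test.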

    We would recommend reviewing these configurations once with our Account Team (SE Mark Katterheinrich CC'd to this email). The CSC/Alden wireless routers are currently not in MLAG - This is a possibility that can be considered, as having the devices in MLAG will enable MAC sync. Any MAC addresses learnt by an MLAG device would be synced with its peer, and the corresponding ARP entries would not go to 'not learned' state when the MAC is present on the wireless-router of the other site.
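
    The flooding behaviour and the effect of MAC sync can be illustrated with a small sketch. This is a toy model only: the class, port names, client MAC and the sync helper are hypothetical, and real MLAG behaviour involves much more than copying table entries.

```python
class WirelessRouter:
    """Toy L2 view of a wireless router: a MAC table and a set of ports."""

    def __init__(self, name, ports):
        self.name = name
        self.ports = list(ports)
        self.mac_table = {}  # client MAC -> egress port

    def learn(self, mac, port):
        self.mac_table[mac] = port

    def deliver(self, mac):
        """Return the ports a frame destined to `mac` is sent out of."""
        port = self.mac_table.get(mac)
        if port is not None:
            return [port]        # known unicast: a single copy
        return list(self.ports)  # unknown unicast: flood to every port


def sync_macs(src, dst, via_port):
    """Hypothetical MAC sync: dst learns src's MACs as reachable via via_port."""
    for mac in src.mac_table:
        dst.mac_table.setdefault(mac, via_port)


# Hypothetical topology: two controllers per site plus an inter-site link.
alden = WirelessRouter("alden-wireless", ["alden-ctrl-1", "alden-ctrl-2", "link-to-csc"])
csc = WirelessRouter("csc-wireless", ["csc-ctrl-1", "csc-ctrl-2", "link-to-alden"])

client = "aa:bb:cc:00:00:01"
alden.learn(client, "alden-ctrl-1")  # forward traffic enters via an Alden controller

# ECMP sends the return flow to csc-wireless, which has no entry for the client.
before_sync = csc.deliver(client)

# With the client MAC synced, csc-wireless sends one copy towards Alden.
sync_macs(alden, csc, via_port="link-to-alden")
after_sync = csc.deliver(client)

print("before sync:", before_sync)
print("after sync: ", after_sync)
```

    Before the sync, every CSC controller port receives a copy of the return flow; afterwards only the inter-site link does.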

    Aruba notes: 

    - The setup has 4 controllers in cluster1. 2 from Alden building and 2 from CSC building.

    - The wired environment was migrated to Arista recently.

    - When the CSC controllers were brought into the cluster, we saw high datapath utilization with zero clients connected.

    - Datapath utilization spiked on various CPUs from time to time, notably CPUs 22, 23 and 24.

    - The BWM table showed policing of excess ARP traffic for contract ID 9.

    - We enabled BCMC optimization on VLANs 2700 and 2710 and found that utilization returned to normal.

    - We confirmed that excess traffic was hitting the CSC controller from VLANs 2700 and 2710.

    - We also noticed that client traffic destined for the Alden UAC controller was seen on the CSC controller (which is the AP anchor).

    - We involved Arista TAC and found that, due to equal-cost routing, the traffic may land on these controllers.

    - Additionally, the uplink switches are designed to refresh ARP frequently to keep the MAC and ARP tables intact. This may result in frequent flooding of frames.

    - They reduced the ARP timer on the switch to 10 minutes and increased the MAC timer to 15 minutes.

    - This reduced the excess flooding, and utilization on the controllers became stable.

    - We implemented the changes on all VLANs, added CSC controller 2 to the cluster, and refreshed the cluster.

    - APs and clients are equally load balanced now.



    ------------------------------
    Bill Harris
    ------------------------------