+10.000 AP in cluster design AOS 8.6

last person joined: 22 hours ago

Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

Back to discussions

Expand all | Collapse all

+10.000 AP in cluster design AOS 8.6

This thread has been viewed 2 times

1. +10.000 AP in cluster design AOS 8.6

0 Kudos
AP-NICZ
Posted Mar 25, 2020 11:54 AM

Reply Reply Privately
Hi

We are facing a design issue when having almost 10.000 Ap's and still growing.

At det moment we have set up Cluster 1 and running 3500 aps, and for our understanding talking with several aruba engineers its not possible for airwave to handle more then 4048 Ap's. This leaves us with 1 Cluster = 1 airwave.

So we would end up with 3 clusters for controlling up to 12.000 ap's

Cluster = 4 x 7240xm (2 in DC1 and 2 in DC2). One DC or 2 controllers should be able to handle all 4048 ap's in case 2 controllers goes down.

--Managed Networks
-----MCG
--------Cluster 1
--------Cluster 2
--------Cluster 3

The Main problem, and where the design issues lays, is how do we administrate how the AP's finds its cluster the best way.

As we see it, we have 3 choices:

1. We control the master ip/vrrp of the cluster on the DHCP scoopes in option 43. (ber in mind we have 900 locations) we would have to do a static plan, where we divide the locations in 3. this would't be a very dynamic solution.
pros.
- No ekstra configurations on the mobility master, and we are still able to inherit configurations from a top level and use i across all clusters.
- No cluster can't be overrunned with ap's per design, because of the division.
cons.
- The administrative headache in maintaining a list where the ap is connecting and the fact that we are letting the dhcp control where our ap should go.
- static configuration on each scopes and the need of a maintained plan for documentation to make sure the cluster is not taking in to many ap's.

2. We use option 43 from DHCP and configure cluster 1's vrrp, like we are doing today. When the AP is hitting cluster 1 we can control where it should go through LMS from the AP-group. This means we need to create and configure a specific ap group with an LMS ip, one for each of the 3 clusters under the ap system profile in the MCG folder.
pros.
- we control where the ap connect from the mobility master and in the controller environment.
cons.
- We have 25 ap groups and more to come, so this would mean we need 3 x 25 = 75 ap groups for all 3 clusters to handel all the solutions we have. And every time we make a change we should maintain 3 ap groups.
- Potentially cluster 1 could be overrunned if cluster 2 and 3 is unavailable and goes down, because of the the default behavior in discovering the master through DHCP. But then you probably have bigger problems on hand.

3. We use option 43 from DHCP and configure cluster 1's vrrp, like we are doing today. we then configure each ap with a static master.
pros.
- we control where the ap connect from the mobility master and in the controller environment.
- No ekstra configurations on the mobility master, and we are still able to inherit configurations from a top level and use the same ap groups across all clusters.
cons.
- Potentially cluster 1 could be overrunned if cluster 2 and 3 is unavailable and goes down.

No offens, but we are really hoping that someone in this community is in the same situation as we are. We are really struggling to find the right design, that would scale to more than 10.000 aps taking into account the limitation that airwave have.
2. RE: +10.000 AP in cluster design AOS 8.6

1 Kudos
EMPLOYEE

cjoseph
Posted Mar 25, 2020 12:02 PM

Reply Reply Privately
No matter what discovery method you use there will have to be a static plan. There is no mechanism to automatically move APs from an overloaded cluster to a lightly loaded cluster.
3. RE: +10.000 AP in cluster design AOS 8.6

0 Kudos
AP-NICZ
Posted Mar 26, 2020 03:41 AM

Reply Reply Privately
Agree, we don't expect loadbalance between clusters. We know that an AP will only live within the cluster it have been assigned. Our main issue here is how do we control this static assignment of where the ap should connect. It's frustrating that we are not able to have only one cluster with all ap's in it.

I'm curious how other companies handle this issue with the same scale.
4. RE: +10.000 AP in cluster design AOS 8.6

1 Kudos
EMPLOYEE

cjoseph
Posted Mar 26, 2020 04:37 AM

Reply Reply Privately
I will leave it to organizations that to reveal how they are designed for access points at that size. It is not my place to say what they have running.

I will say this based on the limited information you have given:

- With regards to discovery, access points only do dhcp, or dns-based recovery once in a clustered environment, so it is not necessarily something you have to constantly maintain. Once access points find their cluster, the list of ip addresses of all controllers in that cluster (the nodelist) is saved onto the AP's flash. If the AP reboots because of loss of power or reconfiguration, it does not do discovery again; it simply attempts to connect to all of the ip addresses in the nodelist. Discovery exists only for access points to initially find their controller when they are new: https://community.arubanetworks.com/t5/Wireless-Access/AP-termination-in-version-8-Clustering/m-p/428652#M81089
5. RE: +10.000 AP in cluster design AOS 8.6

0 Kudos
mineesen
Posted Mar 26, 2020 06:31 AM

Reply Reply Privately
Yes, you are fully right. We have encountered the same issues.

We have made the design in the following way (based on your second suggestion) and running 8.5.0.x, however this is the same on 8.6:

Two (HW) Mobility Masters in active-standby mode (please take care with VRRP settings, the HW is a little bit slow and VRRP timers starting to run before the physical ports are available).

Below the MM there are 3 clusters:
Cluster 1 with 4 controllers, 2 in each datacenter seperated by cluster groups.
Cluster 2 and 3 with 2 controllers each, one in each datacenter.
Each Cluster is sharing one VRRP IP pointing to the Cluster leader, but which one is serving as VRRP anchor is not relevant. Termination IP is the VRRP IP of Cluster 1.

We configure AP groups and LMS IPs on the Managed Devices level to ensure that all Clusters have the same information.

After first boot, the AP is connecting to Cluster 1 where it will receive the information to connect to cluster 1, 2 or 3.

Cluster 1 is assigned to one Airwave instance, cluster 2 and 3 to another. Keep in mind to include the MM in all Airwave and to ignore the controllers that should not be visible if you need UCC information.
By this method you can limit the number of access points to the suggested limits.

At the moment we calculate with 80% of the datasheet limits on all platforms (recommendation of Aruba).
Meaning: 3300 APs per Airwave and per 4 Controllers in full redundancy -> 80% of 2048 for the 7280s ~ 1650 APs.

Our platform limit is currently the MM where we have 80% capacity planned -> 8k APs. This will be the bottleneck and everyone is hoping that AirWave 10 on-premise will be there before we hit this limit.

Hope this helps.
6. RE: +10.000 AP in cluster design AOS 8.6

0 Kudos
AP-NICZ
Posted Apr 02, 2020 10:53 AM

Reply Reply Privately
Thank you for you reply, nice to hear from someone who actually have the same size environment as we do.

What exactly do you mean when you say :"Keep in mind to include the MM in all Airwave"

Before we go ahead and choose option 2, which is what we are leaning to right now, we still want to explore the possibility for option 3 which is still a better choice regarding the many ap groups you would otherwise have.

What is the drawback of option 3, that would make us want to go with option 2?

We have just tried to move an ap from cluster 1 to cluster 2, by statically assign the vip adresse for cluster 2 under provision. The AP rebooted and for a short while it connected to the controller on cluster 2, and then went back to cluster 1. When checked the ap configuration under provision, it still have the vip address statically assigned to cluster 2. We have tried to reboot the ap several times but it keeps connecting to cluster 1.

As i understand it, the priority for the aps discovery is
1. static discovery
2. dhcp
3. dns
4. adp

Referring to this:
https://www.arubanetworks.com/techdocs/ArubaOS_82_Web_Help/Content/ArubaFrameStyles/AP_Config/AP_Discovery_Logic.htm

Another question, and what we would like to test, isz what happens when cluster 2 i unreacheable. What is the behavior of the ap, when it hits cluster 1 because vip adresse of cluster 2 is down, does it reboot again to try an reach its static assign vip, and does this create a loop.
7. RE: +10.000 AP in cluster design AOS 8.6

0 Kudos
EMPLOYEE

cjoseph
Posted Apr 02, 2020 01:16 PM

Reply Reply Privately
I would just say that if you haven't, ask your Aruba Sales Engineer to possibly put you in touch with other customers who manage thousands of APS. Also consider putting a Value Added Reseller who has experience deploying this so that a single entity can be responsible for your results. I am not sure that advice from people who manage more than 10,000 is required; you just need to talk to an admin who can answer questions about redundancy for thousands of access points. On this forum, you will get ideas from many people, but the job of the Value Added Reseller would be to (1) Come up with an initial design (2) Answer all of your technical questions (3) deconflict any design ideas with your specific deployment to ensure that all ideas will work with your current infrastructure and be manageable at the end of the day.

With that being said, your posted questions have very straightforward answers:

"What exactly do you mean when you say :"Keep in mind to include the MM in all Airwave""

- If you were forced to have more than one MM, you could possibly add the ip address of the MM of each deployment to a single Airwave instance to get all of the centralized UCC data for your whole deployment without having to manage all of the MDs and access points. There are positives and negatives to this, so you should consult your Var to determine if your deployment would be a good candidate for this

- "We have just tried to move an ap from cluster 1 to cluster 2, by statically assign the vip adresse for cluster 2 under provision. The AP rebooted and for a short while it connected to the controller on cluster 2, and then went back to cluster 1. When checked the ap configuration under provision, it still have the vip address statically assigned to cluster 2. We have tried to reboot the ap several times but it keeps connecting to cluster 1. " - I would use the apmove command to move access points from one cluster to another: https://www.arubanetworks.com/techdocs/ArubaOS_86_Web_Help/Content/arubaos-solutions/cluster/clus-over.htm?Highlight=apmove

- "As i understand it, the priority for the aps discovery is

1. static discovery

2. dhcp

3. dns

4. adp

Referring to this:

https://www.arubanetworks.com/techdocs/ArubaOS_82_Web_Help/Content/ArubaFrameStyles/AP_Config/AP_Discovery_Logic.htm -

- Please look at the page here: https://www.arubanetworks.com/techdocs/ArubaOS_86_Web_Help/Content/arubaos-solutions/access-points/enab-ctrl-disc.htm. There are many ways to configure discovery, and only DNS is really recommended in large-scale networks. Why? Because you can return multiple ip addresses back to APS for initial discovery without running a VRRP between your controllers (you want to resist running broadcast protocols in a larger network, anyways). The AP will look for aruba-master.<domain received from dhcp>. You can populate the aruba-master a-record with multiple ip addresses and if your DNS server is round-robin, it will provide a different ip address for each AP that resolves aruba-master. If your DNS server is NOT round-robin, it will supply all of the ip addresses to the AP and the AP will try them one by one. You conceivably could put all of the ip addresses of the controllers in the cluster in the aruba-master a-record and your access points will look for all of them until it finds one.

With regards to redundancy and clustering, aruba controllers do not fail often. When they do, you want to at minimum provide an n+1 - type cluster (with all controllers at the same physical location for high speed redundancy), so that (1) your clients would not see any interruption and (2) you would be able to service the failed controller when you have chance without interruption. If you wanted to have redundancy for this setup, you could always provide a "backup lms-ip" pointing to the VRRP of a secondary cluster. The APs in the primary cluster would have to exhaust connection attempts to all of the cluster members before resorting to connect to the secondary cluster (no reboot). If your secondary cluster is at a location where bandwidth between all of the APs and that backup clusters is poor, your user experience will also be poor. That is why convention wisdom would say to add extra controller capacity to the primary cluster to provide redundancy locally in the unlikely event that more than one controller fails at a time.

Again, there are many way to design such a deployment to meet your redundancy needs, but someone (maybe a VAR) needs to be in charge of such a project so that it also meets all of your operational needs so that you will not design yourself into a corner..
8. RE: +10.000 AP in cluster design AOS 8.6

1 Kudos
mineesen
Posted Apr 03, 2020 03:00 AM

Reply Reply Privately
Please be vary careful with the design recommendations of Aruba and double check feasibility. Sorry to say that, but the design that was proposed by Aruba had some issue when it comes to customer critical infrastructure in production. We had to adapt our design because some of the functionalities did not work as it was suggested. So please do the discussion with VAD and Aruba but please test it as well.

DNS vs. DHCP
We use DNS, because the number of subnets where just too large and we had in parallel Cisco and Aruba wireless. This would have meant additional effort to exclude the Cisco option 43 from the Arubas.

We have for each AP group the LMS IP set to the "right" cluster. This works good as the AP will use the LMS as final choice for cluster assignment. If you have a good IP design it might be possible to assign a cluster via DHCP in the first step. This has the advantage to minimize the boots of the AP.

AP move is not a really good idea from my point of view. When you are in a rollout phase, you probably dont want to deal with some CLI options to move AP from one point to another. Therefore, the LMS option will be your choice.

Coming back to the Airwave and MM:
We have one MM (pair) that is dealing with three clusters. One Airwave is responsible for Cluster 1 and the other is responsible for Cluster 2 and Cluster 3.
If you want to have the UCC information for Cluster 2 and 3 you need to include the MM in the Airwave. If you leave the default settings, the MM will include ALL APs and MDs in both Airwaves. This will not work as you have too many APs for the Airwave instance.
Therefore, you need to include MM but prevent MM to automatically add all MDs. Just ignore those you dont want in this instance and approve the others. Then you have UCC for the APs and MDs that shall be managed by this Airwave instance.
By the way: We discussed the design with Aruba and no-one was aware of this issue or pointed that out. We had to find and adapt it in production...

Redundancy
Sorry to be very clear on this: The previous answer is also far from reality. The n+1 option is not option when you want to have datacenter georedundancy (most customers in this dimension want to have this).
n+1 is just an option if you dont want to have georedundancy as this will fail when more than one controller (e.g. two are in the same datacenter and datacenter fails) is going down.
If you are fine with a reboot, you can use n+1 in addition with another cluster in second datacenter. But in this case you need one controller more than in the design using groups.
9. RE: +10.000 AP in cluster design AOS 8.6

0 Kudos
EMPLOYEE

cjoseph
Posted Apr 03, 2020 04:27 AM

Reply Reply Privately
"Please be vary careful with the design recommendations of Aruba. Sorry to say that, but the design that was proposed by Aruba had some issue when it comes to customer critical infrastructure in production. We had to adapt our design because some of the functionalities did not work as it was suggested. So please do the discussion with VAD and Aruba but please test it as well." ----

This should be restated as - "Please be vary careful with the design recommendations of Anyone." Just because something works, doesn't mean it works for the OP or anyone else.
10. RE: +10.000 AP in cluster design AOS 8.6

0 Kudos
mineesen
Posted Apr 03, 2020 04:34 AM

Reply Reply Privately
Thats correct as well.

Wireless Access

+10.000 AP in cluster design AOS 8.6

1. +10.000 AP in cluster design AOS 8.6

2. RE: +10.000 AP in cluster design AOS 8.6

3. RE: +10.000 AP in cluster design AOS 8.6

4. RE: +10.000 AP in cluster design AOS 8.6

5. RE: +10.000 AP in cluster design AOS 8.6

6. RE: +10.000 AP in cluster design AOS 8.6

7. RE: +10.000 AP in cluster design AOS 8.6

8. RE: +10.000 AP in cluster design AOS 8.6

9. RE: +10.000 AP in cluster design AOS 8.6

10. RE: +10.000 AP in cluster design AOS 8.6