Working with TAC, but looking for anyone else who may have experienced what I am seeing.
We currently have one 3400 (let's call it M3-RAP1) serving about 150 RAPs (a 50/50 mix of RAP 2s and RAP 3s) as a primary LMS, and our master controller (which also serves our wireless environment; call it N8-CON1). We ordered two new 3400s (call them M4-RAP1 and M4-RAP2) to set up as a cluster and do carrier redundancy through BGP. Eventually we will take the existing 3400 (M3-RAP1) and put it in DR.
I had to get the configuration from the existing cluster over to the new cluster (M4-RAP1 and M4-RAP2), so I added them as local controllers to N8-CON1. I then detached them and created the new cluster. I created a new profile with the VRRP address of the new cluster as the primary LMS and the IP of M3-RAP1, which is assigned directly on the physical interface, as the backup LMS. For licensing reasons I upgraded the new cluster to 18.104.22.168, whereas the old cluster is on 22.214.171.124. I didn't want to install two sets of 150 licenses on each controller when we have plans to migrate to 6.3 in the near future anyway.
With all basic connectivity in place and the configurations matched up (including the whitelists), I changed the profile of a RAP from the CLI. The change took, the RAP connected to the new controller and started upgrading, then rebooted and went down. I have also seen this behavior on the old cluster. It seems to bounce back and forth. The RAP shows up in the AP database on the new cluster, so I know it is connecting; it just won't come up on the new cluster.
If I hard reset the RAP and enter the LMS of the new cluster, it connects with no issue. I started looking at the configuration of the profile and its child objects, noticed a few small inconsistencies, and fixed them. To fully test this I created a brand new profile and changed only the LMS and backup LMS IPs. I also tried removing the backup LMS to make sure it connected only to the new cluster. An interesting behavior I noticed here: even though the backup LMS was not defined in the system profile, the RAP connected back to the original LMS. It continues to go through the upgrade, reboot, down cycle, but it still found its way back. I tried changing the LMS IP (without a backup) to the IP on M4-RAP1 to eliminate the VRRP as an issue; no dice. I also changed the LMS IP to the public IP of each respective controller. For example, for the TEST AP group the LMS is 126.96.36.199 on M4-RAP1 and 188.8.131.52 on M3-RAP1. Since the clusters are separate there is no sync, so I have been careful to replicate every change made on one cluster exactly on the other.
I can easily (from an engineer's perspective) fix this problem by resetting the RAP. The problem is that the RAPs are deployed to non-technical individuals at their homes and our track record is not great right now, so we need a zero-touch solution.
Thanks in advance!
Since you already have TAC involved, they probably have a better handle on this.
Is the VRRP outside or inside the firewall?
Thanks for the response. It's not for lack of trying, but TAC hasn't been able to get far with me on this one.
The VRRP is behind a Cisco router with an open ACL (for testing), so I don't suspect that to be an issue.
Is there a firewall in front of that VRRP?
What do the access points use to find the initial controller? DNS or is it hardcoded?
Have you tried terminating the APs on one or both of the actual public IP addresses of the controllers? If it works, you should instead use a DNS A record with two IP addresses for redundancy.
There are certain scenarios where terminating RAPs on VRRPs does not work.
You should NOT have an LMS-IP in the AP system profile. Remove it, because it is not needed. APs will find their way with whatever IP address you provision them with. If you need redundancy, you should provision APs with a DNS A record that has two IP address entries: one for the primary and another for the backup controller.
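To make the two-address A record idea concrete, here is a minimal Python sketch of what a client does with such a name: resolve it to every address it returns, then use the first one that answers. This is only an illustration of the concept, not how the RAP firmware is implemented. The function names, the `is_reachable` callback, and the documentation-range addresses are all assumptions made up for this example.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_a_records(hostname: str) -> List[str]:
    """Return the distinct IPv4 addresses for `hostname`, in resolver order."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def pick_controller(
    addresses: Iterable[str], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Return the first address that `is_reachable` accepts, or None."""
    for ip in addresses:
        if is_reachable(ip):
            return ip
    return None


# Hypothetical addresses standing in for the primary and backup controllers.
candidates = ["192.0.2.10", "192.0.2.20"]
# Pretend the primary is down: only the second address answers.
chosen = pick_controller(candidates, lambda ip: ip == "192.0.2.20")
```

In a real check, `candidates` would come from `resolve_a_records()` on whatever name you publish, and `is_reachable` would be a probe of your choosing.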
You can do this with an AP provisioning profile. Read this: http://www.arubanetworks.com/techdocs/ArubaOS_64_Web_Help/Web_Help_Index.htm#ArubaFrameStyles/AP_Config/AP_provprofile.htm and then ask questions about how it would be appropriate in your environment.
After reviewing the article, I don't see how the AP Provisioning Profile helps in my scenario. The info is valuable and appreciated, but I don't feel it applies to my deployment.
Taking my specifics out of the equation, let me ask this: what is the best way to move a RAP that is up on cluster A to cluster B without resetting it?
My understanding was that a new profile with the new LMS in the AP System Profile was the way to do this.
When you provision a RAP, you need to put in the public IP address of the controller so that when the AP is booted, it knows where to go. That IP address is saved in flash on the AP and survives a reboot. In fact, it is the IP address that the AP searches for upon cold boot.
The AP provisioning profile allows you to replace that ip address with the ip address in the provisioning profile, so that you do not have to do it. The AP will reboot, and then try to connect to the ip address in the provisioning profile. The provisioning profile is the quickest way to move RAPs from one controller to another, permanently.
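The difference between the two approaches can be sketched as a toy Python model. It encodes only what is described above (the address in flash is what the AP looks for on cold boot, and the provisioning profile replaces it) plus one labeled assumption: that an LMS set in the system profile moves the current session without rewriting flash. The class and method names are invented for illustration and are not ArubaOS settings or commands.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRap:
    """Toy model of a RAP. Names are illustrative, not ArubaOS settings."""

    flash_master: str  # controller address saved at provisioning time
    connected_to: Optional[str] = None

    def cold_boot(self) -> str:
        # On boot the AP looks for the address stored in flash.
        self.connected_to = self.flash_master
        return self.connected_to

    def apply_system_profile_lms(self, lms_ip: str) -> str:
        # ASSUMPTION: the system-profile LMS redirects the current session
        # only and leaves the address in flash untouched.
        self.connected_to = lms_ip
        return self.connected_to

    def apply_provisioning_profile(self, new_master: str) -> str:
        # Per the description above: the stored address is replaced,
        # then the AP reboots and connects to the new address.
        self.flash_master = new_master
        return self.cold_boot()


# Hypothetical documentation-range addresses for the old and new clusters.
OLD, NEW = "198.51.100.1", "203.0.113.1"

rap = ToyRap(flash_master=OLD)
rap.cold_boot()
rap.apply_system_profile_lms(NEW)
after_reboot = rap.cold_boot()                  # back on OLD in this model
after_move = rap.apply_provisioning_profile(NEW)
```

Under that assumption, the model reproduces the symptom reported earlier in the thread: after a reboot the RAP heads back to the original controller until the stored address itself is changed.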
What does not apply to your situation?
Thanks for your help through this issue. TAC was able to resolve the issue. I was trying to change the AP System Profile and change the AP Group of the RAPs. TAC advised just changing the Master Controller IP Address/DNS name under Master Discovery in the AP Installation > Provisioning tab.