@ElisUKIT wrote:
The failover between the two controllers was done by unplugging the LMS from the network, which failed all APs over to the BackupLMS without any issue. This is not what I am questioning. GOOD
There is no direct link between the LMS and Backup LMS, as we are moving to Layer 3 across 2 sites GOOD
When the AP was connected to the BackupLMS, as were another 184 across the country, it then had its Ethernet cable pulled and put back in. This is when it hung. The 184 had no issues, as they did not need to discover the controller.
"it then had its Ethernet cable pulled and put back in. This is when it hung" - Are you saying that the AP had its Ethernet cable pulled and plugged back in? If that is the case, the AP completely power cycles and starts master discovery (aruba-master) all over again. It does not move to the second controller, because the lms-ip and backup lms-ip are not saved across reboots. If you want cold-boot discovery to work, you should put two IP addresses into the A record for aruba-master, so the AP can discover both. Here is how that would work:
The AP boots up cold and resolves aruba-master.domain.com. It either receives two IP addresses or, if your DNS is configured to do round-robin, it receives one IP address on the first resolution and a different IP address on the second (turning off round robin on your DNS server would offer the best performance). If the AP receives two IP addresses, it will attempt to reach the first one, and then attempt to reach the second if the first controller doesn't respond. If it reaches the backup controller, it will receive its lms-ip and backup lms-ip. It will then attempt to reach the first controller, and when that doesn't answer (because it is down), it will attempt to reach the second controller.
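To make the order of attempts concrete, here is a rough Python simulation of the sequence described above. It is only a sketch of the logic as I described it, not how the AP firmware is actually implemented; the function names, the `reachable` set, and the example addresses are all made up for illustration.

```python
from typing import Iterable, Optional, Set, Tuple


def first_reachable(candidates: Iterable[str], reachable: Set[str]) -> Optional[str]:
    """Return the first candidate address that responds, or None if none do."""
    for ip in candidates:
        if ip in reachable:
            return ip
    return None


def cold_boot(
    dns_answers: Iterable[str],
    lms_ip: str,
    backup_lms_ip: str,
    reachable: Set[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Simulate a cold boot; returns (controller found by discovery, controller the AP ends up on)."""
    # Step 1: master discovery. Try each address returned for aruba-master, in order.
    master = first_reachable(dns_answers, reachable)
    if master is None:
        # Nothing answered, so the AP never learns its lms-ip / backup lms-ip.
        return None, None
    # Step 2: the controller that answered hands out lms-ip and backup lms-ip.
    # Step 3: the AP tries the lms-ip first, then falls back to the backup lms-ip.
    final = first_reachable([lms_ip, backup_lms_ip], reachable)
    return master, final


# Hypothetical addresses (documentation ranges), with the primary controller down.
PRIMARY, BACKUP = "192.0.2.10", "198.51.100.10"

# Two addresses in the A record: discovery falls through to the backup controller.
print(cold_boot([PRIMARY, BACKUP], PRIMARY, BACKUP, reachable={BACKUP}))
# ('198.51.100.10', '198.51.100.10')

# Only the primary in the A record: discovery never succeeds.
print(cold_boot([PRIMARY], PRIMARY, BACKUP, reachable={BACKUP}))
# (None, None)
```

The second call is the situation from this thread: with only one address behind aruba-master, a cold-booted AP has nothing else to try while the primary is unreachable.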
Big picture, failing over access points to a second datacenter is typically a last resort. Controllers don't fail often, so you should just put two at the same site and point aruba-master to the VRRP address between them. If you have network problems at a site, frequently that same problem will prevent access points from reaching the controller at the backup site. If the access points do reach the controller at the backup site, all clients will have to receive different IP addresses, which will disconnect their applications. In addition, if connectivity to the remote site is not good, your clients will have poor performance on top of having to reconnect their applications. Having a second controller in a VRRP configuration to back up the first controller at the primary site provides the best failover behavior (no application restarts), and performance will continue to be what it was in the first place (sometimes your users won't even notice). It also gives you the opportunity to swap out a controller at the primary site during production if there is a hardware failure, etc., without disturbing your users. If you need to provide a backup controller at a second site, you can do that as well.