The problem has to do with the limitations of Aruba’s different HA modes, which depend on the controller role. The site has two controllers configured as Master and Backup Master. In a large Aruba deployment there would also be a number of Local controllers, each terminating a number of APs, but in smaller deployments this is not required, so the Master and Backup Master also serve as Locals.
Here’s the problem… In a Master/Backup Master topology, only the Master can function as an Active Local (terminating APs). The Backup Master can function only as a Standby Local in an AP Fast Failover HA Group, even though both members of the HA group are configured as “Dual”.
This could be overcome by switching to a Master/Local redundancy topology, but that would leave the Master role vulnerable to a single controller failure, so for small HA deployments (i.e. only two controllers) Master/Backup Master is recommended.
What is not clear in Aruba’s documentation is how this affects AP Fast Failover HA and, more to the point, how to configure it correctly so it works!
The AP System Profile has entries for LMS (Local Controller) and Backup LMS. Backup LMS refers to an older, legacy form of Local redundancy which has nothing to do with AP Fast Failover. AP Fast Failover does not require Backup LMS to be set, because the standby Local controller comes from the HA Group configuration and not from the Backup LMS setting. Backup LMS can still be set, but here we run into conflicts. Remember that we have a Master/Backup Master topology in which the Backup Master cannot terminate active AP sessions.
So what happens if Backup LMS is set and the AP loses connectivity?
The AP takes this as a failure of the (Primary) LMS and attempts to establish an active session to the Backup LMS. It can’t succeed, because the Backup LMS only accepts standby sessions. The AP never tries to go back to the (Primary) LMS and never reconnects.
The correct configuration in this topology is to LEAVE BLANK the Backup LMS setting.
Now, in a normal boot, the AP connects an active session to the (Primary) LMS and a standby session to the other LMS (as defined in the HA Group). A failure of the (Primary) LMS will initiate failover to the Standby. A loss of communication to BOTH controllers (e.g. the switch loses its uplink to the core) will not cause the AP to attempt an active connection to an invalid controller; it will simply reconnect to the configured LMS when connectivity is restored.
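To make the difference between the two configurations concrete, here is a rough sketch in Python of the behaviour described above. It is only a toy model of what was observed on this site, not Aruba's actual AP logic, and every name in it (Controller, ApConfig, reconnect_after_outage) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Controller:
    name: str
    # Hypothetical flag: False models the Backup Master, which in this
    # topology only accepts standby AP sessions.
    accepts_active: bool


@dataclass
class ApConfig:
    lms: Controller
    # None models leaving the Backup LMS setting blank.
    backup_lms: Optional[Controller] = None


def reconnect_after_outage(cfg: ApConfig) -> str:
    """Toy model: where the AP ends up after losing contact with BOTH
    controllers (e.g. switch uplink down) once connectivity returns."""
    if cfg.backup_lms is not None:
        # Assumption from the post: the AP reads the outage as an LMS
        # failure and tries an active session to the Backup LMS, and it
        # does not fall back to the (Primary) LMS afterwards.
        if cfg.backup_lms.accepts_active:
            return f"active on {cfg.backup_lms.name}"
        return "stuck: Backup LMS only accepts standby sessions"
    # Backup LMS blank: the AP simply reconnects to the configured LMS.
    return f"active on {cfg.lms.name}"


master = Controller("master", accepts_active=True)
backup_master = Controller("backup-master", accepts_active=False)

print(reconnect_after_outage(ApConfig(lms=master, backup_lms=backup_master)))
print(reconnect_after_outage(ApConfig(lms=master)))
```

The first call models the broken setup (Backup LMS pointing at the Backup Master) and the second models the recommended one (Backup LMS left blank). Failover on a genuine Master failure is not modelled here, since that is handled by the HA Group and not by the Backup LMS setting.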