I'm not sure either... it was some kind of (not really) educated guess. I did configure this "ha group-profile" at the /mm node-hierarchy level of the primary controller:
ha group-profile LAB-HA
    preemption
    state-sync
    pre-shared-key <psk>
    controller <mc1-ip> role dual
    controller <mc2-ip> role dual
exit
ap system-profile default
    lms-ip <mc1-ip>
    bkup-lms-ip <mc2-ip>
exit
ha group-membership LAB-HA
Then I sync'd the config to the secondary controller ("database-synchronize"). This seems to be the only way to configure HA in a master/backup deployment.
I also tried to omit the "state-sync" statement, but in either case, some clients are running into the "mic failed" issue when the primary controller is offline. Others do not run into this and are working quite well.
I learned that the only way to make the controller failover work for all clients is to purge that "ha group-profile" and to point the APs to the VRRP VIP instead.
I did some more tests with 802.1x and Guest (captive portal) SSIDs. Failover times are even worse with those SSIDs. It takes at least 2 minutes until the clients can re-connect, though the APs are NOT bootstrapping. Of course, this is still better than the issue where they can't connect to the secondary controller at all.
I did my recent tests with v8.5.0.0 - there might be room for improvement in future software versions :)
-Andreas