Hi All,
I have been testing the following scenario.
The primary site has a master+standby pair. The backup site has two locals (with their own local VRRP). There is a routed WAN between the sites. All controllers are communicating with each other fine. They are running 6.3.1.2. We can't upgrade, as all later versions have bugs that would affect the setup (the same applies if we downgrade). Control plane is on, and both the RAP and CAP whitelists appear to be synced correctly on all controllers.
CAP failover tests work just fine. RAP tests seem to struggle.
RAPs come in via two public IPs, which the perimeter firewalls translate to the controllers' private addresses. Each entry path works in isolation. The RAP system profiles are set with appropriate LMS and backup-LMS addresses.
If you fail over a RAP from a local to the master, it works fine. If you fail over a RAP from the master to the local, the RAP never makes it into the AP table in an "up" state. It does, however, show in the local controller's datapath session table. This test was performed by shutting down the VRRP on the standby and then the master, in that order. The locals are pointing at that VRRP as the "master".
I suspect it has something to do with the RAP whitelist function, as CAPs do fail over OK. Interestingly, as soon as you enable the master VRRP again, even though the RAP is still targeting the local, it gets in OK (it shows as "up" in the local AP table). This makes me think the local may be trying to check the RAP whitelist on the master during the failover. I'm not aware of any configuration option that controls this. I was of the mind that it should just work, i.e. the local should look at its own table if it can't check against the master. Am I wrong about this?
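To make the question concrete, here is a minimal Python sketch of the two lookup behaviours I am comparing. It is purely a hypothetical model: the `Controller` class and both functions are made up for illustration and say nothing about how ArubaOS actually implements the whitelist check.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Controller:
    """Hypothetical stand-in for a controller; not an ArubaOS object."""
    name: str
    rap_whitelist: Set[str] = field(default_factory=set)  # RAP MACs synced to this box
    reachable: bool = True


def rap_allowed_with_fallback(mac: str, local: Controller,
                              master: Optional[Controller]) -> bool:
    """Behaviour I expected: use the master if reachable, else the local copy."""
    if master is not None and master.reachable:
        return mac in master.rap_whitelist
    return mac in local.rap_whitelist


def rap_allowed_master_only(mac: str, local: Controller,
                            master: Optional[Controller]) -> bool:
    """Behaviour I suspect: the check depends on the master being reachable."""
    if master is None or not master.reachable:
        return False
    return mac in master.rap_whitelist
```

With the master unreachable and the same MAC present in both whitelists, the first function admits the RAP and the second rejects it, which matches the symptom I see until the master VRRP comes back.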
If it is supposed to work as I understand it, can anybody suggest some relevant debug logging levels or troubleshooting commands that might help, please?
Thanks!