Makes sense, we have now disabled preemption. We will probably follow the step you outlined (reload the B MCR) so that A takes back over simply so that we don't mistakenly add licenses to the B MCR. Thanks for your help.
Original Message:
Sent: Jan 28, 2025 11:19 AM
From: chulcher
Subject: Changes overwritten when bringing primary MCR back online
My recommendation: ignore which MCR is currently active during normal production, and don't mess with preemption or tracking, as the appliances are functionally the same under normal operation. When you upgrade AOS, install the software and reboot in an order that leaves the appliance you want as the master copy in control; otherwise don't worry about the status. Always use the VRRP VIP as your target for HTTPS/SSH so that you are always on the active appliance.
The only time you should really care about which appliance has control is when adding licenses, or if the license holder goes offline for more than 30 days. At that point you'll need to worry about redeploying licenses so that they are associated with the MCR still running.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Jan 28, 2025 11:08 AM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
We'll turn it off for now. Can I trouble you with one more question? I notice there are some options for VRRP tracking, like 'Tracking conductor up-time'. This sounds interesting: when A comes back up, would it introduce a delay before A was allowed to take over (if preemption was enabled), a bit like preemption delay but longer? Would the db be copied from B->A during the time when both are up and B is still primary? Would that be one way to keep preemption enabled and guard against the sync issue?
That would be interesting if it worked as I describe above, but it doesn't sound ideal the other way around, i.e. when A first goes down we want the switch to B to be pretty much immediate. I suppose we can't have our cake and eat it on this?!
Original Message:
Sent: Jan 28, 2025 09:23 AM
From: chulcher
Subject: Changes overwritten when bringing primary MCR back online
Preemption normally shouldn't be enabled. Leaving it off prevents repeated failovers in situations where the device, for whatever reason, keeps going offline.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Jan 28, 2025 05:34 AM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
Ah, may have found it:
"
(UWS-MM-8C) *[mynode] #show database synchronize
Last L2 synchronization time: Tue Jan 28 10:22:35 2025
Last L3 synchronization time: Secondary not synchronized since last reboot
To Conductor Switch at x.x.x.x: succeeded
WMS Database backup file size: 8292733 bytes
Upgrademgr Database backup file size: 3387 bytes
Cluster upgrademgr Database backup file size: 3877 bytes
Local User Database backup file size: 92103 bytes
Global AP Database backup file size: 288426 bytes
IAP Database backup file size: 3760 bytes
License Database backup file size: 6850 bytes
CPSec Database backup file size: 3224 bytes
Bocmgr Database backup file size: 6032 bytes
L2 Synchronization took 9 second
L3 Synchronization took less than one second
4540 L2 synchronization attempted
292 L2 synchronization have failed
0 L3 synchronization attempted
0 L3 synchronization have failed
L2 Periodic synchronization is enabled and runs every 20 minutes
L3 Periodic synchronization is disabled
Synchronization doesn't include Captive Portal Custom data
Airmatch database gets synchronized periodically. Last synchronization time : 2025-01-28 10:09:54
"
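The counters in that output can be turned into a quick health check. The following is a minimal sketch, not an Aruba tool: it assumes the output has been saved as text, and the function name `summarize_sync` and the regular expressions are illustrative only, matched against the wording shown in the paste above.

```python
import re

# A few lines copied from the "show database synchronize" output above.
SAMPLE = """\
Last L2 synchronization time: Tue Jan 28 10:22:35 2025
4540 L2 synchronization attempted
292 L2 synchronization have failed
0 L3 synchronization attempted
0 L3 synchronization have failed
L2 Periodic synchronization is enabled and runs every 20 minutes
L3 Periodic synchronization is disabled
"""


def summarize_sync(text):
    """Extract attempt/failure counters and the periodic interval per layer."""
    summary = {}
    for count, layer in re.findall(
        r"^(\d+) (L[23]) synchronization attempted", text, re.M
    ):
        summary.setdefault(layer, {})["attempted"] = int(count)
    for count, layer in re.findall(
        r"^(\d+) (L[23]) synchronization have failed", text, re.M
    ):
        summary.setdefault(layer, {})["failed"] = int(count)
    for layer, minutes in re.findall(
        r"^(L[23]) Periodic synchronization is enabled and runs every (\d+) minutes",
        text,
        re.M,
    ):
        summary.setdefault(layer, {})["interval_minutes"] = int(minutes)
    return summary


summary = summarize_sync(SAMPLE)
l2 = summary["L2"]
failure_pct = 100 * l2["failed"] / l2["attempted"]
print(
    f"L2: {l2['failed']}/{l2['attempted']} failed "
    f"({failure_pct:.1f}%), periodic sync every {l2['interval_minutes']} min"
)
```

On the figures pasted above, that is roughly 6.4% of L2 synchronization attempts failing, with a periodic L2 sync every 20 minutes.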
Original Message:
Sent: Jan 28, 2025 05:15 AM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
Hello Carson,
Thank you, that's a useful explanation. The bit I don't quite understand is that you say once we bring A back up we would need to reload the site B MCR to make the site A MCR take over the VRRP address again, but in my experience if A comes back online and has a higher VRRP priority then it will take over automatically. I guess this is because we have preemption enabled and that is a non-default configuration? Perhaps we need to think about turning that off so we have more control over this kind of situation, or setting a long preemption delay so that there is time for a sync to take place before it does take over. Do you know how often the config syncs between the MCRs?
Guy
Original Message:
Sent: Jan 27, 2025 04:08 PM
From: chulcher
Subject: Changes overwritten when bringing primary MCR back online
The primary MCR in an L2 VRRP synchronized pair is whichever MCR currently has the MASTER role in the VRRP instance. There isn't a static configuration of active/backup or primary/secondary; the role is entirely dependent on the VRRP state. This is why the VRRP state is very important when bringing up the second MCR the very first time, so that you don't accidentally overwrite configuration in an unexpected direction. If you were familiar with AOS 6, this was the same behavior when creating a master/standby-master pair.
If you want to take an MCR out of service for a period of time, then shut the MCR down, all the way, powered down. At that point the other MCR (site B) in the VRRP pair will take over the VRRP VIP and will now act as the single source for synchronization of the MCR pair. Then, when ready to bring the other MCR (site A) back online, the VRRP state for the site A MCR will be BACKUP (assuming the VLAN is correctly operating at L2) after fully booting, and synchronization will happen from the MASTER (site B) to the BACKUP. If you then want the site A MCR to be MASTER in the VRRP relationship, reboot the site B MCR to initiate failover.
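The sequence described above can be walked through with a toy model. This is a simplified sketch of VRRP role selection, not the AOS implementation: the `Mcr` class, the `elect` and `sync_direction` helpers, and the priority values 110/100 are all hypothetical. It only tracks which node holds MASTER and assumes, as described above, that synchronization runs from MASTER to BACKUP. It does not model whether a sync completes before a preempting node takes over.

```python
from dataclasses import dataclass


@dataclass
class Mcr:
    name: str
    priority: int  # hypothetical VRRP priority; higher wins an election
    up: bool = True


def elect(nodes, current_master, preempt):
    """Return the node holding MASTER after an election round (simplified)."""
    alive = [n for n in nodes if n.up]
    if not alive:
        return None
    # Without preemption, a healthy MASTER keeps the role even if a
    # higher-priority peer has come back online.
    if current_master is not None and current_master.up and not preempt:
        return current_master
    return max(alive, key=lambda n: n.priority)


def sync_direction(nodes, master):
    """Describe the MASTER -> BACKUP sync direction for the nodes that are up."""
    return [f"{master.name} -> {n.name}" for n in nodes if n.up and n is not master]


site_a = Mcr("A", priority=110)
site_b = Mcr("B", priority=100)
pair = [site_a, site_b]

master = elect(pair, None, preempt=False)    # initial election: A
site_a.up = False                            # A is fully powered down
master = elect(pair, master, preempt=False)  # B takes over the VIP
after_failure = master.name

site_a.up = True                             # A finishes booting
master = elect(pair, master, preempt=False)  # B stays MASTER
after_return = master.name
directions = sync_direction(pair, master)    # sync runs B -> A

site_b.up = False                            # reboot B to initiate failover
master = elect(pair, master, preempt=False)  # A takes over
site_b.up = True                             # B returns as BACKUP
master = elect(pair, master, preempt=False)  # A stays MASTER
after_failback = master.name

print(after_failure, after_return, directions, after_failback)
```

Calling `elect(pair, site_b, preempt=True)` with both nodes up returns A straight away, which is the automatic takeover seen when preemption is enabled.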
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Jan 27, 2025 03:26 PM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
Thank you, so even if the primary is offline for a few days, and we make changes on the standby while it is in the role of primary, those changes should not be lost when we bring the primary back up and they sync up (I just want to make sure I've got it right)?
Original Message:
Sent: Jan 27, 2025 01:30 PM
From: chulcher
Subject: Changes overwritten when bringing primary MCR back online
Just shut down the MCR at site A while communication is interrupted, then turn it back on after communication is restored.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Jan 27, 2025 01:04 PM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
Ok thanks Carson, I didn't realise this was an unsupported set-up. So to make this work for us we could change the VRRP priorities so that the standby remains primary until the config is shared when we bring the ex-primary back up, and then revert to our original settings. Does that sound like a plausible workaround?
Original Message:
Sent: Jan 24, 2025 11:04 AM
From: chulcher
Subject: Changes overwritten when bringing primary MCR back online
Gotta be honest, at this point your description sounds like you have an unsupported setup.
MCRs in an L2 pair should be at the same site, on the same network, with the VRRP pair providing redundancy for single device failure. If further redundancy is required with off-site DR/HA, L3 redundancy needs to be set up with a single or paired MCR at the remote site.
What is the connectivity between the APs and the MCs at site A and site B? The expectation is that APs will be on the same LAN as the controllers; CAP over WAN isn't a supported setup. If you have fiber connectivity between everything, then you're good.
The reason you lost all changes is that the site A MCR was allowed to continue running while it could not communicate with the site B MCR. L2 redundancy is based on the VRRP instance used for that setup between the two MCRs. When you brought the connection between the two MCRs back up, both were found to be active and the site A MCR won the election, so the site A MCR synchronized its configuration over to site B.
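The overwrite described above can be illustrated with a small sketch. It is a toy model rather than AOS behavior in detail: it assumes the election on reconnection is decided by VRRP priority alone (ties are not modeled), and the `reconnect` helper, the priorities, and the config keys `ssid` and `guest_vlan` are all hypothetical.

```python
import copy


def reconnect(a, b):
    """On reconnection, the election winner's config replaces the loser's."""
    # Assumption: the higher priority wins when both nodes are found active.
    winner, loser = (a, b) if a["priority"] > b["priority"] else (b, a)
    loser["config"] = copy.deepcopy(winner["config"])
    return winner["name"]


site_a = {"name": "A", "priority": 110, "config": {"ssid": "corp"}}
site_b = {"name": "B", "priority": 100, "config": {"ssid": "corp"}}

# The link between the sites is down and A is still running in isolation.
# Changes are made on B only during this period.
site_b["config"]["guest_vlan"] = 300

# The link comes back, both nodes are active, and A wins the election.
winner = reconnect(site_a, site_b)
print(winner, site_b["config"])
```

In this model the change made on B disappears because A never stopped considering itself active. Powering A down for the duration avoids the contested election, which is the point of the shutdown advice in this thread.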
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Jan 24, 2025 10:43 AM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
Just to describe the situation: we have 4 routers that serve the wireless system, 2 on each site. We had to do some work on the routers on the A site, so we failed all of our APs over to the B site by shutting down the ports to the A site clusters, and did the same for the MCRs, which remained completely offline (but still powered up) for 4 days or so. The router work was completed this morning; we brought the A side clusters back up, failed the APs back over, and then brought the A site MCRs back online. That all worked fine in that the A site MCRs became active and took over from the B side, but as I say we lost the config changes that had been made while A was offline. What should we have done to keep the config from being overwritten when the A MCRs came online again?
Original Message:
Sent: Jan 24, 2025 10:23 AM
From: chulcher
Subject: Changes overwritten when bringing primary MCR back online
In that case, just shut down the other MCR in the pair so that you don't end up in a split VRRP situation. Then, when connectivity is re-established, you can turn the MCR back on and the configuration should sync correctly.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Jan 24, 2025 10:17 AM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
Thanks Carson,
It's actually L2 redundancy (we route that particular VLAN between DCs), but by the sounds of it, what you say above would still be true.
Original Message:
Sent: Jan 24, 2025 09:37 AM
From: chulcher
Subject: Changes overwritten when bringing primary MCR back online
If by active/standby you are referring to L3 redundancy, this is the expected behavior. The standby MCR is set up as a secondary MCR and is a copy of the primary MCR. Changing the behavior requires breaking the L3 relationship between the two MCR setups so that the secondary is now marked as primary, and also reconfiguring all of the managed mobility controllers to see that MCR as primary.
L3 redundancy is there to provide an off-site redundancy option should the primary MCR pair be unavailable and not expected to become available again within a reasonable time.
Solutions:
- Don't make configuration changes while the primary MCR is unavailable.
- Record all changes made during the outage period and make those changes on the primary as well.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Jan 24, 2025 05:51 AM
From: cauliflower
Subject: Changes overwritten when bringing primary MCR back online
Hello,
AOS 8.10.0.15
We run two clusters, each with active and standby MCRs. We recently had to take our active MCRs offline to do some router work. While they were offline (the standbys had taken over) we made some config changes on the (now active) standby MCRs. However, when we brought the active MCRs back online, the changes we had made while they were offline were overwritten. Is this expected? How do we stop this happening?
Many thanks,
Guy