Ah, semantics. VRRP configured as per your first screenshot is just "VRRP VIP". VRRP configured within the cluster profile is "Cluster VRRP".
Original Message:
Sent: Feb 07, 2024 11:57 AM
From: cm119
Subject: AOS 8 Cluster upgrade with mesh APs
Yes, I do have separate VRRPs set up for the Cluster and for RADIUS. See screenshots.
Original Message:
Sent: Feb 07, 2024 11:33 AM
From: chulcher
Subject: AOS 8 Cluster upgrade with mesh APs
Cluster VRRPs should only be used for RADIUS. Controller discovery and LMS/B-LMS target should be a separate VRRP VIP configured on two or more controllers.
Not that I think that using the cluster VRRP is causing the issue you're seeing, just mentioning best practice.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Feb 07, 2024 11:08 AM
From: cm119
Subject: AOS 8 Cluster upgrade with mesh APs
LMS/B-LMS are pointing to the Cluster VRRPs. I'll open a TAC case for this. There are two other issues I'm seeing so far with 8.10.0.9: I can't delete offline APs (hit delete, nothing happens), and WLANs are missing from the correct folder in the hierarchy in the UI, even though subfolders show them as inherited and they show up fine in the CLI config.
Original Message:
Sent: Feb 07, 2024 12:50 AM
From: chulcher
Subject: AOS 8 Cluster upgrade with mesh APs
Also, are you pointing the LMS/B-LMS at a specific controller within each cluster or to a VRRP VIP that has been manually configured (NOT one of the cluster VIPs) within the cluster?
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Feb 06, 2024 11:24 PM
From: cm119
Subject: AOS 8 Cluster upgrade with mesh APs
Well, I just tried this on my small cluster, and the LMS preemption had no impact. As expected, the APs failed over to B-LMS while the primary cluster MDs reloaded, but they never failed back after the cluster MDs were back online. I waited much longer than the preemption timer, and preemption was enabled on the AP system profile for the APs on the MDs I upgraded. I'm guessing there is something about the Cluster setup that causes APs to ignore preemption while they have a connection to a cluster?
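To make the expectation concrete, here is a toy model of the failback behavior I expected to see. It is only a sketch of the expectation described above, not how AOS actually decides; the `ApState` fields, the `should_fail_back` helper, and the 600-second timer value are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ApState:
    """Hypothetical snapshot of one AP after the primary cluster reloads."""
    on_backup_lms: bool        # AP is currently terminated on the B-LMS
    primary_reachable: bool    # primary LMS/cluster is back online
    preemption_enabled: bool   # LMS preemption set in the AP system profile
    seconds_primary_up: int    # how long the primary has been reachable


def should_fail_back(ap: ApState, preemption_timer_seconds: int) -> bool:
    """Expected (not observed) behavior: an AP on the B-LMS returns to the
    primary LMS once preemption is enabled and the primary has been
    reachable for at least the preemption timer."""
    return (
        ap.on_backup_lms
        and ap.primary_reachable
        and ap.preemption_enabled
        and ap.seconds_primary_up >= preemption_timer_seconds
    )


if __name__ == "__main__":
    # Assumed values: a 600 s timer and a primary that has been up for an hour.
    ap = ApState(on_backup_lms=True, primary_reachable=True,
                 preemption_enabled=True, seconds_primary_up=3600)
    print("expected to fail back:", should_fail_back(ap, 600))
```

With every condition met the model says the AP should return to the primary, which is exactly what did not happen in my test.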
Original Message:
Sent: Feb 06, 2024 12:46 AM
From: chulcher
Subject: AOS 8 Cluster upgrade with mesh APs
There have been some improvements in the Live Upgrade code that make the process more forgiving when APs lose connection during the process, e.g., because of a Mesh Portal reboot, but I'm pretty sure those changes were in 8.8 or 8.9. My last few lab upgrades have been quite a bit quicker.
Moving to 8.10 from 8.7, I'd recommend running an Image Preload job for all APs and then rebooting all controllers at the same time to move to the new version.
If you are concerned about B-LMS discovery/behavior when the primary cluster/LMS is unavailable, then make sure to configure the LMS preempt option before moving forward.
I've not heard of any issue with GRE tunnels when going to 8.10.
Note: Grab flash-backups from all MCR and MC prior to starting any of the upgrade activities.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Feb 05, 2024 11:30 PM
From: cm119
Subject: AOS 8 Cluster upgrade with mesh APs
I'm curious what method everyone is using to upgrade clusters that have mesh APs. We'll be going from 8.7 to 8.10.
- Live Upgrade. We've always done Live Upgrade in the past, and it has worked, but per the documentation it is not recommended/supported with mesh. The issue we run into is that all the mesh APs fail/time out during their upgrade group, which halts the upgrade process for 10 minutes until it considers the AP timed out and moves on to the next group. After a very long time (several hours when there are many mesh APs), the upgrade eventually finishes and all the mesh APs come back online. We've never had a major issue other than the long delays.
- Upgrade MDs, Preload APs, Reboot all cluster MDs at the same time?
- Upgrade MDs, Preload APs, Reboot cluster MDs one at a time?
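As a rough back-of-the-envelope sketch of why the Live Upgrade delay adds up: this assumes each affected upgrade group stalls for the full 10 minutes and that groups run one after another. The function name and the group counts are hypothetical, not taken from our deployment.

```python
# Rough estimate only: models the stall described above, where each upgrade
# group containing timed-out mesh APs waits out a fixed timeout before
# Live Upgrade moves on to the next group.

TIMEOUT_MINUTES = 10  # per-group wait observed in our upgrades


def estimated_stall_minutes(groups_with_mesh_timeouts: int,
                            timeout_minutes: int = TIMEOUT_MINUTES) -> int:
    """Return the added delay from groups that wait out the AP timeout."""
    if groups_with_mesh_timeouts < 0:
        raise ValueError("group count cannot be negative")
    return groups_with_mesh_timeouts * timeout_minutes


if __name__ == "__main__":
    for groups in (6, 18, 30):  # hypothetical numbers of stalled groups
        minutes = estimated_stall_minutes(groups)
        print(f"{groups} stalled groups: about {minutes / 60:.1f} h of waiting")
```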
I'm curious about options 2 and 3. Which one is recommended if we do NOT go with Live Upgrade? Which method is safest with mesh APs to ensure they don't get stranded?
Additional (important) notes: We have multiple clusters at our main campus, and in some cases we have APs/AP Groups set up with Backup LMS pointing to alternate clusters as a failover in case an entire cluster is offline. With these options, especially Option 2 (rebooting all MDs at the same time), am I going to run into an issue with APs trying to join the Backup LMS while all its cluster MDs are offline rebooting? I suppose in the worst case I would just have to reboot them again to kick them back to the correct cluster, but again I'm worried about the possibility of stranding mesh APs. I do have our mesh clusters configured at a higher configuration folder than the campus clusters, so in theory the same cluster config exists on all clusters, and mesh APs should not be stranded if they happen to move between clusters.
Last question: we also use GRE tunnels between clusters for some VLAN extension needs. We've never had an issue with this through previous upgrades (8.3>8.5>8.6>8.7). Has anyone seen any issues with GRE tunnels between MDs when upgrading from 8.7 to 8.10?
Thanks