Hi Mark,
VSX switch roles do not indicate which device is forwarding traffic at a given time, because VSX is an active-active forwarding solution. The roles determine two things: which device stays active during a VSX split (for example, when the ISL goes down), and the direction of configuration sync (primary to secondary). If the VSX ISL goes down, the primary switch keeps forwarding traffic while the secondary switch blocks its ports from participating in the VSX LAGs.
You can swap the switch roles on the fly without shutting down the ISL or the keepalive link. The ISL will go out of sync momentarily, because both VSX nodes have the same role during this time (primary in the example below). Once you configure the VSX peer with the other role (secondary), the ISL will be in sync again.
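To make the intermediate state concrete, here is a minimal Python sketch of the swap order. It is a toy model, not AOS-CX code: `roles_consistent` and the `"peer"` label are hypothetical names, and the only rule it encodes is the one described above (the pair is consistent only while the two nodes have different roles).

```python
def roles_consistent(local_role: str, peer_role: str) -> bool:
    """A VSX pair is consistent only with one primary and one secondary."""
    return {local_role, peer_role} == {"primary", "secondary"}


# Starting point: 8325-02 is secondary, its peer is primary.
roles = {"8325-02": "secondary", "peer": "primary"}
history = [roles_consistent(roles["8325-02"], roles["peer"])]

# `role primary` on 8325-02: both nodes are primary for a moment.
roles["8325-02"] = "primary"
history.append(roles_consistent(roles["8325-02"], roles["peer"]))

# `role secondary` on the peer: the roles differ again.
roles["peer"] = "secondary"
history.append(roles_consistent(roles["8325-02"], roles["peer"]))

print(history)  # [True, False, True]
```

The middle `False` corresponds to the short window in which the ISL reports Out-Of-Sync.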
What happens when the ISL is down but the keepalive is up
The VSX switches use their keepalive connection to determine that they are both up and running. Once that is determined, the user-configured primary VSX switch keeps its multichassis (VSX) LAG links up, and the secondary VSX switch forces its VSX LAG links down and reports the appropriate reason. Once the ISL is back up, the MAC and ARP tables and the LACP and STP states of the primary switch are synchronized to the secondary switch. Then the configured delay timer starts. Once the delay timer expires, the secondary VSX switch brings its VSX LAG links back up.
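The split and recovery behaviour can be sketched as a small state model in Python. This is only an illustration under stated assumptions: `VsxNode`, `isl_down_keepalive_up` and `isl_restored` are hypothetical names, the model covers only the "ISL down, keepalive up" case, and the timer is represented as a step label rather than a real delay.

```python
from dataclasses import dataclass


@dataclass
class VsxNode:
    name: str
    role: str  # "primary" or "secondary"
    vsx_lags_up: bool = True


def isl_down_keepalive_up(node: VsxNode) -> None:
    """ISL failed, but keepalive shows the peer is alive.

    The primary keeps its VSX LAG links up; the secondary forces them down.
    """
    if node.role == "secondary":
        node.vsx_lags_up = False


def isl_restored(node: VsxNode) -> list:
    """Return the ordered recovery steps taken by this node."""
    steps = []
    if node.role == "secondary" and not node.vsx_lags_up:
        steps.append("sync MAC/ARP tables, LACP and STP state from primary")
        steps.append("wait for configured delay timer to expire")
        node.vsx_lags_up = True
        steps.append("bring VSX LAG links up")
    return steps


primary = VsxNode("sw1", "primary")
secondary = VsxNode("sw2", "secondary")

for node in (primary, secondary):
    isl_down_keepalive_up(node)
print(primary.vsx_lags_up, secondary.vsx_lags_up)  # True False

recovery = isl_restored(secondary)
print(recovery)
```

The primary never takes any recovery steps in this model, because its VSX LAG links stayed up throughout the split.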
1) VSX status on the secondary switch
8325-02# show vsx status
VSX Operational State
---------------------
ISL channel : In-Sync
ISL mgmt channel : operational
Config Sync Status : In-Sync
NAE : peer_reachable
HTTPS Server : peer_reachable
Attribute           Local               Peer
------------        --------            --------
ISL link            lag255              lag255
ISL version         2                   2
System MAC          0a:00:06:00:00:00   0a:00:06:00:00:00
Platform            8325                8325
Software Version    GL.10.10.1020       GL.10.10.1020
Device Role         secondary           primary
2) VSX status on the secondary switch after changing its role to primary
8325-02(config)# vsx
8325-02(config-vsx)# role primary
8325-02# show vsx status
VSX Operational State
---------------------
ISL channel : Out-Of-Sync
ISL mgmt channel : inter_switch_link_down
Config Sync Status : Out-Of-Sync
NAE : peer_unreachable
HTTPS Server : peer_unreachable
Attribute           Local               Peer
------------        --------            --------
ISL link            lag255
ISL version         2
System MAC          0a:00:06:00:00:00
Platform            8325
Software Version    GL.10.10.1020
Device Role         primary             primary (Device roles inconsistent)
3) VSX status on the same switch after changing the VSX peer's role to secondary
8325-02# show vsx status
VSX Operational State
---------------------
ISL channel : In-Sync
ISL mgmt channel : operational
Config Sync Status : In-Sync
NAE : peer_reachable
HTTPS Server : peer_reachable
Attribute           Local               Peer
------------        --------            --------
ISL link            lag255              lag255
ISL version         2                   2
System MAC          0a:00:06:00:00:00   0a:00:06:00:00:00
Platform            8325                8325
Software Version    GL.10.10.1020       GL.10.10.1020
Device Role         primary             secondary
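If you want to check the result of the swap from a script rather than by eye, a rough Python sketch like the following could pull the roles out of saved `show vsx status` text. It assumes the output layout matches the captures in this post (a `Device Role` line followed by the local and peer values); `parse_device_roles` is a hypothetical helper, not part of any Aruba tooling, and other firmware versions may format the output differently.

```python
def parse_device_roles(status_text: str):
    """Return (local_role, peer_role) from `show vsx status` text.

    Returns None if no "Device Role" line is found. A missing peer value
    (for example, while the peer is unreachable) is returned as None.
    """
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["Device", "Role"]:
            local = fields[2] if len(fields) > 2 else None
            peer = fields[3] if len(fields) > 3 else None
            return local, peer
    return None


# Sample text copied from the captures in this post.
after_swap = """\
Platform 8325 8325
Software Version GL.10.10.1020 GL.10.10.1020
Device Role primary secondary
"""
during_swap = "Device Role primary primary (Device roles inconsistent)"

roles_after = parse_device_roles(after_swap)
roles_during = parse_device_roles(during_swap)

print(roles_after)   # ('primary', 'secondary')
print(roles_during)  # ('primary', 'primary')
print(set(roles_after) == {"primary", "secondary"})  # True
```

Because the parser splits on whitespace, it does not depend on the exact column alignment of the table.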
Hope this helps!
Harkanwal
Original Message:
Sent: Nov 08, 2023 07:33 AM
From: mark.bossert
Subject: VSX behaviour during role switch
Hi,
Due to a (possibly firmware-related) issue, our client is considering switching the roles of the switches in their VSX cluster, i.e. turning the primary into the secondary and vice versa.
This switch is supposed to happen during business hours, with a warning mail for the end users.
My plan of action is as follows:
- connect Laptop to USB-Console of Switch-01 (VSX-primary)
- `shutdown` all ports on Switch-01 (-> ISL and Keepalive will go down here)
- `no role` on Switch-01
- connect to Switch-02 via SSH (VSX-secondary)
- `role primary` on Switch-02 to turn it into VSX-primary
- `role secondary` on Switch-01
- `no shutdown` on all ports on Switch-01 (-> ISL and Keepalive should come back up here)
Configuration of the VSX-primary is as follows:
vsx
    system-mac 02:00:00:00:00:00
    inter-switch-link lag 10
    role primary
    keepalive peer 192.168.100.2 source 192.168.100.1 vrf vsx-keepalive
    vsx-sync mclag-interfaces vsx-global
All VLANs are annotated with vsx-sync, so I expect all VLANs/MC-LAGs to re-establish automatically, even if they are lost with the `no role` command.
I'm interested in (1) whether anybody has actually performed this and (2) whether this approach will cause interruptions in data flow.