Wired Intelligent Edge


5406Rzl2 upgrades - minimal downtime

  • 1.  5406Rzl2 upgrades - minimal downtime

    Posted Jul 24, 2019 03:37 PM

    Hello airheads,

     

    Our core switches are 5406Rzl2 each with dual management modules. I was under the impression that I could upgrade the code (using some mix of redundancy and other features) with minimal downtime. Around the Internet, I've seen various ramblings of things tried but never any definite yes or no. Is there a way to accomplish this?


    #5400


  • 2.  RE: 5406Rzl2 upgrades - minimal downtime
    Best Answer

    EMPLOYEE
    Posted Jul 25, 2019 11:11 AM

    Greetings!

     

    The 5400R has dual management modules for redundancy (if one MM fails, the second takes over instantaneously), which can also be used to perform switch software upgrades with reduced downtime by loading a new software image, booting the standby MM onto the new version, then performing a redundancy switchover to swap to the standby MM. 

     

    If you have a pair of 5400Rs stacked using VSF (front plane stacking), the VSF Fast Software Upgrade feature uses an automated sequenced reboot process to minimize downtime for devices with redundant links (at least one link per VSF member chassis). 



  • 3.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Jul 26, 2019 10:31 PM

    hi

     

    see attachment

     

    Thanks

     

     

    Attachment(s)



  • 4.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Aug 01, 2019 11:18 AM

    I'm hoping to try this during our maintenance window next week, but according to TAC...

     

    #copy sftp flash user ftpuser ftp.server.ip KB_16_08_0003.swi secondary 

    #boot set-default flash primary 

    #write memory 

    #boot standby 

    #show redundancy (wait for sync) 

    #redundancy switchover 

     

    ...will upgrade with very little to no downtime. In their lab, there were no pings dropped. I'll add my results next week.



  • 5.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Aug 01, 2019 05:18 PM

    I'm really curious too. I'm planning a similar update on our Aruba 5400R zl2 with dual MMs and NonStop Switching redundancy enabled, and the procedure I was able to put together is basically the same as the one you just wrote in a few lines:

     

    Aruba_5400R_zl2_Redundant_MMs_software_update_procedure.png

    I discussed L2/L3 traffic disruption on an Aruba 5400R zl2 with dual MMs and NonStop Switching redundancy enabled here, and also briefly here (theoretically, because I had no previous hands-on experience on the subject). My take is that Layer 2 traffic disruption should be close to zero (sub-second disruption, so basically no ping loss), while Layer 3 traffic disruption should not be zero, but I haven't been able to work out how long (in seconds) it might be.



  • 6.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Sep 27, 2019 05:26 PM

    Mike, what were the results? Can you quantify what your network disruption was, if any?



  • 7.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Sep 30, 2019 08:59 AM

    Apparently, my original TAC info was faulty. In my testing, there was downtime (IIRC ~90 seconds). The explanation from TAC is below.

     

    I was able to find out that while upgrading firmware, even if there are two MM modules and Nonstop switching is enabled, the software loaded onto the standby MM is different from the software already present on the active MM. Here the software versions are mismatched, so when the redundancy switchover command is executed, the active MM will give control to the standby, which finishes booting and becomes active, but the behavior noticed will be that of warm standby. This is clearly explained in one of the documents I referred to. I will specify the link of the document in this email. Also, apologies for the information provided earlier. I was able to notice that when the software in active and standby MM are same during redundancy switchover after switching to standby only the active MM will reboot and the ping is continuous with no drops in L2 and L3 (This is a condition that happens during failover), but during an image mismatch, when you do a redundancy switchover, after switching to standby the active MM and the whole chassis reboot. This is known behavior and a limitation on the 5400R switches.
    
    Link : 
    http://h20628.www2.hp.com/km-ext/kmcsdirect/emr_na-c04943210-3.pdf
    
    Refer to page 566, "Other software mismatch condition", and Appendix A, "Chassis redundancy (HPE 5400R Switches)".
    See also page 558, "About setting the rapid switchover stale timer".
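
The two cases TAC describes above can be written down as a small decision helper. This is only a sketch of the statement as quoted, not of how the switch firmware decides anything; the function name and the wording of the returned strings are made up for illustration, and the version strings are taken from elsewhere in this thread.

```python
def switchover_behavior(active_version: str, standby_version: str) -> str:
    """Summarize what TAC's explanation predicts for `redundancy switchover`.

    Hypothetical helper: it only encodes the quoted TAC statement.
    """
    if active_version.strip() == standby_version.strip():
        # Same image on both MMs: only the old active MM reboots.
        return "nonstop switching: only the active MM reboots, no ping drops expected"
    # Different images on the two MMs: the switch behaves as warm standby.
    return "warm standby: active MM and whole chassis reboot, outage expected"


# Matching images (normal failover case).
print(switchover_behavior("KB.16.05.0007", "KB.16.05.0007"))
# Mismatched images (mid-upgrade case).
print(switchover_behavior("KB.16.05.0007", "KB.16.08.0003"))
```

During an upgrade the two versions differ by definition, so under this reading the second branch is the one that applies.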


  • 8.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Sep 30, 2019 10:56 AM

    Hi Mike, sorry, just a question: NonStop Switching redundancy mode (if active, i.e. where Configured Mode = Current Mode) doesn't allow MM1 and MM2 to run mismatched software versions. In other words, if NonStop Switching redundancy mode is enabled, the valid scenario is the one described in this TAC statement:

     

    ...when the software in active and standby MM are same during redundancy switchover after switching to standby only the active MM will reboot and the ping is continuous with no drops in L2 and L3 (This is a condition that happens during failover)...

    So, just to be clear, was your system really not in NonStop Switching redundancy mode (i.e. was it actually in Warm Standby)?

     

    What I mean is: did you start your upgrade journey with an Aruba 5406R zl2 switch whose redundancy status was similar to this (the software versions listed are those of my system, YMMV):

     

    5412Rzl2# show redundancy 
    
     Configured Mode: Nonstop Switching 
     Current Mode   : Nonstop Switching
    
     Rapid Switchover Stale Timer : 90
     Failovers     : 2
     Last Failover : Wed May 16 11:57:53 2018
    
    Slot Module Description                       State    SW Version    Boot Image
    ---- ---------------------------------------- -------- ------------- ----------
    MM1  HP J9827A Management Module 5400Rzl2     Active   KB.16.05.0007 Primary  
    MM2  HP J9827A Management Module 5400Rzl2     Standby  KB.16.05.0007 Primary  
    
    5412Rzl2# show redundancy detail 
    
      Slot Role     Module Up Since     State Since         State            
      ---- -------- ------------------- ------------------- -----------------
      MM1  Active   05/16/18 14:41:30   05/16/18 14:48:53   Active           
      MM2  Standby  05/16/18 14:49:26   05/16/18 14:50:47   Standby          
     
     Failover Log:
    
      Slot Role     Time                Reason         
      ---- -------- ------------------- ---------------
      MM2  Active   05/16/18 11:57:52   Switchover     
      MM1  Active   05/16/18 11:50:09   Switchover     
    
    5412Rzl2# show flash
    Image             Size (bytes) Date     Version 
    ----------------- ------------ -------- --------------
    Primary Image    :    33511892 03/27/18 KB.16.05.0007        
    Secondary Image  :    33511892 03/27/18 KB.16.05.0007       
    
    Boot ROM Version 
    ----------------
    Primary Boot ROM Version   : KB.16.01.0006
    Secondary Boot ROM Version : KB.16.01.0006
    
    Default Boot Image   : Primary
    Default Boot ROM     : Primary
    
    5412Rzl2# show version
    Management Module 1: Active
    
    Image stamp:    /ws/swbuildm/rel_venice_qaoff/code/build/bom(swbuildm_rel_venice_qaoff_rel_venice)
                    Mar 26 2018 23:16:09
                    KB.16.05.0007
                    847
    Boot Image:     Primary
    
    Boot ROM Version:    KB.16.01.0006
    Active Boot ROM:     Primary
    
    
    Management Module 2: Standby
    Image stamp:    /ws/swbuildm/rel_venice_qaoff/code/build/bom(swbuildm_rel_venice_qaoff_rel_venice)
                    Mar 26 2018 23:16:09
                    KB.16.05.0007
                    847
    Boot Image:     Primary

    Or something else?

     

    Apart from what TAC wrote to you (which can usually be found in any ArubaOS-Switch guide), this (old) article also looks interesting to read.
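
The pre-upgrade check asked about above (Configured Mode equals Current Mode, and both MMs on the same software version) can be sketched as a small parser over `show redundancy` output. The regular expressions assume the column layout shown in this post and may not hold for other software releases; the function names are hypothetical.

```python
import re

# Excerpt of the `show redundancy` output quoted in this post.
SAMPLE = """\
 Configured Mode: Nonstop Switching
 Current Mode   : Nonstop Switching

Slot Module Description                       State    SW Version    Boot Image
---- ---------------------------------------- -------- ------------- ----------
MM1  HP J9827A Management Module 5400Rzl2     Active   KB.16.05.0007 Primary
MM2  HP J9827A Management Module 5400Rzl2     Standby  KB.16.05.0007 Primary
"""

MODE_RE = re.compile(r"^[ \t]*(Configured|Current) Mode[ \t]*:[ \t]*(.+?)[ \t]*$", re.M)
ROW_RE = re.compile(
    r"^[ \t]*(MM\d)[ \t]+.*?[ \t]+(Active|Standby)[ \t]+(\S+)[ \t]+(\S+)[ \t]*$", re.M
)


def parse_redundancy(text):
    """Return ({'Configured': ..., 'Current': ...}, {'MM1': version, ...})."""
    modes = dict(MODE_RE.findall(text))
    versions = {slot: version for slot, _state, version, _image in ROW_RE.findall(text)}
    return modes, versions


def ready_for_nonstop(text):
    """True if both modes are Nonstop Switching and both MMs run one version."""
    modes, versions = parse_redundancy(text)
    modes_ok = modes.get("Configured") == modes.get("Current") == "Nonstop Switching"
    return modes_ok and len(versions) == 2 and len(set(versions.values())) == 1


print(parse_redundancy(SAMPLE))
print(ready_for_nonstop(SAMPLE))
```

Running the same check after `boot standby` would be one way to see whether the switch has dropped out of Nonstop Switching before the switchover is issued.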



  • 9.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Oct 02, 2019 11:07 AM

    I think that yours is the scenario in which I started, but I cannot recall at this point. I'll give that HPE article a read and perhaps try it this winter when I look at upgrades. Thanks!



  • 10.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Mar 09, 2021 02:36 PM
    I just performed these steps on a production switch:

    #copy tftp flash <TFTP SERVER IP> KB_16_10_0012.swi primary 

    #boot set-default flash primary 

    #write memory 

    #boot standby 

    #show redundancy (wait for sync) 

    #redundancy switchover 


    The upgrade was fully successful and I only lost 7 pings to a device on that switch after issuing the "redundancy switchover" command to reboot the commander management module.  My switch was in "non-stop switching" state prior to starting the upgrade. 

    From what I read, the redundancy switchover forces the modules to restart to load the new software image, which causes the short outage.  A redundancy switchover without a software upgrade would cause no outage.

    NOTE:  My lightweight Aruba APs connected to the switch had to reboot so there was a longer outage for wireless connected devices.
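
For anyone scripting this, the command sequence listed in this post can be generated from a small helper. It is a sketch only: the function name is hypothetical, `192.0.2.10` is a placeholder documentation address rather than a real TFTP server, and the helper assumes the image is copied to the same flash slot that is then set as the default boot image, as in the commands above.

```python
def upgrade_commands(server_ip: str, image_file: str, slot: str = "primary") -> list[str]:
    """Build the CLI sequence from this post for a dual-MM 5400R upgrade.

    Hypothetical helper: it only formats the commands, it does not run them.
    """
    if slot not in ("primary", "secondary"):
        raise ValueError("slot must be 'primary' or 'secondary'")
    return [
        f"copy tftp flash {server_ip} {image_file} {slot}",  # load the new image
        f"boot set-default flash {slot}",                    # boot from that image
        "write memory",
        "boot standby",                                      # reboot the standby MM
        "show redundancy",                                   # repeat until synced
        "redundancy switchover",                             # hand over to the standby
    ]


for command in upgrade_commands("192.0.2.10", "KB_16_10_0012.swi"):
    print(command)
```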



    ------------------------------
    Aaron Wheeler
    ------------------------------



  • 11.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Mar 09, 2021 03:53 PM
    Hi @Azz, really interesting! Just one question: when you wrote "The upgrade was fully successful and I only lost 7 pings to a device on that switch after issuing the "redundancy switchover" command to reboot the commander management module.  My switch was in "non-stop switching" state prior to starting the upgrade." did you mean that the ping test was done between two hosts that were members of different routed subnets on the Aruba 5400R zl2? I'm asking to understand whether the impact you saw relates to the "routing" rather than the "switching" capabilities.

    ------------------------------
    Davide Poletto
    ------------------------------



  • 12.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Mar 09, 2021 04:57 PM
    ​Hi Davide,

    The ping was from another subnet across another 5400 VSF stack which is running as a L3 core with dual 1Gbps fibre uplinks to the 5400 being upgraded. The ping was just a standard Windows ping command, which I think has a default 2 sec delay between pings, which means a drop of around 10-15 secs for me.
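
The arithmetic behind that estimate can be made explicit. If N consecutive probes are lost and probes go out every T seconds, the outage lasted at least (N - 1) * T and less than (N + 1) * T. The sketch below assumes evenly spaced probes; the 2-second and 1-second intervals are assumptions for illustration, not measured values, and the real spacing of lost pings depends on the ping client's timeout settings.

```python
def outage_window(lost_pings: int, interval_s: float) -> tuple[float, float]:
    """Bound the outage (in seconds) implied by consecutive lost probes.

    Lower bound: time from the first lost probe to the last lost probe.
    Upper bound: time from the last good probe before the gap to the
    first good probe after it.
    """
    if lost_pings <= 0:
        # No probe was lost, so any outage was shorter than one interval.
        return (0.0, float(interval_s))
    return ((lost_pings - 1) * interval_s, (lost_pings + 1) * interval_s)


print(outage_window(7, 2.0))  # assuming one probe every 2 seconds
print(outage_window(7, 1.0))  # assuming one probe every second
```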

    I have two more 5400 VSF stacks to upgrade on the site when I can get an operational outage (24x7 operations).  I will post my results of the VSF "Fast Software Upgrade" impact when I get it done.  It could well be a few weeks before I get the opportunity though.

    ------------------------------
    Aaron Wheeler
    ------------------------------



  • 13.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Mar 10, 2021 11:09 AM
    Hello @Azz, your feedback is really appreciated! Thanks!

    Since 15 seconds seems a long time to me, I have just one more question. You wrote that the ping was done "... from another subnet across another 5400 VSF stack which is running as a L3 core with dual 1Gbps fibre uplinks to the 5400 being upgraded", which immediately made me suspect that your upgraded 5400 wasn't actually performing "routing" for or between the hosts involved in your ping test. Am I wrong? Is the 5400 connected as a Layer 2 extension (via a single or an aggregated uplink) to the VSF acting as Layer 3, or is it routed to the VSF core?

    My aim is to understand what downtime to expect (in seconds) on an Aruba 5400R zl2 configured with "NonStop Switching" redundancy mode and acting as an IP router for directly connected subnets when it is upgraded with the technique described in this thread (standby MM first, then the ex-active MM after a redundancy switchover).

    If your Aruba 5400R zl2 configured with "NonStop Switching" redundancy wasn't performing IP routing (my assumption could be wrong) but was acting as a pure Layer 2 switch, I would have expected no packet loss at all (no loss for Layer 2 switched traffic, with disruption only at Layer 3 for routed traffic traversing the 5400).

    How is the 5400 uplinked to your VSF? LACP I suppose...

    Kind regards, Davide.


    ------------------------------
    Davide Poletto
    ------------------------------



  • 14.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Mar 10, 2021 05:23 PM
    Hi @parnassus,

    Happy to help out, I am always looking for real world examples to understand possible impacts to production networks as well.

    Ok this network is as follows:

    1. Core pair of 5412Rzl2 chassis switches with a single MM in each.  They are inter-connected with 2 x 10Gbps VSF (about 2km apart as it is a big site).
    2. The switch being upgraded was a single 5406Rzl2 with dual MMs in another building on site.  It is connected with a 2 x 10Gbps LAG trunk to BOTH of the core switches above via Long Haul SM fibre.
    3. The core VSF runs L3 routing and switching.
    4. The upgraded switch purely runs L2 VLANs back to the core switches.

    My workstation was on a subnet on the other side of the core switches, so traffic was routed through the core to reach the VLAN of the device I was pinging, which was connected to a port on the switch being upgraded.  Admittedly, the device I was pinging was a printer, so possibly its NIC isn't the fastest to recover, but I doubt it would be significant.

    I can 100% confirm that "nonstop switching" was configured on the upgraded switch.  It was upgraded from KB.16.07.003 to KB.16.10.0012.

    PS:  Firmware 16.10.0012 looks stable so far.  Running also on a number of 2930F VSF stacks in the same site.  Yet to try on a switch running routing though.

    ------------------------------
    Aaron Wheeler
    ------------------------------



  • 15.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Mar 11, 2021 08:53 PM
    Hi @Azz, these two sentences:

    "The core VSF runs L3 routing and switching." and "The upgraded switch purely runs L2 VLANs back to the core switches."

    and the fact that in your ping test the packets traversed (and were routed by) the VSF to reach a destination connected to the upgraded 5406R zl2 running in "NonStop Switching" redundancy mode (connected back to the VSF as a Layer 2 extension) justify, I believe, my doubts about the 7 pings you lost (15 seconds or 7 seconds doesn't matter).

    You shouldn't have lost any pings, should you?

    If routing wasn't affected (because it is the job of your untouched VSF), why did you lose (some) pings with "NonStop Switching" redundancy mode enabled on the upgraded switch, which doesn't perform any routing but relies on the VSF?

    Kind regards, Davide.

    ------------------------------
    Davide Poletto
    ------------------------------



  • 16.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Mar 11, 2021 10:05 PM
    Hi @parnassus,

    I don't disagree with your understanding and I was expecting the same result of essentially no impact with that upgrade.

    However, I just took some time to review the logs and I think I have found the smoking gun which would then give a short outage as seen:

    I 03/09/21 07:26:10 00266 system: AM1: Resetting Mgmt Module 2
    I 03/09/21 07:26:47 00539 stacking: AM1: Initial sync to standby starting
    I 03/09/21 07:26:49 00539 stacking: AM1: Initial sync to standby complete
    I 03/09/21 07:26:49 00066 system: AM1: Mgmt Module 2 Booted
    I 03/09/21 07:26:49 00261 system: AM1: Mgmt Module 2 in Standby Mode
    W 03/09/21 07:26:49 03801 system: AM1: Mgmt Module 2 - Nonstop switching disabled because of SW version mismatch. Warm standby enabled.

    It appears that when you do the initial load of the new firmware into the standby MM in preparation for switchover, the system automatically switches from "Nonstop Switching" to "Warm Standby" mode...  My understanding is that warm standby will see a short outage as the switchover occurs.

    Based on this, I suspect a software update will require a short outage despite the initial non-stop switching mode, due to the software version mismatch which will occur during the upgrade process... unless there is a way around that.
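
Checking for this condition can be automated by scanning the event log for the mismatch warning. The sketch below parses the six log lines quoted above; the field layout (severity, date, time, event ID, message) is inferred from those lines only and is an assumption about the general log format, and the function names are hypothetical.

```python
from datetime import datetime

# Event log lines quoted in this post.
LOG = """\
I 03/09/21 07:26:10 00266 system: AM1: Resetting Mgmt Module 2
I 03/09/21 07:26:47 00539 stacking: AM1: Initial sync to standby starting
I 03/09/21 07:26:49 00539 stacking: AM1: Initial sync to standby complete
I 03/09/21 07:26:49 00066 system: AM1: Mgmt Module 2 Booted
I 03/09/21 07:26:49 00261 system: AM1: Mgmt Module 2 in Standby Mode
W 03/09/21 07:26:49 03801 system: AM1: Mgmt Module 2 - Nonstop switching disabled because of SW version mismatch. Warm standby enabled.
"""


def parse_log(text):
    """Split each line into (severity, timestamp, message)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        severity, date, time, _event_id, message = line.split(maxsplit=4)
        stamp = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
        entries.append((severity, stamp, message))
    return entries


def mismatch_warnings(entries):
    """Return warning messages that report a software version mismatch."""
    return [msg for sev, _ts, msg in entries if sev == "W" and "SW version mismatch" in msg]


def standby_boot_seconds(entries):
    """Seconds between the standby MM reset and its 'Booted' message."""
    reset = next(ts for _sev, ts, msg in entries if "Resetting Mgmt Module" in msg)
    booted = next(ts for _sev, ts, msg in entries if msg.endswith("Booted"))
    return (booted - reset).total_seconds()


entries = parse_log(LOG)
print(mismatch_warnings(entries))
print(standby_boot_seconds(entries))
```

Note that the interval computed here is how long the standby MM took to come back after being reset, not the length of the traffic outage seen after the switchover.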


    ------------------------------
    Aaron Wheeler
    ------------------------------



  • 17.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Mar 12, 2021 07:22 AM
    Hi @Azz, I'm not sure that really was the smoking gun. I mean, it's pretty strange: given your explanation, it looks like a "Catch-22" scenario. If our assumptions are correct, there will always be a software mismatch between the updated MM (the previous SMM, about to become AMM), which you manually force to become the new AMM with a redundancy switchover, and the other MM (the previous AMM), which will continue to run the older, not-yet-updated software. So the transition from "NonStop Switching" redundancy mode to "Warm Standby" redundancy mode will always happen (which is unwanted!). Or am I missing something and my line of reasoning is wrong? If so, where is it wrong?

    I read, for example, the section "Other software version mismatch conditions" in the Aruba 3810 / 5400R Management and Configuration Guide for ArubaOS-Switch 16.10 on page 800, and also the section "About downloading a new software version" on page 833.

    Out of curiosity, did you follow this procedure or a similar one?

    ------------------------------
    Davide Poletto
    ------------------------------



  • 18.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Mar 14, 2021 05:33 AM
    Hi @parnassus

    That is the exact procedure that I followed.  For confirmation, the exact commands I executed are listed in one of my earlier posts in this thread.

    As you stated, and from what I had read before the upgrade, using that procedure should have no impact on switch operation when the switch is in nonstop switching mode with dual redundant MMs.

    However, it appears it actually followed the steps from the bottom of p800 and onto p801 of the document you linked above.  This explains my results, but it was not the expected / wanted behaviour based on the documentation I had seen.

    ------------------------------
    Aaron Wheeler
    ------------------------------



  • 19.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Mar 14, 2021 06:51 AM
    Exactly! The point is that it looks like the NonStop Switching redundancy mode's ability not to disrupt normal Layer 2 switched traffic traversing the switch didn't hold true in the scenario we're discussing. I can understand that Layer 3 routing will be briefly affected (the famous "least amount of downtime"), but I personally can't understand what happened to Layer 2 switched traffic.

    ------------------------------
    Davide Poletto
    ------------------------------



  • 20.  RE: 5406Rzl2 upgrades - minimal downtime

    Posted Mar 14, 2021 06:14 PM
    Although it was not the documented expectation, I sort of understand the short traffic impact. 

    When the MM switchover occurs to upgrade to the newer software, the Standby MM needs to finish its full restart to load the new firmware before it can start processing traffic. I suspect the switching blades will need to be updated with the new software as well which may cause some impact.

    I might raise a query with my Aruba account team just to clarify with support in case there is a way to make this non-interrupting as it would be nice not to have to take outages in these larger access switch chassis as they usually carry a higher workload.

    FYI. In case you are interested, the upgrade process on a 2 switch 2930F VSF stack causes approximately 3m 30s impact to traffic on the switch.  I have tested with a number of stacks and all had almost identical results.

    ------------------------------
    Aaron Wheeler
    ------------------------------



  • 21.  RE: 5406Rzl2 upgrades - minimal downtime

    MVP GURU
    Posted Mar 10, 2021 03:08 AM
    Thanks for the feedback!

    ------------------------------
    PowerArubaSW : Powershell Module to use Aruba Switch API for Vlan, VlanPorts, LACP, LLDP...

    PowerArubaCP: Powershell Module to use ClearPass API (create NAD, Guest...)

    PowerArubaCX: Powershell Module to use ArubaCX API (get interface/vlan/ports info)..

    ACEP / ACMX #107 / ACDX #1281
    ------------------------------