Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

  • 1.  Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 23, 2013 06:30 AM

    So this morning, I was tasked to upgrade 2 Airwave servers (1 of them acting as the failover).

    These Airwave servers contain massive amounts of data; the backup of the primary is larger than 5 GB.

    This is what happened to me (I was doing the upgrades remotely; the servers are hosted at our datacenter and we have a fibre connection to it):

     

    1. Upgrade the backup to 7.6.3 (works)

    2. disable the backup (amp_disable now)

    3. Upgrade the primary to 7.6.3

                     Lost internet connectivity in the office, and my SSH session died during the upgrade

    4. Upgrade never completes on the server, it is practically missing everything

                   doesn't understand start_amp_upgrade, amp_version, that directory is empty

    5. try to restore the backup on the failover, problem is it has already been upgraded to 7.6.3 so the backup won't restore

    6. Airheads and KB searches later

    7. I move amp_enable, start_amp_upgrade, amp_version and chmod a+x the files

    8. try the upgrade again, fails, doesn't complete. fails on /bin/chown -R apache.apache /var/airwave/rrd

                           we have about 1TB of RRD data on the server

    9. I see Sujatha's post regarding make, I see another from Rob (rgin) regarding make, so I try it (root; make)

    10. it puts back certain files and allows me to execute the upgrade again and now it is stuck on

                      STEP 5: Installing upgrade.
       tail -f /var/log/upgrade/AMP-7.6.3-upgrade.log
                   /bin/chown -R apache.apache /var/airwave/rrd

    Won't go further than that.
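
    With about 1TB of RRD data, a recursive chown can sit on that log line for a long time without being hung. One way to tell the difference is to count how many entries are still not owned by the target user and see whether the number shrinks between runs. A minimal sketch, assuming a POSIX shell and find; count_not_owned is a hypothetical helper, not an AMP command, and the scan itself adds disk load on a tree this size:

```shell
# Count entries under a directory tree that are NOT yet owned by the given
# user (name or numeric UID). count_not_owned is a hypothetical helper.
count_not_owned() {
    dir=$1
    owner=$2
    # find prints one line per entry whose owner differs; wc counts them.
    find "$dir" ! -user "$owner" | wc -l | tr -d ' '
}

# On the AMP, using the path and owner from the upgrade log, it would be:
#   count_not_owned /var/airwave/rrd apache
# Run it twice a few minutes apart; a falling number means chown is working.
```

    If the count stays flat over several runs, that is a better reason to stop and call TAC than the log line alone.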

     

    It's been 4 hours now, maybe I should have called TAC... I might have to take the 5 GB backup file to the DC and re-install the primary Airwave from scratch.
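
    Since the trouble started when the SSH session died mid-upgrade, it helps to start long-running jobs so that they do not depend on the terminal. A minimal sketch using nohup, assuming a POSIX shell; UPGRADE_CMD and LOG are illustrative names, and the default command is only a placeholder standing in for the real upgrade command:

```shell
# Start a long-running command detached from the terminal so it keeps going
# if the SSH session drops. UPGRADE_CMD is a placeholder (assumption); on the
# AMP it would be set to the actual upgrade command.
UPGRADE_CMD=${UPGRADE_CMD:-"echo upgrade-placeholder"}
LOG=${LOG:-${TMPDIR:-/tmp}/amp-upgrade-session.log}

# nohup ignores the hangup signal; all output goes to the log file.
nohup sh -c "$UPGRADE_CMD" > "$LOG" 2>&1 &
upgrade_pid=$!

echo "running as PID $upgrade_pid; follow it with: tail -f $LOG"
```

    Tools such as screen or tmux, where installed, serve the same purpose and also let you reattach to an interactive session.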

     

             



  • 2.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 08:15 AM

    So more on this... At the datacenter that same day at 8:30 AM, I tried to re-install AMP using a bootable USB key, but it did not work at all. On the phone with TAC, I was finally able to get an external CD-ROM drive, re-installed AMP, and restored my 5 GB backup (took 1.5 hours to restore).

     

    Got it up and running at 4pm.

     

    Ever since then it has been behaving really oddly: no AP down alarms come in, just a constant wave of controller down alerts, so I don't receive AP downs because the controller is "down" according to AMP. Like below.

     

    Device Up | Device Type is Controller | wlc-6.tdl.c6.dv | 3/25/2013 8:08 AM | Normal | -
    Device Down | Device has rebooted: Device uptime value changed (current: 497 days 2 hrs 27  (more...) | wlc-6.tdl.c6.dv | 3/25/2013 8:08 AM | Critical | -

     

     

    Why??? I don't know..

     

    Increased the monitoring processes from 6 to 10 last night, still not performing correctly.

     

    We upgraded the controllers to 6.2.1.0 the same night as I upgraded AMP. AMP still has not seen or updated the uptime for more than half of my controllers....

     

    AMP is monitoring a network of 28 controllers and >2000 RAPs, and is running version 7.5.5 now.

     

    My TAC guy is only available on Tuesday to help me so I am here looking for help :).



  • 3.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 09:19 AM

    I remember now that TAC had applied a patch to my Airwave server to fix an SNMP trap issue.

    My AMP cannot snmpwalk my controllers.

     

    I found the file they used, but I don't remember the commands to apply it.

     



  • 4.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 09:55 AM

    found out how to apply it, applied patch

    patch -p0 < /var/airwave/custom/file.diff

    root; make

     

    I still cannot snmpwalk the controllers

    [root@aw-1 mercury]# snmpwalk -c tdl_prod$c6 -v 2c 172.30.128.66
    Did not find 'InterfaceIndex' in module ARUBA-TC (/usr/local/airwave/share/snmp/mibs/aruba-trap.my)
    Timeout: No Response from 172.30.128.66

    Going to go try and upgrade it to 7.6.3 today directly on the box at the DC

     



  • 5.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    EMPLOYEE
    Posted Mar 25, 2013 10:02 AM

    patching commands:

     

    Test patch:

    # patch --dry-run -p0 < /var/airwave/custom/patch.filename

    #### If patch runs clean without complaints, then remove dry-run ####

    Apply patch:

    # patch -p0 < /var/airwave/custom/patch.filename

    # root; make

     

    General tip:

    For a setup that has an AMP and Failover, I typically run #amp_disable on the Failover, then run through the upgrades on the AMP.  Once the AMP is completely upgraded and back online, then I run through the upgrades on the Failover.  If I have 2 AMPs, then I'd run through upgrades on one completely, and then the other.



  • 6.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 10:19 AM

    thanks Rob, I was told backup first then primary....noted for next time...:)

     

    For some reason, and I don't know why, it cannot snmpwalk the controllers. I restored from a backup, and the backup was 3 days old at the time.



  • 7.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    EMPLOYEE
    Posted Mar 25, 2013 10:37 AM

    You were probably told to do the backup first since it's non-critical, and a quick upgrade since there's no database migration.  You can really do either or, but as a habit - I typically disable the Failover so that it doesn't try to take over while I upgrade the main AMP.

     

    As for your snmpwalks not working, check the RPMs that are installed:

    # gr snmp

     

    These are the packages for SNMP that should be present:

    ###########

    aw-erlang-snmp-R14B-04.2.x86_64
    aw-net-snmp-aw-5.5-10.x86_64
    aw-net-snmp-aw-perlmods-5.5-10.x86_64

    aw-perl-Net-SNMP-6.0.1-3.noarch

    perl-SNMP_Session-1.12-4.el6.noarch

     

    If you're not seeing a package, you can install them from:

    /root/svn/mercury/src/noarch/rpms

    /root/svn/mercury/src/x86_64/rpms



  • 8.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 11:53 AM

    SNMP is installed, it's just that it won't walk the controller. I get an error message (shown in my previous post).

     

    I am upgrading the server now to 7.6.3 so hopefully this fixes all my issues. Keep you posted.

     

    I just wanted to share my experience regarding the upgrade that didn't go right. Still love Airwave ;)



  • 9.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 12:23 PM

    Upgrade Success! but....Airwave still cannot snmpwalk my controllers...so odd.

    I get a timeout

    AMP and controllers are on the same subnet, no FW rules or ports blocked...

     

     



  • 10.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    EMPLOYEE
    Posted Mar 25, 2013 12:34 PM

    It could be that the request is too large.  Try doing a shorter SNMP request like:

    # snmpwalk -c public -v 2c 10.10.10.10 sysDescr

    # snmpwalk -c public -v 2c 10.10.10.10 sysName

    I think having the OID in there might make a difference.



  • 11.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 12:41 PM

    not working

    [root@aw-1 mercury]# snmpwalk -c public -v 2c 172.30.128.65 sysDescr
    Timeout: No Response from 172.30.128.65

    [root@aw-1 mercury]# snmpwalk -c public -v 2c 172.30.128.65 sysName
    Timeout: No Response from 172.30.128.65

    I'm gonna leave the DC now and head back to the office to TS some more.

     

    EDIT:

    nmap on a working server and AMP don't show snmp ports open.



  • 12.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Mar 25, 2013 09:26 PM

    So looks like there is an issue according to TAC. AMP is suffering from SNMP timeout issues. Have 2 TAC cases opened for this, 1 AMP and 1 Aruba.

     

    Also, my SNMP community string on the controllers included a $, which AOS version 6.2.1.0 and AMP version 7.6.3 do not seem to like: everything after the $ was treated as a variable. I changed all the SNMP community strings to a single word with no special characters.

     

    To note: the community string worked fine with AMP 7.5.5. and AOS 6.1.4.0.

     

    9pm now, leaving work. :)

     

    More on this tomorrow, ladies and gentlemen.



  • 13.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Apr 04, 2013 09:34 AM

    So Bug filed with the AOS team.

    It seems as though AOS 6.2.1.0 is not responding to Airwave's SNMP requests.

     

    Hopefully we have a fix soon



  • 14.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    EMPLOYEE
    Posted Apr 04, 2013 06:01 PM

    **UPDATE**

     

    I don't think there's any actual bug going on here.  It's basically how the shell treats the $ character on the command line, rather than a limitation of the SNMP protocol itself.

     

    If you've got a dollar sign ($) in your community string, then you need to put single quotes (') around the community string.  No quotes and double quotes will return 'Timeout: No Response from x.x.x.x'

     

    # snmpwalk -c public -v 2c x.x.x.x sysDescr = GOOD

     

    # snmpwalk -c pub$$lic -v 2c x.x.x.x sysDescr = BAD

     

    # snmpwalk -c 'pub$$lic' -v 2c x.x.x.x sysDescr = GOOD

     

    # snmpwalk -c "pub$$lic" -v 2c x.x.x.x sysDescr = BAD

     

    Please doublecheck that you're using single quotes around the community string.
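
    The quoting difference can be reproduced without any SNMP device, because the expansion happens in the shell before snmpwalk ever sees the argument. A minimal sketch, assuming a POSIX shell; the variable names are illustrative and printf simply stands in for the command that would receive the community string:

```shell
# Capture the argument exactly as a command would receive it.
# In a POSIX shell, $$ expands to the shell's process ID.
unquoted=$(printf '%s' pub$$lic)     # $$ is expanded
double=$(printf '%s' "pub$$lic")     # double quotes still expand $$
single=$(printf '%s' 'pub$$lic')     # single quotes pass it through literally

echo "unquoted: $unquoted"
echo "double:   $double"
echo "single:   $single"
```

    Only the single-quoted form reaches the command as pub$$lic. The other two send the process ID in place of $$, so the device receives a community string it does not recognise and never answers, which shows up as a timeout. The same applies to a string containing something like $c6, which the shell reads as a variable named c6.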



  • 15.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Apr 05, 2013 08:22 AM

    We changed the SNMP Community string to not include $ signs or any special characters.

     

    It is still not behaving correctly. Out of my 2000 APs, AMP keeps fluctuating between all down, half of them down, or just a few down.

     

    According to TAC, in AMP's logs you can see SNMP timeouts.

    They are trying to pinpoint whether it's AMP or the controllers.



  • 16.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Apr 12, 2013 09:52 AM

    Seems as though this has been resolved by TAC restarting the SNMP process on all my controllers.

     

    AMP can now properly detect the APs going offline/online.



    AOS code: 6.2.1.0



  • 17.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    EMPLOYEE
    Posted Apr 12, 2013 10:14 AM

    Thanks for the update.  This will happen less often when we move from SNMP polling towards AMON for Aruba devices (AMON is a proprietary Aruba protocol that will essentially behave similarly to traps, providing more real-time data).



  • 18.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Apr 12, 2013 10:36 AM
    Not a problem. Glad to share the information. I knew I was taking a risk upgrading to early deployment code but if no one does it on a huge network, then we wouldn't find problems that we could fix!

    Ya I remember you mentioning that at Airheads regarding AMON. That's great.

    For the time being, hopefully this remains stable.


  • 19.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Apr 17, 2013 09:09 AM
    Not stable. The issue is recurring.
    TAC is going back to investigate with engineering.


  • 20.  RE: Upgrade nightmare from 7.5.5. to 7.6.3 for Airwave and Failover Airwave

    Posted Jun 14, 2013 09:10 AM
    Everything has been resolved with a C-Build firmware of the Aruba controller.
    6.2.1.1 build 38320
    This fixed the issue where the Int32 maximum was being reached on the Uptime value of the controller.

    Upgraded my 28 controllers, stable and AMP is reporting everything correctly again.
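
    The "497 days 2 hrs 27" uptime in the Device Down alert earlier in the thread lines up with the point where a 32-bit counter of hundredths of a second runs out. A small arithmetic sketch, assuming the controller uptime is reported as SNMP TimeTicks (hundredths of a second in a 32-bit value); the variable names are illustrative:

```shell
# Largest value a 32-bit TimeTicks counter can hold (2^32 - 1),
# in hundredths of a second.
max_ticks=4294967295

# Convert to whole seconds, then break down into days, hours and minutes.
total_seconds=$(( max_ticks / 100 ))
days=$(( total_seconds / 86400 ))
hours=$(( (total_seconds % 86400) / 3600 ))
minutes=$(( (total_seconds % 3600) / 60 ))

wrap_point="${days} days ${hours} hrs ${minutes} mins"
echo "$wrap_point"
```

    This prints 497 days 2 hrs 27 mins. A controller that has been up longer than that can no longer report a steadily increasing uptime, and a monitoring system that sees the value change unexpectedly may interpret it as a reboot.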