
New Contributor

STM crashing on master controller

Hi forum members,

First time posting, and it's a bit of a long post to explain what's happening. I have a TAC case open on this, but was wondering whether anyone else has experienced this issue.

Quick topology overview: a 2400 Aruba controller running as master, handling about 24 APs in a corporate office; a 2400 Aruba controller running as a local at a manufacturing site with 28 APs; and two 3600 controllers running at a manufacturing site with 54 APs each. The whole campus is connected with fiber, so very low latency and lots of bandwidth.

I recently tried to upgrade ArubaOS from 3.3.1.31 (pretty old) to 3.4.2.7 (fairly new, but not too new because I don't like to do bleeding edge).

Well, the upgrade went horribly wrong. My 3600 series controllers all suddenly became unstable. The WebUI would be unavailable at random while pings kept going, and sometimes you could only SSH to the switches while both ping and the WebUI were unavailable.

Logs were full of errors that the APs (AP-65) were unable to download the new firmware (FTP timeouts). So the controllers really had no APs available to them. Oddly the two 2400s were fine, or seemed fine. APs upgraded and things were good.

I proceeded to reboot the 3600s a few times, and finally they came up and were stable. Things were good, and the APs were all reporting in. When I looked at it further, they had gone back to the old version of ArubaOS!

So my 2400s seemed OK on the new version, with the old version running on the 3600s. My maintenance window was quickly coming to an end, so I decided to leave it until I could get tech support involved.

It turned out the master 2400 was actually having some major issues and was bootstrapping the APs. I didn't notice at first because of all the other chaos. So on Monday morning, users in my corporate office quickly started complaining that no wireless was available in the building.

TAC support helped me downgrade the 2400 controller and at first, everything looked good. APs were up, users could log in, all was right with the world again.

However, the engineer noticed STM was crashing on the 2400. At first he thought it was due to machine MAC authentication hitting the localdb (re-authentication was enabled on the dot1x auth group). He turned off re-authentication, but that didn't really help. STM keeps crashing, and the GRE tunnels to the APs go down every 3 minutes.
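Before sending logs to TAC, one way to sanity-check the "every 3 minutes" pattern is to measure the gaps between tunnel-down events. This is only a rough sketch: the timestamps below are made up for illustration, the helper name `drop_intervals` is hypothetical, and it assumes you have already pulled the event times out of the controller logs by hand in a `YYYY-MM-DD HH:MM:SS` format.

```python
from datetime import datetime


def drop_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps, in seconds, between consecutive event timestamps."""
    # Sort first so out-of-order log lines do not produce negative gaps.
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Hypothetical tunnel-down times, NOT taken from a real controller log.
sample = [
    "2000-01-03 09:00:05",
    "2000-01-03 09:03:04",
    "2000-01-03 09:06:07",
    "2000-01-03 09:09:06",
]

gaps = drop_intervals(sample)
print(gaps)  # gaps clustered near 180 seconds would match a 3-minute cycle
```

If the gaps cluster tightly around 180 seconds, that points to something periodic (a timer or a process restart cycle) rather than load-related drops, which may be worth mentioning in the TAC case.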

TAC proposed the following:
1. Upgrade the ArubaOS on the controllers again. After my recent fun with the upgrade, I'm not crazy about that thought. It also doesn't answer why it was fine before the upgrade but has problems after the downgrade. We made no config changes, just firmware changes.

2. Move my APs off to another local controller. The problem centers on STM crashing on the master, but if the APs terminate on the local switches, the GRE tunnels stay up. I believe STM would keep crashing on my master but no one would really notice; at least that's the impression I got from TAC.

So really, how in the world does my system go from working fine, to uncovering a bug in the firmware after a downgrade back to it? I must be missing something. Was wondering if anyone had this happen before (either with the upgrade or STM crashing). I've done many upgrades to Aruba in the past, never had these types of issues.

Any thoughts are appreciated! I've been using Aruba for 3 years, and this is the first time I've had an issue that didn't seem to be self-inflicted. In fact, it was the first time I had to call TAC! I'm not totally happy with what they're proposing, and I'm hoping for a simpler fix. Because our sites are heavy in manufacturing, Aruba is used by scanning devices and RFID. I have a small maintenance window once a month. If I crash Aruba during the day at the manufacturing sites, we start losing money, and I don't want to be the guy doing that.

Because of all this, I'm working on buying gear to build a test lab. Until then, however, I'm battling crappy wireless at my corporate office, which isn't going over well with everyone.

Sorry for the long post, just hoping to hear some good advice! Thanks!
MVP

Re: STM crashing on master controller

We've had STM crashes before, and recently actually. Likely unrelated, however.

When STM crashed, did you tar the crash logs and send them on to support? Generally, that'll contain a lot of useful information for diagnosing the problem. They can even run it against a DB to see if it matches any existing bugs. That would be a great first step.

tar crash
copy flash: crash.tar tftp: <tftp-server-ip> crash.tar

A general rule of thumb for us is not to terminate APs on a master in a master-local cluster. So option 2 doesn't sound too bad to me. If you have a standing maintenance window, you can try option 1 to see whether it remedies the problem.
==========
Ryan Holland, ACDX #1 ACMX #1
The Ohio State University