Occasional Contributor II
Posts: 25
Registered: ‎07-02-2014

Big problems w/ 115/225 Cluster

For about half a year now, I've had two IAPs in a cluster running stably in a multi-family home. Previously, I had broadcast filtering set to disabled. Wanting to take advantage of best practices, I enabled broadcast filtering and turned on AirGroup. The cluster ran smoothly for the first day or so, and then I ran into problems.

 

The cluster is completely unstable in this configuration when I'm running the master controller on the IAP115. Every few hours, CPU usage spikes to 100% and stays there, causing dropped pings and packet loss everywhere. This persists until the AP is rebooted.

 

If I move the master controller from the IAP115 to the 225, I see much more stability. Occasionally, though, stability still falls off a cliff (albeit without the CPU spikes I was seeing on the 115) and the cluster again needs to be rebooted.

 

After opening a case with TAC, they advised that I upgrade to yesterday's firmware (6.4.2.3-4.1.1.2_48114). So far the cluster has been stable on the 225 (I haven't tried the 115 yet) but the newest firmware presents a huge bug:

 

112117 Symptom: When the 80 MHz support is enabled on an IAP, ARM chooses only 36E as a valid channel.
Scenario: This issue occurs when ARM is enabled on an IAP to allocate 80 MHz channels. This issue is observed in the IAP-22x and IAP-27x devices running 6.4.2.3-4.1.1.2 release.
Workaround: None

 

It seems I'm caught between an unstable cluster and limited 80 MHz channel availability.

 

Can anyone from Aruba shed some light on this or when I might expect to see a fix? Does anyone have any other suggestions for this type of setup?


This doesn't even address the fact that I would like to run the cluster on the IAP115 so that I can test reboots/configuration changes safely on the 225. I feel like the firmware has gone from bad to worse (still waiting on mesh support).

 

 

Occasional Contributor II
Posts: 25
Registered: ‎07-02-2014

Re: Big problems w/ 115/225 Cluster

I should also add that none of the 80 MHz E channels show up in ARM under the new firmware. Aruba, please fix this!

 

Here's at least hoping that the new firmware brings some cluster stability.

Occasional Contributor II
Posts: 25
Registered: ‎07-02-2014

Re: Big problems w/ 115/225 Cluster

Does anyone know what busybox is? It might be causing the issue.

 

Also, TAC seems useless on this. Third phone call in three days.

 

CPU and Memory Usage
--------------------
Timestamp            CPU Util(%)  Memory Util(%)
---------            -----------  --------------
2015-01-20 21:48:02  57           37
2015-01-20 21:47:52  35           37
2015-01-20 21:47:42  20           38
2015-01-20 21:47:32  70           38
2015-01-20 21:47:12  100          38
2015-01-20 21:46:35  100          38
2015-01-20 21:45:52  99           38

Peak CPU Util in the last one hour
----------------------------------
Timestamp            CPU Util(%)  Memory Util(%)
---------            -----------  --------------
2015-01-20 21:36:18  100          37
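
For anyone who wants to track these spikes over time, here is a minimal Python sketch that parses rows in the layout shown above and reports the longest stretch where CPU stayed pegged. The function names and the 95% threshold are assumptions made for illustration; they are not part of any Aruba tooling.

```python
from datetime import datetime

# Rows copied from the "CPU and Memory Usage" output above.
SAMPLE = """\
Timestamp CPU Util(%) Memory Util(%)
--------- ----------- --------------
2015-01-20 21:48:02 57 37
2015-01-20 21:47:52 35 37
2015-01-20 21:47:42 20 38
2015-01-20 21:47:32 70 38
2015-01-20 21:47:12 100 38
2015-01-20 21:46:35 100 38
2015-01-20 21:45:52 99 38
"""


def parse_samples(text):
    """Return (timestamp, cpu, mem) tuples in chronological order.

    Header and separator lines are skipped because they do not split
    into exactly four fields or do not parse as date/int values.
    """
    samples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
            cpu, mem = int(parts[2]), int(parts[3])
        except ValueError:
            continue
        samples.append((ts, cpu, mem))
    return sorted(samples)


def longest_saturated_run(samples, threshold=95):
    """Return (start, end) of the longest run of consecutive samples
    with CPU at or above threshold, or None if no sample qualifies.
    The threshold is a hypothetical value chosen for this sketch."""
    best = None
    run_start = None
    for ts, cpu, _mem in samples:
        if cpu >= threshold:
            if run_start is None:
                run_start = ts
            if best is None or (ts - run_start) > (best[1] - best[0]):
                best = (run_start, ts)
        else:
            run_start = None
    return best


if __name__ == "__main__":
    span = longest_saturated_run(parse_samples(SAMPLE))
    if span:
        print("CPU pegged from", span[0], "to", span[1])
```

The samples are sorted first because the AP prints the newest row at the top.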

 

Output of top
-------------
Mem: 95076K used, 160860K free, 0K shrd, 0K buff, 28348K cached
Load average: 4.91 8.13 6.39 (Status: S=sleeping R=running, W=waiting)
PID    USER  STATUS  RSS    PPID   %CPU  %MEM  COMMAND
16976  root  R N     368    16975  30.5  0.1   busybox
2      root  RWN     0      1      7.6   0.0   ksoftirqd/0
1738   root  S <     13732  1671   0.0   5.3   cli
1748   root  S N     5096   1671   0.0   1.9   sapd
1761   root  S       2648   1671   0.0   1.0   mdns
1752   root  S <     2508   1671   0.0   0.9   stm
1758   root  S       2340   1671   0.0   0.9   snmpd_sap
16518  root  S       1876   1671   0.0   0.7   radiusd-term
16517  root  S       1868   1671   0.0   0.7   radiusd
1737   root  S N     1688   1671   0.0   0.6   awc
1767   root  S       1472   1671   0.0   0.5   meshd
1764   root  S       1316   1671   0.0   0.5   lldpd
1680   root  S       1196   1671   0.0   0.4   tinyproxy
1671   root  S       1116   1      0.0   0.4   nanny
16966  root  S <     1108   1579   0.0   0.4   mini_httpd
1765   root  S       1020   1671   0.0   0.3   rfd
1689   root  S       976    1680   0.0   0.3   tinyproxy
1692   root  S       976    1680   0.0   0.3   tinyproxy
1690   root  S       976    1680   0.0   0.3   tinyproxy
1691   root  S       976    1680   0.0   0.3   tinyproxy
1576   root  S <     736    1      0.0   0.2   mini_httpd
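
If you collect several of these dumps, a small script can pull out the busiest process from each one. The sketch below is only an illustration under the assumption that rows follow the column order shown above (PID, USER, STATUS, RSS, PPID, %CPU, %MEM, COMMAND); the `Proc` type and `parse_top` function are hypothetical names. The STATUS field can contain a space (for example "R N" or "S <"), so the pattern matches it lazily up to the first numeric column.

```python
import re
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    status: str
    cpu: float
    command: str


# STATUS is matched lazily so that "R N" and "S <" stay in one field;
# RSS and PPID are integers, %CPU and %MEM are decimals.
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(.+?)\s+(\d+)\s+(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\S.*)$"
)

# A few rows copied from the top output above.
SAMPLE = """\
PID USER STATUS RSS PPID %CPU %MEM COMMAND
16976 root R N 368 16975 30.5 0.1 busybox
2 root RWN 0 1 7.6 0.0 ksoftirqd/0
1738 root S < 13732 1671 0.0 5.3 cli
1748 root S N 5096 1671 0.0 1.9 sapd
"""


def parse_top(text):
    """Return a list of Proc entries; lines that do not look like
    process rows (headers, memory summary) are ignored."""
    procs = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            procs.append(
                Proc(int(m.group(1)), m.group(3), float(m.group(6)), m.group(8).strip())
            )
    return procs


if __name__ == "__main__":
    procs = parse_top(SAMPLE)
    busiest = max(procs, key=lambda p: p.cpu)
    print(busiest.pid, busiest.command, busiest.cpu)
```

Note that on embedded Linux systems busybox is a single binary that provides many standard shell utilities, so a busybox entry in top does not by itself say which utility was running.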

Guru Elite
Posts: 8,759
Registered: ‎09-08-2010

Re: Big problems w/ 115/225 Cluster

Busybox is part of the Linux backend. Did you try escalating your case? 


Thanks, 
Tim

Tim Cappalli | Aruba Security TME
@timcappalli | timcappalli.me | ACMX #367 / ACCX #480
Guru Elite
Posts: 21,499
Registered: ‎03-29-2007

Re: Big problems w/ 115/225 Cluster

Ciscokid85,

 

Please message me the case number.



Colin Joseph
Aruba Customer Engineering


Occasional Contributor II
Posts: 25
Registered: ‎07-02-2014

Re: Big problems w/ 115/225 Cluster

I've escalated the case with TAC. The third rep was more detailed and promised that he would look into the issue.

 

I've also added much more robust monitoring, so I should be emailed immediately if this occurs again.

 

I'm also attaching the AP Tech Support Dump here for others to review. Just a note: the log was grabbed 2-3 minutes after the CPU spike. I couldn't grab it at the time of the event because the portal stopped responding and returned an internal server error.

 

Thanks for the assist!

New Contributor
Posts: 2
Registered: ‎02-06-2015

Re: Big problems w/ 115/225 Cluster

I am also seeing some instability with 4.1.1.2. The elected controller spikes to 100% CPU and loses network connectivity. A power cycle temporarily fixes it, but the next AP to be elected as the controller does the same.

 

Are there known issues with 4.1.1.2?

Occasional Contributor II
Posts: 15
Registered: ‎08-11-2014

Re: Big problems w/ 115/225 Cluster

I have been informed by TAC that this is indeed a bug (a huge one at that). A bug report has been submitted internally and they are investigating the issue. I'm trying to pull a tech support dump from the console when I experience the issue, but I haven't yet been able to access the device physically when the issue occurs, so it's been difficult.

 

The other strange thing is that it appears to be the same bug affecting the IAP115 and the IAP225. If it happens on the 225, I need to reboot all IAPs. If it happens on the 115, the whole network drops due to a packet storm. >_<

MVP
Posts: 314
Registered: ‎04-03-2014

Re: Big problems w/ 115/225 Cluster

This sounds serious. Do you have a bug/defect ID? Have you heard of any ETA on the fix?

Christoffer Jacobsson | Aranya AB
Aruba: ACMX #537 ACCP | CWNP: CWNA CWDP CWSP
Occasional Contributor II
Posts: 14
Registered: ‎05-07-2013

Re: Big problems w/ 115/225 Cluster


ComplexMind wrote:

I am also seeing some instability with 4.1.1.2. The elected controller spikes to 100% CPU and loses network connectivity. A power cycle temporarily fixes it, but the next AP to be elected as the controller does the same.

 

Are there known issues with 4.1.1.2?


 

I'm also seeing this issue with a mixed cluster of IAP-105s and IAP-115s.  The elected controller becomes completely unresponsive -- fails to respond even to pings when it's in this state, although it continued to send logging info to my syslog host for a few minutes after the failure, consisting mostly of ~100 of these per second (where I've replaced the controller's IP address with xxx.xxx.xxx.xxx, and the controller's MAC address with AA:AA:AA:AA:AA:AA):

 

Feb 23 15:11:08 2015 xxx.xxx.xxx.xxx <xxx.xxx.xxx.xxx AA <Error>: AA:AA:AA:AA:AA> stm[1537]: PAPI_Send: sendto ARM Process failed: No such file or directory Message Code 3115 Sequence Num is 27866 

Feb 23 15:11:08 2015 xxx.xxx.xxx.xxx stm[1537] <Error>: <304065> <ERRS> <xxx.xxx.xxx.xxx AA:AA:AA:AA:AA:AA>  PAPI_Send failed, send_papi_message_with_args, 790: No such file or directory, dstport 8494
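
To put a number on the "~100 per second" rate, a short Python sketch like the following could count PAPI_Send errors per second in a saved syslog file. It assumes each line begins with a timestamp in the "Feb 23 15:11:08 2015" form shown above; the function names and the default threshold of 50 are hypothetical choices for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Leading timestamp as it appears in the syslog lines above.
TS_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s")

# The two masked lines quoted above, shortened after the error text.
SAMPLE_LINES = [
    "Feb 23 15:11:08 2015 xxx.xxx.xxx.xxx <xxx.xxx.xxx.xxx AA <Error>: "
    "AA:AA:AA:AA:AA> stm[1537]: PAPI_Send: sendto ARM Process failed: "
    "No such file or directory Message Code 3115 Sequence Num is 27866",
    "Feb 23 15:11:08 2015 xxx.xxx.xxx.xxx stm[1537] <Error>: <304065> <ERRS> "
    "<xxx.xxx.xxx.xxx AA:AA:AA:AA:AA:AA>  PAPI_Send failed, "
    "send_papi_message_with_args, 790: No such file or directory, dstport 8494",
]


def papi_errors_per_second(lines):
    """Count lines mentioning PAPI_Send, keyed by their timestamp."""
    counts = Counter()
    for line in lines:
        if "PAPI_Send" not in line:
            continue
        m = TS_RE.match(line)
        if not m:
            continue
        # Collapse any double space used to pad single-digit days.
        stamp = " ".join(m.group(1).split())
        counts[datetime.strptime(stamp, "%b %d %H:%M:%S %Y")] += 1
    return counts


def flood_seconds(counts, threshold=50):
    """Return the seconds in which the error count reached threshold.
    The default threshold is an arbitrary value for this sketch."""
    return sorted(ts for ts, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    counts = papi_errors_per_second(SAMPLE_LINES)
    for ts, n in sorted(counts.items()):
        print(ts, n)
```

Running it against a full syslog capture would show when the flood started and stopped, which may be useful detail to include in a TAC case.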

 

The scary part about this is that the network is still basically functional -- clients can connect to other APs and it passes network traffic, but nothing connects to the elected controller and I have no access to the admin UI.

 

I also just noticed that the controller seems to be spamming ARP requests for various hosts on the network at an unusual rate while it's in this state.  I'm not sure what to make of this.
