Wireless Access


Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

AOS 8.3 Cluster no client balancing and crashes

  • 1.  AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 18, 2018 04:36 AM

    Hi all,

    I recently migrated our setup to Mobility Master VA firmware 8.3.

    The MM manages 2 clusters. Each cluster consists of three 7220 controllers.

    Each cluster holds 1500 APs and redundancy is enabled. Each controller holds 500 APs.

    The problem is that on one cluster the clients are load balanced and everything works as expected.

    On the other cluster a single controller holds 10k clients.
    This results in "Process /mswitch/bin/auth [pid 13420] died: got signal SIGSEGV" and the cluster goes down.

    Can anyone point me to how cluster client balancing works, or to what I'm doing wrong?

     

    Best regards

    Jochen

     

    P.S. AOS 8.3 is needed because of our AP-345s.



  • 2.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 18, 2018 11:32 PM

    Process /mswitch/bin/auth [pid 13420] died: got signal

     

    An auth crash is not a good thing. You should run "tar crash", extract the resulting crash.tar, and send it to Aruba support for analysis.



  • 3.  RE: AOS 8.3 Cluster no client balancing and crashes

    EMPLOYEE
    Posted Oct 19, 2018 03:04 AM

    If you are not already running 8.3.0.3, you should upgrade to it so that you have all the fixes to date.

     

    SSH into any controller in that cluster and type:

    show lc-cluster group-profile <name of cluster group>

    ...and paste in that output.



  • 4.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 03:17 AM

    As requested:

    show lc-cluster group-profile cluster-2

    IPv4 Cluster Members
    --------------------
    CONTROLLER-IP   PRIORITY  MCAST-VLAN  VRRP-IP  VRRP-VLAN  GROUP-ID
    -------------   --------  ----------  -------  ---------  --------
    192.178.11.162  128       0           0.0.0.0  0          0
    192.178.11.163  128       0           0.0.0.0  0          0
    192.178.11.164  128       0           0.0.0.0  0          0

    Redundancy:Yes

    Active Client Rebalance Threshold:50%

    Standby Client Rebalance Threshold:75%

    Unbalance Threshold:5%

    Active AP Load Balancing:YES

     

    In addition the AP and Client load distribution:

     

    show lc-cluster load distribution client

    Cluster Load Distribution for Clients
    -------------------------------------
    Type  IPv4 Address     Active Clients  Standby Clients
    ----  ---------------  --------------  ---------------
    peer  192.178.11.162   131             282
    self  192.178.11.163   2728            4120
    peer  192.178.11.164   4313            2676
    Total: Active Clients 7172 Standby Clients 7078

     

    show lc-cluster load distribution ap

    Cluster Load Distribution for APs
    ---------------------------------
    Type  IPv4 Address     Active APs  Standby APs
    ----  ---------------  ----------  -----------
    peer  192.178.11.162   455         477
    self  192.178.11.163   460         474
    peer  192.178.11.164   479         443
    Total: Active APs 1394 Standby APs 1394

     

    Regards
    Jochen



  • 5.  RE: AOS 8.3 Cluster no client balancing and crashes

    EMPLOYEE
    Posted Oct 19, 2018 03:53 AM

    The controllers should be able to handle that load.

    The 7220 has a platform limit of 24000 devices, so it would actively start load balancing when a controller reaches 12000 devices or 50% (Active Client Rebalance Threshold), which none of your controllers has reached.  Meanwhile it should just randomly put clients on controllers.
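
    The threshold arithmetic above can be checked with a short sketch. It uses only the numbers quoted in this thread (the 24000 device limit, the 50% threshold, and the active client counts from post 4); the function names are illustrative and not part of any Aruba tool.

```python
# Figures quoted in this thread; helper names are hypothetical.
PLATFORM_LIMIT = 24000        # 7220 device limit stated in the post above
ACTIVE_REBALANCE_PCT = 50     # "Active Client Rebalance Threshold:50%"

# Active client counts from "show lc-cluster load distribution client" (post 4).
active_clients = {
    "192.178.11.162": 131,
    "192.178.11.163": 2728,
    "192.178.11.164": 4313,
}


def rebalance_trigger(limit: int, threshold_pct: int) -> int:
    """Active client count at which the threshold is reached."""
    return limit * threshold_pct // 100


def load_pct(clients: int, limit: int) -> float:
    """Active client load as a percentage of the platform limit."""
    return 100.0 * clients / limit


trigger = rebalance_trigger(PLATFORM_LIMIT, ACTIVE_REBALANCE_PCT)
for ip, count in active_clients.items():
    status = "at/above" if count >= trigger else "below"
    print(f"{ip}: {count} clients, {load_pct(count, PLATFORM_LIMIT):.1f}% "
          f"of limit, {status} threshold ({trigger})")
```

    With these inputs every controller is well below the 12000-client trigger. Calling rebalance_trigger(PLATFORM_LIMIT, 20) gives 4800, which is what a 20% threshold would correspond to.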

     

    What Ethernet/fiber connections do you have to each controller (1 gigabit, 10 gigabit, etc.)?  As a rule of thumb, you should have 1 gigabit of Ethernet capacity for every 100 access points so that traffic to and from the controller is not bottlenecked.
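
    As a rough illustration of that rule of thumb, the sketch below applies it to the AP counts mentioned in this thread (500 APs per controller, 1500 per cluster). The failover case, where one controller is down and its APs are split across the remaining two, is an assumption added for illustration.

```python
import math

APS_PER_GBIT = 100  # rule of thumb quoted in the post above


def required_gbit(ap_count: int) -> int:
    """Uplink capacity in Gbit/s suggested by the 1 Gbit per 100 APs rule."""
    return math.ceil(ap_count / APS_PER_GBIT)


scenarios = [
    ("normal operation, 500 APs per controller", 500),
    # Assumption: one of three controllers is down, 1500 APs on two controllers.
    ("one controller down, 750 APs per controller", 750),
]
for label, aps in scenarios:
    print(f"{label}: about {required_gbit(aps)} Gbit/s")
```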

     

    Your syslog message indicates that something else could be happening, and you should have TAC take a look at it.  If your cluster is "failing", that would mean that heartbeats are not being exchanged regularly between cluster members (one every 100 milliseconds; 3 or more misses could mean a failure).  That would indicate a connectivity issue between controllers.
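
    The heartbeat rule described above can be modelled in a few lines. This is a simplified sketch, not the ArubaOS implementation: it assumes the misses must be consecutive and that a single received response resets the count, neither of which is stated in this thread.

```python
from typing import Optional, Sequence

HEARTBEAT_INTERVAL_MS = 100   # one heartbeat every 100 ms, as stated above
MISS_LIMIT = 3                # "3 or more misses could mean a failure"


def first_failure_index(responses: Sequence[bool],
                        miss_limit: int = MISS_LIMIT) -> Optional[int]:
    """Return the index of the heartbeat at which the peer would be
    declared failed, or None if the limit is never reached.

    Assumption: only consecutive misses count, and any response resets
    the counter.
    """
    consecutive = 0
    for i, received in enumerate(responses):
        consecutive = 0 if received else consecutive + 1
        if consecutive >= miss_limit:
            return i
    return None


# True = response received, False = heartbeat missed.
trace = [True, False, False, True, False, False, False]
idx = first_failure_index(trace)
if idx is not None:
    print(f"peer declared failed at heartbeat {idx}, "
          f"{idx * HEARTBEAT_INTERVAL_MS} ms into the trace")
```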

     

    What is the output of:

     

    show lc-cluster heartbeat counters 

     

     

     

     



  • 6.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 04:03 AM

    The Controllers are connected with 10 gigabit to the router.
    As far as I can tell, the problem occurs with around 11k clients connected to the controller.

     

    17:53:14 is the last Timestamp for auth SIGSEGV on the other Controller.

     

    show lc-cluster heartbeat counters

    Cluster Heartbeat Counters
    --------------------------
    IPv4 Address     RES     RSR     MIS  HMPD  LMRPD  IDPD  CPDPD  CDPD  LMHINT  LTOD
    ---------------  ------  ------  ---  ----  -----  ----  -----  ----  ------  ------------------------
    192.178.11.162   575081  575081  0    0     0      0     1      0     0       Thu Oct 18 17:53:13 2018
    192.178.11.164   576806  576806  0    0     0      0     0      0     0

    -----------PREAMBLE-----------------
    RES - REQ SENT
    RSR - RSP RCVD
    MIS - MISSES
    HMPD - HBT MISS PEER DEAD
    LMRPD - LINK MAP RCVD PEER DEAD
    IDPD - IPSEC DOWN PEER DEAD
    CPDPD - CRIT PROCESS DOWN PEER DEAD
    CDPD - CLUSTER DISABLED PEER DEAD
    LMHINT - LAST MISSED HBT INT (ms)
    LTOD - LAST TIME OF DISCONNECT
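
    For anyone reading these counters by script, the sketch below parses the two data rows from this post and flags anything that is not zero. The column order is taken from the header above; the parsing approach (whitespace split, with everything after the ninth counter treated as the LTOD timestamp) is an assumption about the output format, not something Aruba documents here.

```python
from typing import Dict, List, Optional, Tuple

COLUMNS = ["RES", "RSR", "MIS", "HMPD", "LMRPD", "IDPD", "CPDPD", "CDPD", "LMHINT"]

# Data rows copied from the output in this post.
SAMPLE = """\
192.178.11.162 575081 575081 0 0 0 0 1 0 0 Thu Oct 18 17:53:13 2018
192.178.11.164 576806 576806 0 0 0 0 0 0 0
"""


def parse_row(line: str) -> Tuple[str, Dict[str, int], Optional[str]]:
    """Split one data row into (peer IP, counters, last-disconnect time)."""
    parts = line.split()
    counters = dict(zip(COLUMNS, (int(p) for p in parts[1:10])))
    ltod = " ".join(parts[10:]) or None
    return parts[0], counters, ltod


def findings(counters: Dict[str, int], ltod: Optional[str]) -> List[str]:
    """List anything in a row that deviates from a clean state."""
    notes = []
    if counters["RES"] != counters["RSR"]:
        notes.append("requests sent and responses received differ")
    if counters["MIS"]:
        notes.append(f"MIS={counters['MIS']}")
    for col in ("HMPD", "LMRPD", "IDPD", "CPDPD", "CDPD"):
        if counters[col]:
            notes.append(f"{col}={counters[col]}")
    if ltod:
        notes.append(f"last disconnect {ltod}")
    return notes


for row in SAMPLE.splitlines():
    ip, counters, ltod = parse_row(row)
    print(ip, findings(counters, ltod) or "clean")
```

    On this sample the only non-zero "peer dead" counter is CPDPD (critical process down) for 192.178.11.162, with no missed heartbeats, and its disconnect time is one second before the auth SIGSEGV timestamp mentioned in the same post.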



  • 7.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 04:19 AM

    Hmm, I'm not sure about this, but isn't the platform limit halved when a cluster is used?

    We are currently close to a crash, I would say.

     

    show lc-cluster load distribution client

    Cluster Load Distribution for Clients
    -------------------------------------
    Type  IPv4 Address     Active Clients  Standby Clients
    ----  ---------------  --------------  ---------------
    self  192.178.11.162   505             1906
    peer  192.178.11.163   9713            1167
    peer  192.178.11.164   1324            4473
    Total: Active Clients 11542 Standby Clients 7546
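
    To put a number on the imbalance, the sketch below parses the rows of this output and computes each controller's share of active clients. The parser is a minimal assumption about the layout (four whitespace-separated fields per row starting with "self" or "peer"), not an official tool.

```python
from typing import Dict, Tuple

# Rows copied from the output in this post.
SAMPLE = """\
self 192.178.11.162 505 1906
peer 192.178.11.163 9713 1167
peer 192.178.11.164 1324 4473
Total: Active Clients 11542 Standby Clients 7546
"""


def parse_distribution(text: str) -> Dict[str, Tuple[int, int]]:
    """Map controller IP -> (active clients, standby clients)."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] in ("self", "peer"):
            rows[parts[1]] = (int(parts[2]), int(parts[3]))
    return rows


def active_share(rows: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Each controller's percentage of all active clients."""
    total = sum(active for active, _ in rows.values())
    return {ip: 100.0 * active / total for ip, (active, _) in rows.items()}


rows = parse_distribution(SAMPLE)
for ip, share in active_share(rows).items():
    print(f"{ip}: {rows[ip][0]} active clients, {share:.1f}% of the cluster")
```

    For these three rows, 192.178.11.163 carries roughly 84% of the active clients, compared with about a third each for an even split.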



  • 8.  RE: AOS 8.3 Cluster no client balancing and crashes

    EMPLOYEE
    Posted Oct 19, 2018 04:36 AM

    The "platform limit" is halved from a redundancy standpoint due to the "standby" clients that would need to be serviced, but a controller should not be crashing because it is servicing half the client capacity.

     

    Your controllers missing heartbeats has a far greater effect on cluster stability than the number of clients.

     

     



  • 9.  RE: AOS 8.3 Cluster no client balancing and crashes

    EMPLOYEE
    Posted Oct 19, 2018 04:39 AM

    Again, please run this command on any of your controllers in the cluster to see the last time a controller disconnected and to see if you have any stability issues.

     

    show lc-cluster heartbeat counters



  • 10.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 06:28 AM

    Do you think it might help if I set the Active Client Rebalance Threshold to something like 20% to get better client balancing?

     

    No dropped heartbeats since the last cluster hiccup.

     

    show lc-cluster heartbeat counters

    Cluster Heartbeat Counters
    --------------------------
    IPv4 Address     RES    RSR    MIS  HMPD  LMRPD  IDPD  CPDPD  CDPD  LMHINT  LTOD
    ---------------  -----  -----  ---  ----  -----  ----  -----  ----  ------  ------------------------
    192.178.11.162   42948  42948  0    0     0      0     0      0     0
    192.177.11.164   46919  46919  0    0     0      0     0      0     0

     



  • 11.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 07:40 AM

    Hi,

     

    This is probably not your issue, but are all three controllers L2-connected? A common mistake is to deploy a VLAN and not trunk it to all controllers, even though the VLAN carries no service.

     

    To check, type "show lc-cluster vlan-probe status". If it says L3 connected, then a VLAN is not properly tagged all the way to all controllers. To exclude a specific VLAN from the VLAN probe, use "lc-cluster exclude-vlan x".

     

    Hope this helps a bit. It's very odd that the users are not even roughly balanced across the three controllers. The cluster should hash the client MAC and the BSSID and then select a bucket. The balance is not perfect, but the proportions should be a lot better than what I'm seeing in your post.
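
    A minimal sketch of the hash-and-bucket idea described above, to show why a roughly even split is expected. Everything specific here is an assumption for illustration: the CRC32 hash, the 256-entry bucket map, the round-robin bucket assignment, and the randomly generated MAC addresses. The thread does not say which hash or map size ArubaOS actually uses.

```python
import random
import zlib
from collections import Counter
from typing import List

NUM_BUCKETS = 256  # assumption, not stated in this thread


def bucket_for(client_mac: str, bssid: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash the client MAC and BSSID into a bucket index (CRC32 is a stand-in)."""
    key = (client_mac + bssid).lower().replace(":", "").encode("ascii")
    return zlib.crc32(key) % num_buckets


def build_bucket_map(controllers: List[str], num_buckets: int = NUM_BUCKETS) -> List[str]:
    """Assign buckets to controllers round-robin (illustrative policy)."""
    return [controllers[i % len(controllers)] for i in range(num_buckets)]


def anchor_controller(client_mac: str, bssid: str, bucket_map: List[str]) -> str:
    """Controller that would anchor this client under the sketch above."""
    return bucket_map[bucket_for(client_mac, bssid, len(bucket_map))]


controllers = ["192.178.11.162", "192.178.11.163", "192.178.11.164"]
bucket_map = build_bucket_map(controllers)

rng = random.Random(0)          # fixed seed so the run is repeatable
bssid = "aa:bb:cc:00:00:01"     # hypothetical BSSID
counts = Counter()
for _ in range(10000):
    mac = f"{rng.getrandbits(48):012x}"   # hypothetical client MAC
    counts[anchor_controller(mac, bssid, bucket_map)] += 1

for ip in controllers:
    print(f"{ip}: {counts[ip]} of 10000 simulated clients")
```

    Under these assumptions each controller ends up with close to a third of the simulated clients, which is the kind of proportion the post above expects.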

     

    To see which controllers the clients are anchored to (A-UAC and S-UAC), type "show aaa cluster essid-all users".



  • 12.  RE: AOS 8.3 Cluster no client balancing and crashes

    EMPLOYEE
    Posted Oct 19, 2018 09:21 AM

    @JoBav wrote:

    Do you think it might help if i set the Active Client Rebalance Threshold to something like 20% to help with better client balancing?

     


     


    You could do that, but it would only help with the client numbers, which is mostly cosmetic.  If you SSH into any of your controllers and there is an * after the hostname, you have a crash and you need to contact TAC to have it looked at.  If there is no * and you are not on ArubaOS 8.3.0.3, you should strongly consider upgrading to 8.3.0.3.  Either way, you should contact TAC to look at your controller, because there is a limit to what you can post here.  TAC will determine if something is wrong.  Having an unbalanced user count is not a bad thing, and it is something that gradually corrects itself.

     

    You can contact TAC here:  https://www.arubanetworks.com/support-services/contact-support/



  • 13.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 07:56 AM

    Hi fefa,

    It took us quite some time to get the cluster L2-connected. The VLAN exclusion list has a maximum length of 256 characters. With about 130 bridged SSIDs in special VLANs, we needed to configure large VLAN blocks to exclude them and get an L2 connection.
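
    The length problem described above can be illustrated with a small sketch that collapses VLAN IDs into ranges and compares string lengths against the 256-character limit. The comma and hyphen list syntax and the VLAN numbers are assumptions chosen for the example; only the 256-character limit and the figure of about 130 VLANs come from this post.

```python
from typing import Iterable

MAX_LEN = 256  # exclusion list limit mentioned in this post


def compress_vlans(vlans: Iterable[int]) -> str:
    """Collapse VLAN IDs into a string such as "1-3,5,7-8"."""
    ranges = []
    start = prev = None
    for v in sorted(set(vlans)):
        if start is None:
            start = prev = v
        elif v == prev + 1:
            prev = v                      # extend the current run
        else:
            ranges.append((start, prev))  # close the run, start a new one
            start = prev = v
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Hypothetical: 130 non-adjacent VLAN IDs listed one by one.
scattered = list(range(1000, 1390, 3))
individual = ",".join(str(v) for v in scattered)
print(len(scattered), "VLANs listed individually:", len(individual), "chars,",
      "fits" if len(individual) <= MAX_LEN else "too long")

# Excluding the whole surrounding block instead, as a single range.
block = compress_vlans(range(1000, 1390))
print("block form:", block, f"({len(block)} chars)")
```

    Listing 130 four-digit VLAN IDs individually needs 649 characters, so excluding whole blocks is the only way to stay under the limit, at the cost of also excluding VLANs that happen to fall inside the block.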



  • 14.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 08:00 AM

    If you type "show ap remote debug bucketmap stm ap-name xxx" on the controller, are all the "L2 Connectedness" set to 1 on all BSSIDs?



  • 15.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 08:11 AM

    Yes all L2 Connectedness have 1 on all BSSIDs.



  • 16.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 19, 2018 10:32 AM

    Thank you very much for your help.

    I've opened a case and will share the solution if I can.

    With client numbers decreasing, I'm expecting the next interesting time on Monday :)

     



  • 17.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 24, 2018 05:08 AM

    Just a small update..

    The crashes occur as soon as there is a second (or third) cluster and one of the controllers hits around 10k clients.

    The workaround is to run the controllers without a cluster.

    So far this works, but we are missing redundancy, client balancing, fast failover, etc.



  • 18.  RE: AOS 8.3 Cluster no client balancing and crashes

    EMPLOYEE
    Posted Oct 24, 2018 05:12 AM

    Thank you for opening a case with TAC.  Did they say that they have seen the crashes before or is this new?



  • 19.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Oct 24, 2018 06:05 AM

    This seems to be new.
    An upgrade to 8.3.0.3 didn't help, and they got more than 500 MB of logs and crash reports.



  • 20.  RE: AOS 8.3 Cluster no client balancing and crashes

    Posted Nov 13, 2018 12:13 PM

    Maybe the last update... for now.
    We were unable to fix this issue.
    We downgraded back to the stable firmware 6.5.4.9.

    We might try firmware 8.4.x, with x greater than 0, next year.