Finding the reason a Cluster member is down, using heartbeat counters

MVP Expert
Q:

How can I find the reason a Cluster member is shown as down, using the heartbeat counter output?



A:

Executing "show lc-cluster heartbeat counters" lists the heartbeat statistics, along with a few additional fields, for all Cluster members. In the output below, the CPDPD (critical process down, peer dead) counter is set to 1 for 10.15.146.4 because the STM process went down on that controller. Auth, STM, DDS, and ISAKMPD are considered critical processes, along with CM (Client Match). If any of these processes restarts, the other nodes immediately mark that node as down.

 (7220) #show lc-cluster heartbeat counters

Cluster Heartbeat Counters
--------------------------
IPv4 Address         RES      RSR   MIS  HMPD  LMRPD  IDPD CPDPD CDPD LMHINT                     LTOD
--------------- -------- -------- ----- -----  ----- ----- ----- ---- ------    ------------------------
    10.15.146.4        0        0     0     2     0     0     1     0      0    Wed Mar 20 12:19:46 2018
    10.15.146.5   855476   855476     0     2     0     0     0     0    276    Tue  Mar 19 12:31:20 2018
    10.15.146.6   855475   855475     0     2     0     0     0     0    275    Tue  Mar 19 12:31:20 2018

-----------PREAMBLE-----------------
RES    - REQ SENT
RSR    - RSP RCVD
MIS    - MISSES
HMPD   - HBT MISS PEER DEAD
LMRPD  - LINK MAP RCVD PEER DEAD
IDPD   - IPSEC DOWN PEER DEAD
CPDPD  - CRIT PROCESS DOWN PEER DEAD   
CDPD   - CLUSTER DISABLED PEER DEAD
LMHINT - LAST MISSED HBT INT (ms)
LTOD   - LAST TIME OF DISCONNECT
------------------------------------
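
Reading these columns by eye gets tedious on larger clusters. The following is a minimal Python sketch that turns captured command output into per-peer records and lists which "peer dead" counters are non-zero. It assumes the output was saved as plain text (for example, copied from the SSH session) and that the columns keep the order shown above; parse_heartbeat_counters and down_reasons are hypothetical helper names, not ArubaOS commands.

```python
import ipaddress

# Hypothetical helper, not an ArubaOS feature: column order assumed to match
# the "show lc-cluster heartbeat counters" output shown in this article.
FIELDS = ["IPv4", "RES", "RSR", "MIS", "HMPD", "LMRPD", "IDPD", "CPDPD", "CDPD", "LMHINT"]

# The "peer dead" counters and their meanings, taken from the PREAMBLE legend.
REASON_COLUMNS = {
    "HMPD": "HBT MISS PEER DEAD",
    "LMRPD": "LINK MAP RCVD PEER DEAD",
    "IDPD": "IPSEC DOWN PEER DEAD",
    "CPDPD": "CRIT PROCESS DOWN PEER DEAD",
    "CDPD": "CLUSTER DISABLED PEER DEAD",
}


def parse_heartbeat_counters(text):
    """Return one dict per peer row; all other lines are ignored."""
    rows = []
    for line in text.splitlines():
        # LTOD contains spaces, so keep everything after LMHINT as one field.
        parts = line.split(maxsplit=len(FIELDS))
        if len(parts) != len(FIELDS) + 1:
            continue  # blank, legend or otherwise short line
        try:
            ipaddress.IPv4Address(parts[0])
        except ValueError:
            continue  # header or separator line
        row = {"IPv4": parts[0], "LTOD": parts[-1].strip()}
        for name, value in zip(FIELDS[1:], parts[1:-1]):
            row[name] = int(value)
        rows.append(row)
    return rows


def down_reasons(row):
    """List the legend text of every non-zero 'peer dead' counter."""
    return [meaning for col, meaning in REASON_COLUMNS.items() if row[col] > 0]


HEARTBEAT_SAMPLE = """\
IPv4 Address         RES      RSR   MIS  HMPD  LMRPD  IDPD CPDPD CDPD LMHINT                     LTOD
--------------- -------- -------- ----- -----  ----- ----- ----- ---- ------    ------------------------
    10.15.146.4        0        0     0     2     0     0     1     0      0    Wed Mar 20 12:19:46 2018
    10.15.146.5   855476   855476     0     2     0     0     0     0    276    Tue  Mar 19 12:31:20 2018
"""

for row in parse_heartbeat_counters(HEARTBEAT_SAMPLE):
    print(row["IPv4"], down_reasons(row), row["LTOD"])
```

For these sample rows, 10.15.146.4 reports both HBT MISS PEER DEAD and CRIT PROCESS DOWN PEER DEAD, while 10.15.146.5 reports only HBT MISS PEER DEAD. Because the connected members 10.15.146.5 and 10.15.146.6 also show HMPD = 2, these counters appear to accumulate over time, so a non-zero value is best treated as a lead to correlate with LTOD and the group-membership status rather than as proof of the current state.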

  • The same reason is also listed in the output of "show lc-cluster group-membership", where peer 10.15.146.4 appears as DISCONNECTED (STM_MODULE_DOWN).

(7037) #show lc-cluster group-membership

Cluster Enabled, Profile Name = "clusterZone1"
Redundancy Mode On
L2 Connected
Active Client Rebalance Threshold = 50%    
Standby Client Rebalance Threshold = 75%
Unbalance Threshold = 5%
Cluster Info Table
------------------
Type IPv4 Address    Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
self     10.15.146.3      255             N/A CONNECTED (Leader)
peer     10.15.146.4      128             N/A DISCONNECTED (STM_MODULE_DOWN)   
peer     10.15.146.5      128    L2-Connected CONNECTED (Member, last HBT_RSP 78ms ago, RTD = 0.000 ms)
peer     10.15.146.6      128    L2-Connected CONNECTED (Member, last HBT_RSP 79ms ago, RTD = 0.521 ms)
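
When checking several controllers, the reason in parentheses after DISCONNECTED can be pulled out of captured output programmatically. A minimal Python sketch, assuming the Cluster Info Table layout shown above (Type, IPv4 Address, Priority, Connection-Type, STATUS) and that the output was saved as plain text; parse_group_membership and disconnected_peers are hypothetical helper names, not ArubaOS commands.

```python
import re

# Matches e.g. "DISCONNECTED (STM_MODULE_DOWN)" or "CONNECTED (Leader)".
STATUS_RE = re.compile(r"^(?P<state>[A-Z_]+)\s*(?:\((?P<detail>.*)\))?$")


def parse_group_membership(text):
    """Return one dict per self/peer row of the Cluster Info Table."""
    members = []
    for line in text.splitlines():
        # STATUS contains spaces, so keep everything after Connection-Type together.
        parts = line.split(maxsplit=4)
        if len(parts) != 5 or parts[0] not in ("self", "peer"):
            continue  # banner lines, table header and separators
        match = STATUS_RE.match(parts[4].strip())
        if match is None:
            continue  # unexpected STATUS format; skip rather than guess
        members.append({
            "type": parts[0],
            "ip": parts[1],
            "priority": int(parts[2]),
            "connection_type": parts[3],
            "state": match.group("state"),
            "detail": match.group("detail") or "",
        })
    return members


def disconnected_peers(members):
    """Map each DISCONNECTED peer's IP to the reason shown in parentheses."""
    return {m["ip"]: m["detail"] for m in members
            if m["type"] == "peer" and m["state"] == "DISCONNECTED"}


MEMBERSHIP_SAMPLE = """\
Type IPv4 Address    Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
self     10.15.146.3      255             N/A CONNECTED (Leader)
peer     10.15.146.4      128             N/A DISCONNECTED (STM_MODULE_DOWN)
peer     10.15.146.5      128    L2-Connected CONNECTED (Member, last HBT_RSP 78ms ago, RTD = 0.000 ms)
"""

print(disconnected_peers(parse_group_membership(MEMBERSHIP_SAMPLE)))
# prints {'10.15.146.4': 'STM_MODULE_DOWN'}
```

The extracted STM_MODULE_DOWN reason matches the non-zero CPDPD counter reported for 10.15.146.4 in the heartbeat counters, which is the cross-check this article describes.
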
Version history
Revision #: 2 of 2
Last update: 02-11-2019 12:05 PM