Occasional Contributor I

CPPM authentication TIMEOUTS after switch reboots

Across multiple client environments (though not all), we are seeing a staggering number of authentication timeouts in ClearPass, with the Access Tracker alert message "Client did not complete EAP transaction."  The customers experiencing the issue have similar architectures, with only slight variations in equipment (but not all customers with that architecture have the issue).

 

Customer WAN Description:

Each customer has multiple sites, and each site has a campus core handling the L3 traffic.  These cores are HP 5400 or 5400R series switches.  They connect back to a central core over dual 10 Gb fiber-optic links, with OSPF as the routing protocol.  The central core is a VSF pair of 5212R’s.  Each site has one 10 Gb connection to each member of the core.

 

Customer Site description:

The access layer switching at these sites consists of 2530’s, 2920’s and 2930’s.  Some sites have a mix of 2530’s and 2930’s, some are all 2530’s and some are all 2930’s.  In all cases, these switches (with very few exceptions) connect via fiber back to the campus core.  Where the connection can be 10 Gb, it is.  These switches are configured with the AAA commands below to support MAC or supplicant authentication (port assignments will obviously differ):

 

aaa authentication login privilege-mode

aaa authentication web login radius local

aaa authentication web enable radius local

aaa authentication ssh login peap-mschapv2 local

aaa authentication ssh enable peap-mschapv2 local

aaa authentication port-access eap-radius

aaa authentication mac-based peap-mschapv2

aaa port-access authenticator 1-24

aaa port-access authenticator 1-24 logoff-period 31536000

aaa port-access authenticator 1-24 client-limit 2

aaa port-access authenticator active

aaa port-access mac-based 1-24

aaa port-access mac-based 1-24 addr-limit 128

aaa port-access mac-based 1-24 addr-moves

aaa port-access mac-based 1-24 logoff-period 72000

 

Authentication details:

ClearPass is the RADIUS server in these cases, and AD is our authentication source.  The services are configured for EAP-PEAP (with MSCHAPv2) as the authentication method (keeping in mind that the HPE/Aruba switches act as both authenticator and supplicant for devices doing MAC-Auth).  AD is leveraged for devices without supplicants by adding a user account whose user name and password are both the device MAC address (yes, we have secured these AD accounts, prevented them from accessing other domain objects, and limited them via GPO and other attributes and features such as “Log on to” and FGPP’s).  Windows clients are generally configured with their own supplicant, and that configuration is pushed out via GPO.

 

CPPM:

Two CPPM servers, configured as a cluster and L2 connected.

Versions currently in use are 6.8.3 and 6.7.9.

 

Switch firmware:

We have a number of switch firmware versions in use, from 16.05.03 up through 16.09.03.  The behavior is the same across all versions and all models.

 

Detailed problem description:

The CPPM TIMEOUT issue only presents itself after a switch has been rebooted, and seems to last anywhere from 5 to 10 minutes.  During this time, some devices will authenticate immediately, while others will retry over and over for up to ten minutes and eventually authenticate.  There seems to be no pattern in the types of devices affected; it could be any of them: Windows PC’s, phones, intercom equipment, cameras, etc. (note that devices with their own supplicant seem to be unaffected).  During this time, the switch log shows the ports connected to the affected devices going on-line and off-line, commensurate with the TIMEOUTs in CPPM.  An example of these events (from a 2530) is below.  Note: if this were an excerpt from a 2930, there would be an extra entry specifying that the port was “Blocked by AAA” for each off-line instance:

 

I 01/29/20 21:46:39 00076 ports: port 47 is now on-line

I 01/29/20 21:46:38 00076 ports: port 33 is now on-line

I 01/29/20 21:46:38 00076 ports: port 31 is now on-line

I 01/29/20 21:46:37 00076 ports: port 18 is now on-line

I 01/29/20 21:46:35 00077 ports: port 47 is now off-line

I 01/29/20 21:46:35 00077 ports: port 31 is now off-line

I 01/29/20 21:46:34 00077 ports: port 33 is now off-line

I 01/29/20 21:46:34 00077 ports: port 18 is now off-line

I 01/29/20 21:44:39 00076 ports: port 47 is now on-line

I 01/29/20 21:44:38 00076 ports: port 33 is now on-line

I 01/29/20 21:44:38 00076 ports: port 31 is now on-line

I 01/29/20 21:44:38 00076 ports: port 18 is now on-line

I 01/29/20 21:44:36 00077 ports: port 47 is now off-line

I 01/29/20 21:44:36 00077 ports: port 33 is now off-line

I 01/29/20 21:44:35 00077 ports: port 31 is now off-line

I 01/29/20 21:44:35 00077 ports: port 18 is now off-line

I 01/29/20 21:44:21 00076 ports: port 35 is now on-line

I 01/29/20 21:44:20 00076 ports: port 46 is now on-line

I 01/29/20 21:44:19 00076 ports: port 27 is now on-line

I 01/29/20 21:44:19 00076 ports: port 12 is now on-line
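For anyone wanting to quantify the flapping from a longer log, here is a minimal Python sketch that counts off-line events per port and measures how long each port stayed down. It assumes the line layout shown in the excerpt above and that the date is MM/DD/YY; the function names (`parse_events`, `flap_summary`) are made up for illustration and are not part of any Aruba tool.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout, taken from the excerpt above:
#   I 01/29/20 21:46:35 00077 ports: port 47 is now off-line
LINE_RE = re.compile(
    r"^\s*I\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+\d+\s+ports:\s+"
    r"port\s+(\S+)\s+is now (on|off)-line"
)

SAMPLE = """
I 01/29/20 21:46:39 00076 ports: port 47 is now on-line
I 01/29/20 21:46:37 00076 ports: port 18 is now on-line
I 01/29/20 21:46:35 00077 ports: port 47 is now off-line
I 01/29/20 21:46:34 00077 ports: port 18 is now off-line
I 01/29/20 21:44:39 00076 ports: port 47 is now on-line
I 01/29/20 21:44:38 00076 ports: port 18 is now on-line
I 01/29/20 21:44:36 00077 ports: port 47 is now off-line
I 01/29/20 21:44:35 00077 ports: port 18 is now off-line
"""

def parse_events(log_text):
    """Return (timestamp, port, state) tuples in chronological order."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank lines and any other log entries
        date_part, time_part, port, state = m.groups()
        # Assumption: the switch prints dates as MM/DD/YY.
        ts = datetime.strptime(f"{date_part} {time_part}", "%m/%d/%y %H:%M:%S")
        events.append((ts, port, state))
    # The switch shows newest entries first, so sort oldest-first.
    events.sort(key=lambda e: e[0])
    return events

def flap_summary(events):
    """Per port: count of off-line events and seconds until the next on-line."""
    summary = defaultdict(lambda: {"flaps": 0, "down_seconds": []})
    last_off = {}
    for ts, port, state in events:
        if state == "off":
            summary[port]["flaps"] += 1
            last_off[port] = ts
        elif port in last_off:
            down = (ts - last_off.pop(port)).total_seconds()
            summary[port]["down_seconds"].append(down)
    return dict(summary)

if __name__ == "__main__":
    for port, info in sorted(flap_summary(parse_events(SAMPLE)).items()):
        print(port, info)
```

Ports that only ever log an on-line event (such as 12, 27, 35 and 46 in the excerpt) do not appear in the summary, which makes the repeatedly flapping ports easy to pick out.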

 

Once this time period has passed (again anywhere from 5-10 minutes depending on the switch) the symptom does not recur until the next reboot.  Even if every cable is pulled from the switch and reconnected, there will be no more CPPM TIMEOUTS until reboot.

 

Access Tracker logs from the TIMEOUT events indicate that the TLS tunnel has been set up, but that CPPM is not receiving all of the access responses to its requests.  See the attached screenshot “AT_Log_Post_200130.jpg”.

 

 

We have tried:

  • Changing the EAP-TLS Fragment Size to the minimum of 512 bytes
  • Applying the following command to the switches to ignore non-sequential EAP messages: aaa port-access authenticator eap-id-compliance
  • Adjusting retries and wait periods on the switch
  • Ensuring that our time source is good on both switch and CPPM, and that they’re sync’d

 

TAC case:

We have been working with TAC for a month and a half now, and the case has been escalated.  There are in fact two escalated cases, one for CPPM and one for the switching team.  Currently the switching team is working to build out an OSPF environment that mirrors the one we’re working with, and has copies of the configs to test with.

 

So far nothing that I’ve read or tried seems to be applicable. 

 

Also of note:

All of these clients previously used Microsoft NPS without issue.  There were no changes to the switch config other than to point to CPPM instead of NPS.  Re-directing them to NPS (for those clients who didn’t strip it out) resolves the issue immediately. 

 

Wireless 802.1x with an Aruba AOS 8 controller architecture is in place as well, and works without issue.

 

And of course, more packet captures to come, but so far we’ve only been able to get one capture from the switching side due to the environments we’re working with.

 

Any suggestions welcome.


Accepted Solutions
Occasional Contributor I

Re: CPPM authentication TIMEOUTS after switch reboots

The issue has been resolved.  The Aruba switch engineering team determined it to be a bug.  The fix should be included with firmware version 16.09.0010, released on the 3rd of this month.  There are no release notes yet, but when they're available they should reference "CR 251972" in the list of fixes.
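If you need to sort a list of switches into "already at the fixed release" and "still needs the upgrade", a rough Python sketch like the one below may help. It assumes versions are written the way they appear in this thread (for example 16.09.03 or 16.09.0010), optionally with a letter platform prefix in front. The helper names are hypothetical, and treating any numerically higher version as containing the fix is an assumption; confirm against the release notes for your branch.

```python
FIX_VERSION = "16.09.0010"  # release named in this thread as carrying the fix

def version_tuple(version):
    """Turn '16.09.0010' into (16, 9, 10).

    A leading non-numeric field (a platform prefix, if present) is dropped.
    """
    parts = version.strip().split(".")
    if parts and not parts[0].isdigit():
        parts = parts[1:]
    return tuple(int(p) for p in parts)

def has_fix(version, fix_version=FIX_VERSION):
    """Assumption: any version at or above fix_version includes the fix."""
    return version_tuple(version) >= version_tuple(fix_version)

if __name__ == "__main__":
    for v in ("16.05.03", "16.09.03", "16.09.0010"):
        print(v, "has fix" if has_fix(v) else "needs upgrade")
```

Comparing numeric tuples rather than raw strings matters here because the fields are zero-padded to different widths ("03" versus "0010").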



All Replies
Occasional Contributor I

Re: CPPM authentication TIMEOUTS after switch reboots

Here is the referenced .jpg screen shot

MVP Expert

Re: CPPM authentication TIMEOUTS after switch reboots

Thanks for sharing, I see some similar issues...

 

My situation:

When I rebooted the core switch, all Alcatel VoIP phones on the 2930M switches showed EAP timeouts in ClearPass until I powered the phones off and on.

Kind Regards Marcel Koedijk
HPE ASE Flexnetwork | ACMP | ACCP | Ekahau ECSE Design - Was this post useful? Kudos are welcome.
Occasional Contributor I

Re: CPPM authentication TIMEOUTS after switch reboots

We've seen a number of devices that behave similarly.  Predominantly we encounter this with the intercom speaker/clock devices that many of our customers have.  For those it's one and done: if they don't authenticate right away they give up and go quiet, requiring a reboot after the switch has been up for about 10 minutes (sometimes disabling the interface is enough).

 

The bug fix resolves an issue where the switches abandon one authentication attempt, and start another - over and over again.  It is different per switch, but seems to last between 5 and 10 minutes. This results in multiple Timeouts on ClearPass.  The switch eventually stops this behavior, waits long enough to finish an exchange and then authenticates the client. 

 

We've only experienced this behavior with 2930's and 2530's.  We've had no problems with 2920's or 3810's and are only now beginning to deploy 802.1x with OS-CX.
