08-24-2015 01:09 PM
Has anyone run into this error before? Authentication is occurring against Active Directory.
2015-08-24 14:31:39,253 126.96.36.199 CPPM_Alert 313323 1 0 TimestampFormat=yyyy-MM-dd HH:mm:ss,S,timestamp=2015-08-24 14:29:53-04,session_id=<Session ID>,alert=<AD Connection name>: No free connections available.\n[Local User Repository] - localhost: User not found.\nMSCHAP: Authentication failed\nEAP-MSCHAPv2: User authentication failure,src=<ClearPass IP>,service_name=RADIUS
There are no errors on the AD server that correlate to this event.
08-26-2015 02:10 AM
Is this a recurring issue, or have you seen it only once? Have you tried aaa test server?
Feel free to ask if you need any further help with this.
08-26-2015 04:07 AM
It has happened frequently ever since deployment. We have been increasing all of the max connection settings, but the errors keep coming, and the settings can only be increased so far before we max them all out. Sometimes only a couple of auths are affected; other times it is hundreds of them. The worst case was a fifteen-minute period where every user auth was rejected, producing 14,000 of these errors. The issue has appeared on all three of our production servers. The test server has never had this event, but there is a huge difference between one or two auths an hour and 1400 auths a minute.
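To see how these bursts line up in time, the alert lines can be bucketed per minute once they are exported from syslog. This is only a rough sketch in Python: it assumes each line begins with a timestamp in the same `yyyy-MM-dd HH:mm:ss,S` form as the alert quoted at the top of this thread, and the function name `alerts_per_minute` is made up for illustration. If the syslog collector prepends its own header, the pattern would need adjusting.

```python
import re
from collections import Counter

# Assumption: every line starts with "YYYY-MM-DD HH:MM:SS,mmm", as in the
# alert quoted earlier in this thread. Capture the timestamp down to the minute.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")
MARKER = "No free connections available"


def alerts_per_minute(lines):
    """Count lines containing the alert text, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # some other alert; ignore it
        match = TS_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Sorting the result by key gives a simple timeline that can be compared against whatever load data is available for the AD servers over the same window.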
08-28-2015 12:20 AM
Please open a TAC case to troubleshoot this.
From my interpretation of the logs, I would suspect the (Kerberos) connections between ClearPass and your AD servers. It seems like there are more authentication requests coming in than your AD can handle; please also check the load on the AD servers during these issues. ClearPass limits the number of concurrent authentications to backends like AD to prevent AD from locking up.
What you will probably see is that during these problems, the transaction times rapidly increase. You can check that in the ClearPass Dashboard in the Request Processing Time widget, and in more detail in Monitoring -> Live Monitor -> System Monitor -> ClearPass (tab). There you can see the transaction times for all parts of the authentication, so you can find where the delay is.
It might also be a connection problem between ClearPass and the AD servers (spanning tree? flapping switch ports?), or a Denial of Service by a poorly configured client (in that case you would see a high rate of failed authentications from a single source or username). 14,000 failures in 15 minutes might be overloading your AD if you haven't scaled it for that load. You can also manually set the login servers for MSCHAPv2 to steer your ClearPass to specific high-capacity or nearby servers, which prevents ClearPass from using a remote AD server for authentication:
"Configure an (optional) restricted list of domain controllers to be used for MSCHAPv2 authentication. If not specified, all available domain controllers obtained from DNS will be used for authentications."
TAC can help you with these troubleshooting tasks.
If you have urgent issues, please contact your Aruba partner or Aruba TAC.
08-28-2015 08:16 AM
Thanks for the detailed response. We are pretty sure the requests are not even getting off the ClearPass servers during these events. There is no record of the attempts in the AD audit logs. The transaction time tracking would be really helpful, but it only shows the last 30 minutes. Is this info somewhere in the ClearPass logs? We are sending them off to Splunk via syslog. I have been looking at the logs for device authentications in Splunk, but I can only see timestamps for when the various events occurred; there don't seem to be any logs indicating the time taken to complete the various parts of authentication. Speaking of authentication, that whole process seemed odd. Is this the correct process:
- Try to auth the user/computer against Microsoft Directory Services (port 445)
- Verify the user/computer account using the AD\aruba user over LDAP to AD (port 636)
Being able to tell which of those two is actually failing might help us find the source of this issue.
We do have a couple dozen devices that are authing (successes and failures) 500 to 2000 times an hour. Those are almost all MAC auths, so AD should not be involved in those transactions, but I could see them crowding out other auth methods. We are trying to hunt these things down and fix or remove them. My favorites are the old LaserJets that freak out, start with MAC 00:00:00:00:00:00, and increment by one until they get on the network.
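For hunting those devices down, something like the following could flag the noisy ones from an export of authentication events. It is a minimal sketch, not anything ClearPass or Splunk provides: the input is assumed to be already parsed into `(timestamp, mac)` pairs, and both the function name `noisy_clients` and the default threshold of 500 per hour (taken from the low end of the range above) are illustrative choices.

```python
from collections import Counter


def noisy_clients(events, threshold=500):
    """Return {(hour, mac): count} for clients at or above threshold in an hour.

    events: iterable of (datetime, mac_string) pairs. How these are extracted
    from the logs is left open; that part is an assumption of this sketch.
    """
    counts = Counter()
    for ts, mac in events:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        # Normalize case so AA:BB and aa:bb count as the same client.
        counts[(hour, mac.lower())] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Note that a device that walks through incrementing MAC addresses would not be caught this way, since each address only appears a few times; those would have to be spotted by the switch port or NAS they come from instead.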
We have engaged support on the "No free connections available" issue.
05-26-2016 12:04 PM
The issue was related to referral checking being set to ON for the Active Directory authentication. This resulted in a massive increase in the connections required to authenticate AD users, which would max out the number of threads allowed on the ClearPass device. We disabled the referrals and the issues went away.