12-07-2014 03:15 PM
I've recently found an issue where my AD server (Windows 2003) sometimes doesn't return a response to an LDAP user search for 4-5 minutes. This usually happens when the server is being restarted or shut down for maintenance.
In this case the RADIUS requests from our downstream devices timed out, so the authentications failed.
The issue here is that we had a backup AD server configured, but it was never invoked for a large number of sessions; ClearPass seems to hold the query open until the primary server comes back online.
Eventually, once these queries have run, subsequent authentication attempts seem to detect that the server is down and fail over to the backup server.
I'm wondering whether there should be some kind of LDAP query / search timeout parameter that expires an LDAP query after a certain amount of time and causes the session to fail over to the backup server (before the RADIUS timeout period expires).
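To make the behaviour I'm asking about concrete, here is a minimal Python sketch of a per-server search deadline with failover. It is not how ClearPass works internally, and it does not talk to a real directory: `search_with_failover`, `fake_search`, the server names, and the timeout values are all hypothetical and chosen only for illustration.

```python
import concurrent.futures
import time


def search_with_failover(servers, search_fn, query, per_server_timeout):
    """Try each server in order; abandon one that exceeds the deadline."""
    errors = []
    for server in servers:
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(search_fn, server, query)
        try:
            # Wait at most per_server_timeout seconds for this server.
            return server, future.result(timeout=per_server_timeout)
        except concurrent.futures.TimeoutError:
            errors.append((server, "timed out"))
        except Exception as exc:
            errors.append((server, repr(exc)))
        finally:
            # Do not block on a hung worker; move on to the next server.
            pool.shutdown(wait=False)
    raise RuntimeError(f"all servers failed: {errors}")


def fake_search(server, query):
    """Hypothetical stand-in for an LDAP search; 'primary' is slow."""
    if server == "primary":
        time.sleep(1.0)  # simulates a server that is shutting down
    return [f"{server}:{query}"]


if __name__ == "__main__":
    print(search_with_failover(["primary", "backup"], fake_search,
                               "(sAMAccountName=alice)", per_server_timeout=0.2))
```

In this sketch the sum of the per-server deadlines would need to stay below the RADIUS client timeout for failover to help. Note that the abandoned worker thread keeps running until the underlying call returns, so a real LDAP client would also want its own network-level timeout.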
Has anybody else seen this problem or had similar issues?
12-08-2014 02:08 AM
You will probably want to contact TAC to design this right. The type of redundancy you choose will depend on how your infrastructure is designed and on your priorities. If you contact TAC with all your information, they can advise you best. There is no single design that works for everyone.
Aruba Customer Engineering
12-09-2014 07:05 PM
I spoke with TAC, but they weren't able to do much as we only had Access Tracker logs to go on (debug logging wasn't enabled because this was an unplanned outage).
My concern more generally is that ClearPass can effectively wait an unlimited amount of time for AD to return search results and this causes downstream timeouts to RADIUS clients.
In my case a query took 3-4 minutes to return a result (presumably because the server was being patched / restarted).
During this time ClearPass did not fail over to the secondary AD server.
The advice from TAC was that if CPPM can ping the LDAP server, then the server is considered to be up. This doesn't seem to protect against higher-level failures, such as a server that still answers pings but no longer answers LDAP searches.
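As a rough illustration of what a query-level health check could look like, as opposed to a ping-based one, here is a small Python sketch. It is only an assumption about one possible approach, not a description of CPPM: the `ServerHealth` class, its thresholds, and the server names are hypothetical.

```python
import time


class ServerHealth:
    """Tracks failed queries per server (hypothetical helper)."""

    def __init__(self, failure_threshold=2, hold_down_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.hold_down_seconds = hold_down_seconds
        self.clock = clock
        self._failures = {}    # server -> consecutive failed queries
        self._down_until = {}  # server -> time when it may be retried

    def record_success(self, server):
        # A successful query clears the failure count and any hold-down.
        self._failures[server] = 0
        self._down_until.pop(server, None)

    def record_failure(self, server):
        # Count timeouts and errors of real queries, not ping results.
        count = self._failures.get(server, 0) + 1
        self._failures[server] = count
        if count >= self.failure_threshold:
            self._down_until[server] = self.clock() + self.hold_down_seconds

    def is_available(self, server):
        return self.clock() >= self._down_until.get(server, float("-inf"))

    def candidates(self, servers):
        # Prefer servers not in hold-down; if none, try them all anyway.
        up = [s for s in servers if self.is_available(s)]
        return up or list(servers)


if __name__ == "__main__":
    health = ServerHealth()
    health.record_failure("primary")
    health.record_failure("primary")
    print(health.candidates(["primary", "backup"]))
```

The idea is that a server is marked down because its searches failed or timed out, so a host that still responds to ping but has stopped serving LDAP would be skipped for the hold-down period.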