Security


Forum to discuss Enterprise security using HPE Aruba Networking NAC solutions (ClearPass), Introspect, VIA, 360 Security Exchange, Extensions, and Policy Enforcement Firewall (PEF).

Please share some "aaa authentication-server radius statistics" uptimes?

  • 1.  Please share some "aaa authentication-server radius statistics" uptimes?

    Posted May 16, 2019 08:18 PM

    Hi All,

    A couple of weeks ago, when my two 7220 Mobility Controllers and pair of ClearPass C3000V VMs had been talking happily for years, I had never thought to run the command "show aaa authentication-server radius statistics". Now I'm wondering what "good" values look like, for both uptime and timeouts.

     

    This command gives lots of stats on each of your defined RADIUS servers, and the second column from the end of this lengthy output is "Uptime", in d:h:m.
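
    To compare uptimes across servers, it can help to convert the d:h:m value into a single number. The following is a minimal Python sketch, assuming the field is always three colon-separated integers (days, hours, minutes) as described above. The function name and the sample values are hypothetical and are not taken from real controller output.

```python
def uptime_to_minutes(uptime: str) -> int:
    """Convert a 'd:h:m' uptime string into total minutes.

    Assumes exactly three colon-separated integer fields
    (days, hours, minutes), as described in the post.
    """
    days, hours, minutes = (int(part) for part in uptime.strip().split(":"))
    return (days * 24 + hours) * 60 + minutes


# Hypothetical sample values, for illustration only.
for sample in ("0:00:04", "12:03:30"):
    print(sample, "->", uptime_to_minutes(sample), "minutes")
```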

     

    Since we've been having issues with the 7220s' connections to ClearPass timing out (we have a case open already), these times have been very short, generally in the minutes. I'm pretty sure this is Bad, but I'm hoping for confirmation.

     

    If your Clearpass RADIUS auth is ticking along smoothly (the way ours used to be), can you please log in to the CLI of one of your controllers and let me know your Uptimes? I expect that they might be as long as back to your most recent Clearpass firmware update, and that the 3 to 20 minute values I'm typically seeing indicate a problem.

     

    Also, please check the "Tmout" column, and, if it is more than a few, perhaps compute it as a percentage of the "Raw Rq" number for that server. Since we started having these timeout problems, mine are up around 1% of requests.
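
    The percentage described above is simple to compute. Here is a minimal Python sketch; the counter values are hypothetical placeholders, so substitute the "Tmout" and "Raw Rq" numbers from your own output.

```python
def timeout_percentage(timeouts: int, raw_requests: int) -> float:
    """Return timeouts as a percentage of raw requests.

    Returns 0.0 when there are no requests, to avoid dividing by zero.
    """
    if raw_requests <= 0:
        return 0.0
    return 100.0 * timeouts / raw_requests


# Hypothetical counters, not taken from any real controller.
tmout = 120
raw_rq = 12000
print(f"{timeout_percentage(tmout, raw_rq):.2f}% of requests timed out")
```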

     

    My understanding is that the timeout column represents times when the 7220 got no answer at all from ClearPass, rather than, say, indicating how many user devices timed out during their auth.

     

    I am thinking that ClearPass should very nearly _always_ reply to the mobility controllers, even if its response is "Reject, that user timed out during the auth".

     

    That is, if the "Tmout" count is the number of times that ClearPass failed to answer, it should be very nearly zero, or at least a tiny fraction of a percent of requests.

     

    Conversely, I'd expect a count of users-timing-out-during-their-auth to be a fairly steady percentage of user auth requests.

     

    Just not totally sure which form of timeout this counter represents.

     

    Thanks,

    Steve




  • 2.  RE: Please share some "aaa authentication-server radius statistics" uptimes?
    Best Answer

    EMPLOYEE
    Posted May 20, 2019 04:12 AM

    Uptime in a healthy working setup should be fairly high, dating back to roughly the last time ClearPass or the controller was rebooted (upgrade, planned maintenance, etc.). Seeing uptime of 3-20 minutes is not good. We can either check the Access Tracker logs or do packet captures to find where the timeout is happening.

     

    Low uptime can be caused either by ClearPass not responding to RADIUS requests within the timeout window or by the response from ClearPass being lost. First we need to isolate where the problem is. You can do a packet capture on ClearPass and take a look at it, or you can look at the Access Tracker logs. The logs generally track when ClearPass sends an Access-Challenge and when it gets a response back from the NAD.

     

    A certain number of timeouts is expected in a wireless network, so I would not expect the number of timeouts in the aaa auth stats to be 0. It should be low compared to the number of requests, though.



  • 3.  RE: Please share some "aaa authentication-server radius statistics" uptimes?
    Best Answer

    EMPLOYEE
    Posted May 20, 2019 04:26 AM

    Looking at the output collected, seeing 2 and 4 minutes of uptime doesn't look good. The timeout figures in the output refer to total timeouts caused either by ClearPass not responding or by clients not responding. A 1% timeout rate is normal in a typical wireless network and could be mostly clients not responding in time due to roaming, etc.

     

    When the ClearPass server doesn't respond and the request times out, the server is marked dead by the controller and the uptime is set to 00:00:00.



  • 4.  RE: Please share some "aaa authentication-server radius statistics" uptimes?

    Posted May 20, 2019 09:18 AM

    So, there is no differentiation between "client timeouts" and "clearpass server timeouts"? I expect some clients to walk away before they finish the auth, but I want Clearpass to answer essentially all the time, even if it just tells me that the client timed out.



  • 5.  RE: Please share some "aaa authentication-server radius statistics" uptimes?

    EMPLOYEE
    Posted May 20, 2019 09:37 PM

    Did some more research and I have to correct my earlier statement. The timeouts in the RADIUS statistics relate only to RADIUS traffic between the controller and the RADIUS server. Hence a client not responding will not cause the timeout counter to increase. The timeout counter increases when the controller doesn't receive a RADIUS response: either ClearPass is not responding or packets from ClearPass are getting dropped.

     

    Depending upon the configured number of retries and the timeout, ClearPass would be marked as dead and the uptime reset. The defaults are 3 retries and a 5-second timeout.

     

    If the first two tries fail and the third one succeeds, you will have two timeouts but the server is not marked as dead. Hence, if there is an intermittent lack of response, you could see the timeout counter increase without the server being marked down.
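
    The counting behaviour described in this post can be sketched as a small Python model. This is only an illustration under stated assumptions: it treats the default of 3 as three tries in total (matching the "first two failed, third succeeded" example), and the class and function names are hypothetical, not part of any Aruba software.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    """Hypothetical counters loosely mirroring the controller's statistics."""
    timeouts: int = 0      # tries that received no RADIUS response
    marked_dead: int = 0   # times the server was marked dead (uptime reset)


def send_request(responses, stats: ServerStats, max_tries: int = 3) -> bool:
    """Model one RADIUS request with up to max_tries tries.

    responses holds one bool per try: True means a response arrived
    before the per-try timeout. Missing entries count as no response.
    """
    tries = (list(responses) + [False] * max_tries)[:max_tries]
    for answered in tries:
        if answered:
            return True          # answered: server stays up
        stats.timeouts += 1      # each unanswered try counts as a timeout
    stats.marked_dead += 1       # every try failed: server marked dead
    return False


stats = ServerStats()
send_request([False, False, True], stats)   # intermittent loss
print(stats)                                # two timeouts, not marked dead
send_request([False, False, False], stats)  # no response at all
print(stats)                                # five timeouts, marked dead once
```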



  • 6.  RE: Please share some "aaa authentication-server radius statistics" uptimes?

    Posted May 21, 2019 11:12 AM

    Actually, it took a while for the light to dawn for me, but I realize these timeouts are related: if CPPM has a longer timeout for some user auth cases than the 5-second default that the mobility controller gives CPPM to reply, then CPPM may be waiting on a user when the MC times out and resends the transaction. (It may be that CPPM was about to reply saying that the user auth timed out, but the MC's 5-second window didn't give it a chance.)

     

    It seems many of my failure cases are remote eduroam users. I believe my CPPM timeout for these users to auth to their home servers is too long, and thus when their servers don't respond, the MC times out and aborts the transaction.

     

    So, the MC marks it as a failure of CPPM to respond, and after three, it will mark the CPPM server as down. But the problem is that the MC is not giving CPPM time to report a timeout.
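
    The timing interaction described here can be sketched in Python. This is a simplified model and not a description of actual ArubaOS or ClearPass behaviour: it assumes three tries of 5 seconds each (the defaults mentioned earlier in the thread), that CPPM sends a single reply some number of seconds after the first request, and that a reply arriving during a later try is still accepted. The function name and the delay values are hypothetical.

```python
def mc_outcome(cppm_reply_delay: float,
               per_try_timeout: float = 5.0,
               max_tries: int = 3) -> tuple[int, bool]:
    """Return (timeouts_counted, server_marked_dead) for one request.

    cppm_reply_delay is how long CPPM takes to answer, measured from
    the MC's first try (for example, while it waits on a remote home
    server). Each full per-try window that passes without a reply
    counts as one timeout; if all tries expire, the server is marked dead.
    """
    expired_windows = int(cppm_reply_delay // per_try_timeout)
    timeouts = min(max_tries, expired_windows)
    return timeouts, timeouts == max_tries


# Hypothetical delays in seconds, for illustration only.
for delay in (3.0, 12.0, 20.0):
    print(f"CPPM replies after {delay:>4} s ->", mc_outcome(delay))
```

    Under these assumptions, any upstream wait in CPPM that is longer than the MC's total retry window ends with the server marked dead, even though CPPM was about to answer.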