Contributor II

Re: Source of RADIUS timeouts?

Sorry to bump this up, but I have been looking at a similar, nay, identical problem. I have a TAC case open (#1440376) and am still waiting to catch a run of consecutive drops so that I can capture it with Wireshark. I also have security debugging enabled. My issue is that it only affects two RADIUS servers in one auth group, while other servers have stayed up ever since the controller was last rebooted.

 

I too can find no network issues, and the server team reports nothing amiss with the RADIUS server (Tokyo to Hong Kong). The MPLS network is fine: it never drops a single packet and sits at a 50 ms response time, always.

 

My only workaround is to set "Auth Server dead time" to 0. Then all users are happy, as they are not being bumped through to a RADIUS server that does not authenticate them.

 

This is really very frustrating, and the cause seems impossible to locate. My only working theory is that, because the LAN is sending out hundreds of machine auth requests (which will fail, as machine auth is not used), the server is blacklisting the NAS IP and the controller.

 

I am still hunting for an answer.

Guru Elite

Re: Source of RADIUS timeouts?

I would try upgrading to 6.1.3.9 or 6.2.1.2.  In the release notes, there is a fixed bug, 76484:

 

Symptom: RADIUS authentication failed in networks that had different Maximum Transmission Unit (MTU) values. To fix this issue, the socket options are updated to allow the controller to send RADIUS requests to the RADIUS server when EAP termination is enabled.
Scenario: The RADIUS authentication failed when the MTU value in the network between the controller and RADIUS server was different. This issue was observed in controllers running ArubaOS 6.1.3.x.
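
To make the MTU scenario concrete, here is a minimal Python sketch (not taken from the release notes) of why a large RADIUS request can vanish on a path with mixed MTUs. It assumes IPv4 without options (20-byte header) carrying UDP (8-byte header). The function names and the example sizes are made up for illustration, and the model ignores devices that drop fragments.

```python
from typing import Sequence

IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes


def ip_packet_size(radius_len: int) -> int:
    """Size of the IPv4 packet carrying a RADIUS message of radius_len bytes."""
    return IPV4_HEADER + UDP_HEADER + radius_len


def delivered(radius_len: int, link_mtus: Sequence[int], df_set: bool) -> bool:
    """True if the packet can cross every link on the path.

    Simplified model: without DF a router may fragment an oversized
    packet, so it still gets through; with DF set the router must drop it.
    """
    path_mtu = min(link_mtus)
    if ip_packet_size(radius_len) <= path_mtu:
        return True
    return not df_set


if __name__ == "__main__":
    # Hypothetical path: a 1500-byte LAN and a 1400-byte WAN hop.
    print(delivered(1472, [1500, 1400], df_set=True))
    print(delivered(1472, [1500, 1400], df_set=False))
```

The point of the sketch is only that the smallest MTU on the path decides, so a request that fits on the local LAN can still be dropped further along.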


*Answers and views expressed by me on this forum are my own and not necessarily the position of Aruba Networks or Hewlett Packard Enterprise.*
ArubaOS 8.5 User Guide
InstantOS 8.5 User Guide
Airheads Knowledgebase
Airheads Learning Videos
Remote Access Point Solution Guide
ArubaOS Consolidated Release Notes
ArubaOS 8 ViA VPN Solution Guide
Contributor II

Re: Source of RADIUS timeouts?

Thanks for the reply. Currently it is running: ArubaOS (MODEL: Aruba3400), Version 6.1.2.7

Guru Elite

Re: Source of RADIUS timeouts?

You might want to open a case and reference that bug.  Not sure if that specific issue exists on your version of code.

 


Contributor II

Re: Source of RADIUS timeouts?

Thanks, I informed the TAC regarding the above.

Contributor II

Re: Source of RADIUS timeouts?

I have just checked all the APAC local controllers and all have the same problem. The only controller that does not is the standby controller in Tokyo, but this sends no traffic. Even the local controller in Hong Kong, where the RADIUS server is located, has the same problem, but the Hong Kong master, which is the regional master, does not. Sites include:

 

Tokyo

Hong Kong – Same site as Radius

Manila

Mumbai

Sydney

Taiwan

 

Seems like a bug to me.

Guru Elite

Re: Source of RADIUS timeouts?

Well, let us see what support says.  Since you have the problem, any information that you can give them to help them determine what is going on will help you get to the bottom of this.


Contributor II

Re: Source of RADIUS timeouts?

The issue was due to the first RADIUS frame being sent from the controller with the DF (Don't Fragment) bit set, so the frame was dropped and the controller kept waiting for a reply. None was received, so it would then mark the server as down for 10 minutes, which I think is the default dead time. The workaround was to set the auth dead time to 0 (Case #1440376).
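
A rough Python model of the behaviour described above may help show why a dead time of 0 hides the problem. This is only a sketch of what this thread describes, not ArubaOS code: the class and method names are hypothetical, times are in seconds, and per-attempt retransmits are left out.

```python
class AuthServer:
    """Toy model of a RADIUS server entry with a dead timer (hypothetical)."""

    def __init__(self, dead_time_s: float):
        self.dead_time_s = dead_time_s
        self.dead_until = 0.0  # the server is skipped until this time

    def in_service(self, now: float) -> bool:
        """True if requests may be sent to this server at time `now`."""
        return now >= self.dead_until

    def request_timed_out(self, now: float) -> None:
        """Call when a request got no reply after all attempts."""
        self.dead_until = now + self.dead_time_s


if __name__ == "__main__":
    default = AuthServer(dead_time_s=600)  # 10 minutes, as in the thread
    default.request_timed_out(now=100.0)
    print(default.in_service(now=101.0))   # still marked down

    workaround = AuthServer(dead_time_s=0)
    workaround.request_timed_out(now=100.0)
    print(workaround.in_service(now=100.0))  # never leaves service
```

In this model a single lost request takes the server out for the whole dead time, however healthy it is otherwise, which is the effect the workaround removes.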

 

Since we upgraded to 6.3.1.2, I now see "AvgRspTm" on a local controller on the same LAN as the RADIUS server reported as "-844826803", which I don't understand. I am getting in touch with Aruba Support again to try to raise a cross-referenced case. I was also told this morning that the RADIUS server CPU was running very high; that is being looked into now.
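
A negative response time cannot be a real measurement. One possibility, and it is only a guess, is a signed 32-bit value that has overflowed or been misread. The Python sketch below shows that reinterpretation; the function names are made up, and it says nothing about how ArubaOS actually computes AvgRspTm.

```python
def fits_signed32(value: int) -> bool:
    """True if value is representable as a signed 32-bit integer."""
    return -2**31 <= value < 2**31


def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    reported = -844826803  # value quoted in this post
    print(fits_signed32(reported))
    print(as_unsigned32(reported))
```

The reported figure does fit in a signed 32-bit integer, which is at least consistent with a counter or arithmetic problem rather than a genuine timing.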

Contributor II

Re: Source of RADIUS timeouts?

Regarding the negative AvgRspTm values, we already have a bug raised with the engineering team: Bug #89169.

 

A patch request has been raised for the issue to be addressed in the 6.3 stream.

 

<tac engineer>@arubanetworks.com

Contributor II

Re: Source of RADIUS timeouts?

I'm going to add some more information to this thread.

 

We have also been seeing RADIUS server uptime issues on the ArubaOS CLI. For all of our local controllers in EMEA, the server uptime keeps resetting. It now looks very likely that this is a client configuration issue. The default retry setting for each request in ArubaOS is 3x10. If no response has been received from the RADIUS server (MS NPS, 2008) after the third attempt, the controller marks the server as down for the dead time, which is 10 minutes by default. This happens regardless of whether other RADIUS traffic is passing: any single 3x10 failure will mark the server down. Additionally, if the same problem is seen on the fail-through server, the primary server is brought back into service. Consequently, when troubleshooting, the auth debug output can be full of:

 

 

Apr  4 13:42:49  authmgr[1710]: <124004> <DBUG> |authmgr|  Auth server <servername>' response=2

Apr  4 13:42:49  authmgr[1710]: <124014> <NOTI> |authmgr|  Taking Server <servername> out of service for 10 mins

 

Apr  4 13:42:51  authmgr[1710]: <124004> <DBUG> |authmgr|   server=<servername>, ena=1, ins=0 (1)

Apr  4 13:43:00  authmgr[1710]: <124004> <DBUG> |authmgr|   server=<servername>, ena=1, ins=0 (1)

/snip

 

Then:

 

Apr  4 13:43:24  authmgr[1710]: <124015> <NOTI> |authmgr|  Bringing Server <servername> back in service.
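
To see how often this happens, the out-of-service and back-in-service lines can be paired up. The Python sketch below does that; the regular expressions are written against the two NOTI lines quoted above, so other ArubaOS versions may need different patterns, and the function names are hypothetical. It only handles logs that stay within one day.

```python
import re

# Patterns follow the authmgr lines quoted in this post (an assumption
# about the format in general, not a documented syntax).
OUT = re.compile(
    r"(\d\d):(\d\d):(\d\d)\s+authmgr.*Taking Server (\S+) out of service")
BACK = re.compile(
    r"(\d\d):(\d\d):(\d\d)\s+authmgr.*Bringing Server (\S+) back in service")


def _seconds(h, m, s):
    """Seconds since midnight for an HH, MM, SS triple of strings."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def outages(lines):
    """Return (server, seconds_out) for each out/back pair found."""
    went_out = {}
    result = []
    for line in lines:
        m = OUT.search(line)
        if m:
            went_out[m.group(4)] = _seconds(*m.group(1, 2, 3))
            continue
        m = BACK.search(line)
        if m and m.group(4) in went_out:
            start = went_out.pop(m.group(4))
            result.append((m.group(4), _seconds(*m.group(1, 2, 3)) - start))
    return result
```

Fed the two NOTI lines above, this reports the server as out for 35 seconds rather than the full 10 minutes, which fits the fail-through behaviour just described.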

 

It seems the cause of this is clients using EAP instead of PEAP.

 

What we see in the RADIUS logs is:

 

Authentication Details:
    Connection Request Policy Name:   1-Secure Wireless Connections Aruba
    Network Policy Name:              Secure Wireless Connections Aruba London
    Authentication Provider:          Windows
    Authentication Server:            <servername>
    Authentication Type:              EAP
    EAP Type:                         -
    Account Session Identifier:       -
    Reason Code:                      1
    Reason:                           An internal error occurred. Check the system event log for additional information.

                                     

A successful request receives:

 

Authentication Details:
    Connection Request Policy Name:   1-Secure Wireless Connections Aruba
    Network Policy Name:              Secure Wireless Connections Aruba London
    Authentication Provider:          Windows
    Authentication Server:            <servername>
    Authentication Type:              PEAP
    EAP Type:                         Microsoft: Secured password (EAP-MSCHAP v2)
    Account Session Identifier:       -

Quarantine Information:
    Result:                           Full Access
    Extended-Result:                  -
    Session Identifier:               -
    Help URL:                         -
    System Health Validator Result(s):
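
For anyone sifting through a lot of these events, a small Python sketch like the one below can pull the fields out and flag the failing pattern (Authentication Type of EAP with no EAP Type). It assumes the event text has been exported as plain "Key: value" lines like the ones above; the function names are hypothetical.

```python
def parse_details(text):
    """Turn 'Key:   value' lines into a dict.

    Lines without a colon, and section headings with nothing after the
    colon, are skipped. Only the first colon splits, so values such as
    'Microsoft: Secured password (EAP-MSCHAP v2)' stay whole.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def looks_like_bare_eap(fields):
    """True when the event shows plain EAP with no EAP type, as in the
    failed request above."""
    return (fields.get("Authentication Type") == "EAP"
            and fields.get("EAP Type", "-") == "-")
```

Counting how many events per client match looks_like_bare_eap would give a quick list of the machines whose supplicant settings need checking.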

 

We can see this in Wireshark as constant failures of the RADIUS box to reply to the controller. Increasing the retransmits will not solve this; the server will still not reply.
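
If the capture is exported to something scriptable, the missing replies can be listed automatically. The Python sketch below assumes each packet has already been reduced to a timestamp, source, destination, RADIUS code and RADIUS Identifier; that record layout, the names and the 10-second window are all assumptions for illustration. The codes themselves are standard RADIUS: 1 is Access-Request, and 2, 3 and 11 are Access-Accept, Access-Reject and Access-Challenge.

```python
from typing import NamedTuple


class Pkt(NamedTuple):
    time: float  # capture timestamp in seconds
    src: str
    dst: str
    code: int    # RADIUS code: 1 = Access-Request, 2/3/11 = replies
    ident: int   # RADIUS Identifier field


REPLY_CODES = (2, 3, 11)


def unanswered(packets, timeout_s=10.0):
    """Return Access-Requests that got no reply within timeout_s.

    A reply matches when it carries the same Identifier and travels in
    the opposite direction. Retransmissions reuse the Identifier, so one
    late reply can account for several attempts.
    """
    missing = []
    for req in packets:
        if req.code != 1:
            continue
        answered = any(
            p.code in REPLY_CODES
            and p.ident == req.ident
            and p.src == req.dst
            and p.dst == req.src
            and 0 <= p.time - req.time <= timeout_s
            for p in packets
        )
        if not answered:
            missing.append(req)
    return missing
```

Lining the unanswered requests up against the server's own event log by time is what points the finger at the server rather than the network.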

 

Moral of the story here: get a Wireshark capture and press the server admins to scrutinise the server logs.
