Radius Server Dead Timer Not Working As Expected In The Switch
The RADIUS server dead timer applies when the server goes offline or becomes unreachable. In that situation, the switch offers several configurable options to keep clients from failing authentication and being blocked by AAA.
These include configuring a critical VLAN and critical user role, or granting authorized access when the RADIUS server is not reachable.
With the second option, clients are authorized for the duration of the dead timer (60 minutes, the default RADIUS server dead timer), and no RADIUS packets are forwarded to the server during that time because it is marked dead.
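The expected dead timer behavior described above can be sketched as a small model. This is only an illustration of the intended logic, not the switch's actual implementation; the class `RadiusServerState` and the function `authenticate` are hypothetical names, and the server is assumed to be unreachable for the whole run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RadiusServerState:
    """Hypothetical model of one RADIUS server entry (not the switch's real data structure)."""
    dead_time_s: int = 60 * 60          # 60-minute dead time, as in this article
    dead_until: Optional[float] = None  # time (in seconds) until which the server is skipped

    def mark_dead(self, now: float) -> None:
        # Called when the request times out: start the dead timer.
        self.dead_until = now + self.dead_time_s

    def should_query(self, now: float) -> bool:
        # Query only if the server was never marked dead
        # or the dead timer has already expired.
        return self.dead_until is None or now >= self.dead_until


def authenticate(server: RadiusServerState, now: float) -> str:
    if server.should_query(now):
        # Assumption: the server is unreachable, so the attempt times out.
        server.mark_dead(now)
        return "radius-timeout -> authorized"
    # Within the dead time: no RADIUS packet should be sent at all.
    return "server-dead -> authorized"
```

In this model, a re-authentication 25 seconds after the first failure returns "server-dead -> authorized" without contacting the server, and the server is only tried again once 3600 seconds have elapsed.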
In the scenario below, with a Windows 7 client, the switch tries to reach the server before the dead timer expires.
Port access configuration:
aaa authentication port-access eap-radius server-group "ISE_access" authorized
aaa server-group radius "ISE_access" host 220.127.116.11
The RADIUS server is made unreachable by adding a blackhole route or by configuring a non-existent server, which causes clients to receive authorized access. Because the RADIUS server is now dead, the dead timer is applied:
Deadtime (minutes) : 60
Timeout (seconds) : 3
Retransmit Attempts : 1
Global Encryption Key :
Dynamic Authorization UDP Port : 3799
Source IP Selection : 10.33.16.35
Source IPv6 Selection : Outgoing Interface
                Auth Acct DM/ Time   |
Server IP Addr  Port Port CoA Window | Encryption Key
--------------- ---- ---- --- ------ + --------------------------------
10.53.159.176   1645 1646 No  300    | 1qazxsw23edc
18.104.22.168   1812 1813 No  300    | MDLZ@Radius!
The switch should consider the server dead for 60 minutes and should not try to reach it during that interval. Here, however, the switch tries to reach the server every 20-25 seconds and re-applies authorized access to the client each time, which interrupts client traffic.
W 12/06/19 10:40:44 00989 auth: AUTHORIZED Access granted for access method
I 12/06/19 10:40:44 00421 radius: Can't reach RADIUS server 22.214.171.124
E 12/06/19 10:40:41 02648 srcip: RADIUS - failure to send out pkt
E 12/06/19 10:40:39 02648 srcip: RADIUS - failure to send out pkt
I 12/06/19 10:40:28 00179 mgr: SME TELNET from 10.87.56.68 - MANAGER Mode
I 12/06/19 10:40:28 04243 mgr: User uje8175 : Moved to manager mode for the
TELNET session from IP address 10.87.56.68
W 12/06/19 10:40:21 00989 auth: AUTHORIZED Access granted for access method
I 12/06/19 10:40:21 00421 radius: Can't reach RADIUS server 126.96.36.199
E 12/06/19 10:40:18 02648 srcip: RADIUS - failure to send out pkt
E 12/06/19 10:40:15 02648 srcip: RADIUS - failure to send out pkt
W 12/06/19 10:39:57 00989 auth: AUTHORIZED Access granted for access method
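The interval between re-authorizations can be read directly from the event log timestamps. The following is a minimal sketch that extracts the "AUTHORIZED Access granted" events from the excerpt above and computes the gaps between them. The function name `authorized_intervals` is hypothetical, and the date field is assumed to be MM/DD/YY (the choice does not affect the intervals here).

```python
from datetime import datetime
from typing import List

# Relevant lines copied from the log excerpt above.
LOG = """\
W 12/06/19 10:40:44 00989 auth: AUTHORIZED Access granted for access method
I 12/06/19 10:40:44 00421 radius: Can't reach RADIUS server 22.214.171.124
E 12/06/19 10:40:41 02648 srcip: RADIUS - failure to send out pkt
W 12/06/19 10:40:21 00989 auth: AUTHORIZED Access granted for access method
I 12/06/19 10:40:21 00421 radius: Can't reach RADIUS server 126.96.36.199
W 12/06/19 10:39:57 00989 auth: AUTHORIZED Access granted for access method
"""


def authorized_intervals(log_text: str) -> List[float]:
    """Return the seconds between consecutive 'AUTHORIZED Access granted' events."""
    times = []
    for line in log_text.splitlines():
        if "AUTHORIZED Access granted" not in line:
            continue
        parts = line.split()
        # parts[1] is the date, parts[2] is the time of day.
        stamp = datetime.strptime(f"{parts[1]} {parts[2]}", "%m/%d/%y %H:%M:%S")
        times.append(stamp)
    times.sort()  # the log is shown newest first
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


print(authorized_intervals(LOG))
```

For the excerpt above this gives gaps of 24 and 23 seconds, far shorter than the 60-minute dead time.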
After the dead timer expires, the switch is expected to try to reach the RADIUS server again to initiate a new session for the respective client.
This was tested with a Windows 7 client and a 2915 switch. The RADIUS server was made unreachable, and the client was authorized as configured. During this time the switch sends the success message to the client, but the Windows 7 client ignores it. As a result, the client sends EAPOL packets every 30 seconds within the dead timer interval, which triggers the authorization log messages on the switch.
The same test with Ubuntu and CentOS clients worked as expected. Collect a pcap from the client end, because Windows machines are chatty and may send multiple EAPOL packets, causing authentication to occur again before the dead timer expires.
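One way to confirm from the client capture that the client keeps restarting authentication is to list the timestamps of its EAPOL-Start frames. The sketch below uses only the Python standard library and makes several assumptions: the capture is saved in the classic pcap format with microsecond timestamps (not pcapng), the link type is Ethernet, and frames are not VLAN tagged. The function name `eapol_start_times` and the file name in the usage note are hypothetical.

```python
import struct
from typing import List

EAPOL_ETHERTYPE = 0x888E  # IEEE 802.1X (EAPOL) EtherType
EAPOL_START = 1           # EAPOL packet type: EAPOL-Start


def eapol_start_times(pcap_bytes: bytes) -> List[float]:
    """Return capture timestamps (seconds) of EAPOL-Start frames in a classic pcap."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    # Link type is the last field of the 24-byte global header; 1 means Ethernet.
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != 1:
        raise ValueError("expected an Ethernet capture")

    times = []
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 16:
            continue
        (ethertype,) = struct.unpack("!H", frame[12:14])
        # Byte 14 is the EAPOL version, byte 15 is the EAPOL packet type.
        if ethertype == EAPOL_ETHERTYPE and frame[15] == EAPOL_START:
            times.append(ts_sec + ts_usec / 1e6)
    return times
```

A possible usage, assuming the capture was saved as client.pcap: read the file with `open("client.pcap", "rb").read()`, pass the bytes to `eapol_start_times`, and compare the gaps between consecutive timestamps with the re-authorization interval seen in the switch log.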