What to do when SSH to the controller fails intermittently?
Environment Information: This article applies to controllers running code versions below 18.104.22.168
Symptoms: Check the actual login sessions on the controller:
#show loginsessions ==> to verify authenticated sessions
By design, the maximum number of management users through SSH is currently 5.
show processes | include ssh
0.3 S 24518 1642 9612 1956 4 0 04:41 00:00:04 2b057094 /etc/ssh/sshd -D -f /etc/ssh/sshd_config
0.1 S 25002 24518 9748 2444 4 0 04:43 00:00:01 2b057094 sshd: sfd@pts/0
0.0 S 25029 25002 2176 396 4 0 04:43 00:00:00 2ac16094 -sshwrap
17.0 S 28027 24518 9612 2120 4 0 04:59 00:00:00 2b057094 sshd: [accepted]
15.5 S 28028 24518 9612 2120 4 0 04:59 00:00:00 2b057094 sshd: [accepted]
11.0 S 28029 24518 9612 2080 4 0 04:59 00:00:00 2b057094 sshd: [accepted]
4.0 S 28030 24518 9612 2032 4 0 04:59 00:00:00 2b057094 sshd: [accepted]
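The "sshd: [accepted]" entries above are connections still in the pre-authentication state, and these are what count against the MaxStartups limit discussed below. As a rough sketch (assuming the process listing has been captured to a file; the sample lines are taken from the output above), they can be counted with:

```shell
# Save a captured "show processes | include ssh" output to a file
# (sample lines taken from the listing above), then count how many
# sshd processes are stuck in the pre-auth "[accepted]" state.
cat > /tmp/ssh_procs.txt <<'EOF'
0.1 S 25002 24518 9748 2444 4 0 04:43 00:00:01 2b057094 sshd: sfd@pts/0
17.0 S 28027 24518 9612 2120 4 0 04:59 00:00:00 2b057094 sshd: [accepted]
15.5 S 28028 24518 9612 2120 4 0 04:59 00:00:00 2b057094 sshd: [accepted]
11.0 S 28029 24518 9612 2080 4 0 04:59 00:00:00 2b057094 sshd: [accepted]
EOF
grep -c 'sshd: \[accepted\]' /tmp/ssh_procs.txt
```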
Establishing an SSH session to the controller fails randomly with the error message "ssh_exchange_identification: Connection closed by remote host". SSH sessions were either stale or in the "notty" state, where the SSH session no longer exists but the underlying TCP connection still does.
Cause: Per the OpenSSH documentation, MaxStartups specifies the maximum number of concurrent unauthenticated connections to the SSH daemon. Additional connections are dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10:30:100.
Alternatively, random early drop can be enabled by specifying the three colon-separated values "start:rate:full" (e.g. "10:30:60"). sshd(8) will refuse connection attempts with a probability of "rate/100" (30%) if there are currently "start" (10) unauthenticated connections. The probability increases linearly, and all connection attempts are refused if the number of unauthenticated connections reaches "full" (60).
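For reference, on a standard OpenSSH server this behaviour is controlled in sshd_config. The values below are OpenSSH's documented defaults, shown purely for illustration; ArubaOS does not expose this file to administrators:

```
# /etc/ssh/sshd_config (standard OpenSSH; illustrative only)
MaxStartups 10:30:100   # start:rate:full - begin dropping at 10 pre-auth
                        # connections with 30% probability, refuse all at 100
LoginGraceTime 120      # seconds before an unauthenticated connection is cut
```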
On Aruba controllers, a more stringent value of 3:50:5 is used. This was added as a security fix to address the vulnerability CVE-2010-5107.
With MaxStartups set to 3:50:5, once there are 3 stale sshd processes in the system, subsequent connection requests start failing with a probability of 50%, and all requests are refused once the count reaches 5.
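The resulting drop probability can be sketched with a small shell/awk calculation. The linear-increase formula follows the sshd_config(5) description above; this helper is purely illustrative, not an Aruba tool:

```shell
# Drop probability under MaxStartups start:rate:full = 3:50:5:
# 0% below "start", rate% at "start", rising linearly to 100% at "full".
for n in 3 4 5; do
  awk -v n="$n" -v start=3 -v rate=50 -v full=5 'BEGIN {
    if (n < start) p = 0;
    else if (n >= full) p = 100;
    else p = rate + (100 - rate) * (n - start) / (full - start);
    printf "%d unauthenticated connections -> %.0f%% drop probability\n", n, p
  }'
done
```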
1. Configure the login session timeout value as per your requirement:
(Aruba) (config) #loginsession timeout ?
<val> Timeout value, value of '0' disables it. default 15
minutes. Range is <5-60>
If the SSH client used by the customer has an option to log out automatically once session activity ends, set the loginsession timeout value through the controller CLI to limit the lifetime of the session. Once the session is terminated and the client receives the notification, the client executes "logout" as well, which also cleans up the TCP connection, so the "notty" processes no longer appear on the controller.
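For example, to set a 15-minute session timeout using the syntax shown in the CLI help above:

```
(Aruba) (config) #loginsession timeout 15
```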
2. Make sure you log out. If any scripts log in to the controller, ensure they log out as well.
Perform a graceful logout for all SSH sessions whose terminal was closed earlier without logging out. This clears the notty sessions.
3. If there are no authorized sessions, a fix is available in versions 22.214.171.124 and above:
#show processes | include ssh
* The fix sets the parameter ClientAliveCountMax to 60 and the parameter ClientAliveInterval to 0, which terminates SSH sessions that have been idle for 60 seconds on the controller without killing the corresponding process from the shell. Disable keepalives on the SSH client so that the channel remains idle during inactivity.
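On a standard OpenSSH client, keepalives can be disabled in ~/.ssh/config so the channel stays idle during inactivity. This fragment is illustrative: the host alias "controller" is a placeholder, and both options are documented in ssh_config(5):

```
# ~/.ssh/config (standard OpenSSH client; illustrative only)
Host controller             # placeholder alias for the controller
    ServerAliveInterval 0   # 0 disables client protocol keepalives (default)
    TCPKeepAlive no         # also disable TCP-level keepalive probes
```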
Answer: We can try to clear out the stale entries. If any script or monitoring device logs in to the controller, ensure it logs out gracefully. Also refer to Bug 103937; the fix is available after 126.96.36.199, with parameters added to flush out entries when keepalives are disabled on the SSH client.