Yesterday my client told me they had had a short network failure that affected several switches. By the time I heard about it and started looking into it, everything was working normally again.
Looking through the switch logs, I noticed that every problematic switch had matching entries in the core switch log (the core is two stacked 5406s):
I 07/18/18 13:14:44 00076 ports: ST1-CMDR: port 1/B5 in Trk12 is now on-line
I 07/18/18 13:14:41 00076 ports: ST1-CMDR: port 1/B6 in Trk13 is now on-line
I 07/18/18 13:14:36 00076 ports: ST1-CMDR: port 1/D1 in Trk15 is now on-line
I 07/18/18 13:14:03 00076 ports: ST1-CMDR: port 1/D2 in Trk19 is now on-line
I 07/18/18 13:13:40 00076 ports: ST1-CMDR: port 1/B8 in Trk16 is now on-line
I 07/18/18 13:11:38 00413 SNTP: ST1-CMDR: Updated time by 38 seconds from server
at 10.10.1.100. Previous time was Wed Jul 18 13:11:00 2018.
Current time is Wed Jul 18 13:11:38 2018.
I 07/18/18 13:00:53 00435 ports: ST1-CMDR: port 1/B5 is Blocked by LACP
I 07/18/18 13:00:52 00435 ports: ST1-CMDR: port 1/D1 is Blocked by LACP
I 07/18/18 13:00:52 00435 ports: ST1-CMDR: port 1/B6 is Blocked by LACP
I 07/18/18 13:00:28 00076 ports: ST1-CMDR: port 1/D5 in Trk21 is now on-line
I 07/18/18 12:59:52 00435 ports: ST1-CMDR: port 1/D2 is Blocked by LACP
I 07/18/18 12:59:52 00435 ports: ST1-CMDR: port 1/B8 is Blocked by LACP
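To see how long each core port stayed blocked, the "Blocked by LACP" and "is now on-line" entries above can be paired per port. The following is only a rough sketch: the regular expression is written against the lines exactly as pasted here, and the names `LOG`, `LINE_RE` and `blocked_durations` are made up for illustration.

```python
import re
from datetime import datetime

# Core-switch port events copied from the excerpt above (newest first).
LOG = """\
I 07/18/18 13:14:44 00076 ports: ST1-CMDR: port 1/B5 in Trk12 is now on-line
I 07/18/18 13:14:41 00076 ports: ST1-CMDR: port 1/B6 in Trk13 is now on-line
I 07/18/18 13:14:36 00076 ports: ST1-CMDR: port 1/D1 in Trk15 is now on-line
I 07/18/18 13:14:03 00076 ports: ST1-CMDR: port 1/D2 in Trk19 is now on-line
I 07/18/18 13:13:40 00076 ports: ST1-CMDR: port 1/B8 in Trk16 is now on-line
I 07/18/18 13:00:53 00435 ports: ST1-CMDR: port 1/B5 is Blocked by LACP
I 07/18/18 13:00:52 00435 ports: ST1-CMDR: port 1/D1 is Blocked by LACP
I 07/18/18 13:00:52 00435 ports: ST1-CMDR: port 1/B6 is Blocked by LACP
I 07/18/18 13:00:28 00076 ports: ST1-CMDR: port 1/D5 in Trk21 is now on-line
I 07/18/18 12:59:52 00435 ports: ST1-CMDR: port 1/D2 is Blocked by LACP
I 07/18/18 12:59:52 00435 ports: ST1-CMDR: port 1/B8 is Blocked by LACP
"""

# Matches only the two event types seen in the excerpt.
LINE_RE = re.compile(
    r"^I (\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \d+ ports: ST1-CMDR: "
    r"port (\S+) (is Blocked by LACP|in Trk\d+ is now on-line)$"
)


def blocked_durations(log_text):
    """Return {port: seconds from 'Blocked by LACP' to the next 'on-line'}."""
    blocked_at = {}
    durations = {}
    # The log is printed newest first, so walk it in reverse (oldest first).
    for line in reversed(log_text.splitlines()):
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        port, event = match.group(2), match.group(3)
        if event.startswith("is Blocked"):
            blocked_at[port] = stamp
        elif port in blocked_at:
            # An on-line event for a port that was previously blocked.
            delta = stamp - blocked_at.pop(port)
            durations[port] = int(delta.total_seconds())
    return durations


if __name__ == "__main__":
    for port, seconds in sorted(blocked_durations(LOG).items()):
        print(f"{port}: blocked for {seconds} s")
```

On this excerpt every affected port comes out at roughly 14 minutes by the log timestamps. Port 1/D5 is skipped because the excerpt has no matching block entry for it. Since the core clock was stepped forward by 38 seconds at 13:11:38, the log-based figures probably overstate the real outage by about that much.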
The core log also contained many entries about a missing SNTP server, which had been offline since somebody ripped off its GPS antenna. All the other switches had these missing-SNTP entries too, and their clocks had been reset to the default (01/01/1990)...
The connection failure appeared right after the GPS antenna was repaired and the SNTP server came back up. The problematic switches logged the following lines:
I 07/18/18 12:50:09 00076 ports: port 26 in Trk1 is now on-line
I 07/18/18 12:50:09 00435 ports: port 26 is Blocked by LACP
I 07/18/18 12:50:09 00393 lacp: Port 26 is blocked - error condition
I 07/18/18 12:44:28 00076 ports: port 25 in Trk1 is now on-line
I 07/18/18 12:44:28 00435 ports: port 25 is Blocked by LACP
I 07/18/18 12:44:28 00393 lacp: Port 25 is blocked - error condition
I 07/18/18 12:43:23 04611 job: Job Scheduler enabled
I 07/18/18 12:43:08 00413 SNTP: Updated time by 898311263 seconds from server at
10.10.1.100. Previous time was Mon Jan 29 08:48:45 1990. Current
time is Wed Jul 18 12:43:08 2018.
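The size of that SNTP step can be sanity-checked from the two timestamps in the log entry. This is a minimal sketch that treats both timestamps as plain wall-clock times; the variable names are only illustrative.

```python
from datetime import datetime

# Values copied from the SNTP log entry above.
reported_step = 898311263                       # seconds, as logged
previous = datetime(1990, 1, 29, 8, 48, 45)     # "Previous time was ..."
current = datetime(2018, 7, 18, 12, 43, 8)      # "Current time is ..."

# Naive difference between the two printed wall-clock times.
wall_clock_delta = int((current - previous).total_seconds())
discrepancy = wall_clock_delta - reported_step

print(f"wall-clock difference: {wall_clock_delta} s")
print(f"reported by switch:    {reported_step} s")
print(f"discrepancy:           {discrepancy} s")
```

The printed timestamps are exactly one hour further apart than the step the switch reports. One possible explanation, which is an assumption and not something the log confirms, is that daylight saving time applies to the July timestamp but not to the January one. Either way, the switch clock jumped by more than 28 years in a single update.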
To debug this, I tried setting one switch's clock to a wrong time. What happened was that my SSH session dropped, and now I have to go on-site and use the console (I hope it still works...) to fix the clock!
The trunk between the core and the "dead" switch seems to be up, and the end-user devices seem to work fine, but I cannot reach the switch's management IP by ping, SSH, or web.
Does anybody have an idea why this is happening?
#ProCurve