Hi,
Today I experienced an odd situation at one of our sites, and I have no clue why it happened.
At this particular site we use a 5406R zl2 switch as the core, fitted with four modules and running KB.16.02.0010. Each module has eight 10 Gbit ports. We have multiple HP 2530-48G-PoE+-2SFP+ switches (J9853A) running YA.16.02.0010, each connected to the core with an LACP trunk whose two links go to two different modules. Trunk 1 consists of SFP ports 49 and 50 of the first 2530, connected to A1 and B1 of the 5406, and so on. Each SFP port in this star topology has an HP J9150A 10 Gbit transceiver and vendor-certified cabling.
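For reference, the trunk configuration on both ends is just the standard LACP trunk setup; a simplified sketch for trunk 1 (port numbers and trunk names follow the pattern described above):

on the first 2530 edge switch:
   trunk 49-50 trk1 lacp

on the 5406R core, one member port per module:
   trunk A1,B1 trk1 lacp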
This topology had been running for three years without a hitch until this morning, when trunks 1 (A1-B1), 4 (A4-B4), 5 (A5-B5), 8 (A8-B8) and 10 (C2-D2) suddenly went offline. Local IT support visually inspected the core switch and the impacted switches and saw that only the LEDs of the corresponding trunk ports were off. All other trunks, the other switches and the core switch itself were operating normally. We examined the logs on the core switch (sh logging -r) but found no messages that explained the cause.
As the site is remote and we only had remote hands available, we decided to power the impacted 2530 switches off and on again. Luckily this resolved the issue, as the trunks became operational again after the forced reboot ;) Afterwards I examined the log (sh logging -r) of each impacted 2530 switch, but found no messages that helped me understand what happened.
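For completeness, the remote checks available to me on both the core and the 2530s are basically the standard ProCurve/ArubaOS-Switch show commands, e.g.:

   show logging -r         (event log, newest entries first)
   show trunks             (trunk group membership)
   show lacp               (LACP status per port)
   show interfaces brief   (port link status)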
I checked our monitoring solutions (PRTG and Observium) for warnings, errors and excessive traffic and did a health, trend and performance review, but found no indication of why five trunks suddenly went offline.
This really bothers me and I don't like it!
Has anyone experienced this behaviour? Am I missing some configuration that would explain why I see nothing in the logging and monitoring? Is it a software bug? Are there any other switch logs I can review?
Regards,
Raymond
#2530