So this one is still going.
It turns out that some of the STP daemon messages saying that the interface is going in to forwarding, has nothing to do with STP. It's just the interface being initialised. They are very misleading. TAC proved that STP is not putting interfaces in to forwarding incorrectly. So we closed THAT case.
However, I am still having issues so logged another TAC case recenlty... I did a packet capture on interfaces in the blocking state, and they are forwarding a single type of frame! It happens to be 'Loop Protect' frames generated by other switches in the network. Comes in one interface, and out of the other. They have god mode access and bypass loop prevention protocols. But it only happens in the first few seconds of an interface being enabled in STP (basically the learning state forwards this one frame type). Interestingly when we disable STP and use Smart Link instead, Smart Link forwards these Loop Protect frames 100% of the time, regardless of the secondary Smart Link port being blocked.
On the upstream distribution switch, I get 1 MAC flap for each switch that sends a loop protect frame, on every port connecting another Aruba switch. Thats because the loop protect frames have a source MAC of the switches MGMT interface, when it comes back out of the other Aruba switches, the distribution switch updates the MAC table. It has no idea where the switch sending the Loop Protect frame lives. I have 500 of these switches currently, so I get 2500 MAC flaps when this occurs. With smart link enabled its every 5 seconds - it completely kills the rest of the network. STP enabled - if there is a global power outage in this area for example, when they all boot I still get 3 seconds of chaos - then it settles down.
I'll keep you posted with the TAC case.
Original Message:
Sent: Mar 06, 2024 07:57 AM
From: michael_f76
Subject: AOS-CX switches transition to STP forwarding state too quickly on boot
Hello guys, i was just wondering, has there been any update on this issue in the last weeks ?
I am just asking because I've also been investigating issues and unexpected STP behavior on CX devices in recent weeks.
Original Message:
Sent: Dec 13, 2023 09:07 AM
From: parnassus
Subject: AOS-CX switches transition to STP forwarding state too quickly on boot
Hi!
"TAC originally came back and said there was an issue - but have since backpedalled and said this is normal operations."
And so does it mean that there wasn't an (unknown before) Bug in the ArubaOS-CX?
Just curious if the Aruba TAC asked you to first update your Aruba CX 6100 from running ArubaOS-CX 10.10.1040 (02/2023) to latest available ArubaOS-CX 10.10.1100 (12/2023) before trying to reproduce your issue or not (given that it is really an unknown bug affecting ArubaOS-CX 10.10 since the 10.10.1040 at best/worst...maybe with relationship with particular Aruba CX switches series...just guessing).
Original Message:
Sent: Dec 13, 2023 01:21 AM
From: BrettV
Subject: AOS-CX switches transition to STP forwarding state too quickly on boot
Hi Michael,
TAC originally came back and said there was an issue - but have since backpedalled and said this is normal operations. We have escalated through our reps because we are 100% confident that our admin-network ports aren't behaving as per the described documentation.
I want them to show me that my admin-network ports are waiting 3 seconds before forwarding (they aren't), or tell me that documentation is wrong (they said it is correct).
I'll keep you posted.
------------------------------
Brett V
Original Message:
Sent: Dec 11, 2023 07:10 AM
From: michael_f76
Subject: AOS-CX switches transition to STP forwarding state too quickly on boot
Hello Brett.
did TAC provide any additional info yet, that you could share?
I've been also working on STP-related troubleshooting on AOS-S and AOS-CX over the last few week, mostly regarding combinations of Aruba 8325 and 2930F.
Regards, Michael
Original Message:
Sent: Dec 06, 2023 07:50 AM
From: BrettV
Subject: AOS-CX switches transition to STP forwarding state too quickly on boot
TAC have been able to reproduce the issue - looks like a bug. Waiting to see the scope of the issue (which firmware, which devices etc)
------------------------------
Regards,
Brett V
Original Message:
Sent: Nov 01, 2023 02:38 AM
From: BrettV
Subject: AOS-CX switches transition to STP forwarding state too quickly on boot
An update:
This happens not just at boot time, but whenever we disconnect, and reconnect interfaces.
In fact - it's much faster in this scenario. The switch puts both uplinks in forwarding states less than half a second after the interfaces are reconnected. It is not honouring any STP wait times.
Waiting on TAC's official response.
------------------------------
Regards,
Brett V
Original Message:
Sent: Oct 27, 2023 02:16 AM
From: BrettV
Subject: AOS-CX switches transition to STP forwarding state too quickly on boot
Hi all,
When my 6100 switches running firmware 10.10.1040 boot up, it seems their uplinks go into 'forwarding' states too fast. It's almost as if they are ignoring the 'spanning-tree port-type admin-network' command on the uplinks to the distribution switches.
Sanitised logs (apologies for the reverse order):
2023-10-26T02:40:01.693563+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/16 for CIST from source: 68:87:c6:21:28:d62023-10-26T02:40:01.684650+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/15 for CIST from source: 80:24:8f:4f:73:d62023-10-26T02:39:59.895664+00:00 6100 lldpd[2375]: Event|104|LOG_INFO|AMM|1/1|LLDP neighbor 20:cf:ae:75:39:40 added on 1/1/162023-10-26T02:39:59.694396+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/16 for CIST from source: 68:87:c6:21:28:d62023-10-26T02:39:59.674385+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/15 for CIST from source: 80:24:8f:4f:73:d62023-10-26T02:39:59.067362+00:00 6100 hpe-udldd[2394]: Event|4103|LOG_INFO|AMM|1/1|UDLD interface 1/1/16 is unblocked2023-10-26T02:39:59.066786+00:00 6100 hpe-udldd[2394]: Event|4107|LOG_INFO|AMM|1/1|UDLD link is enabled on interface: 1/1/162023-10-26T02:39:59.012758+00:00 6100 hpe-udldd[2394]: Event|4103|LOG_INFO|AMM|1/1|UDLD interface 1/1/15 is unblocked2023-10-26T02:39:59.011410+00:00 6100 hpe-udldd[2394]: Event|4107|LOG_INFO|AMM|1/1|UDLD link is enabled on interface: 1/1/152023-10-26T02:39:57.883672+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/16 for CIST from source: 68:87:c6:21:28:d62023-10-26T02:39:57.882129+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/15 for CIST from source: 80:24:8f:4f:73:d62023-10-26T02:39:57.722429+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/15 for CIST from source: 80:24:8f:4f:73:d62023-10-26T02:39:57.293482+00:00 6100 cdpd[2352]: Event|8903|LOG_INFO|AMM|1/1|CDP neighbor 68:87:c6:21:28:d6 is added on 1/1/162023-10-26T02:39:57.292687+00:00 6100 cdpd[2352]: Event|8903|LOG_INFO|AMM|1/1|CDP neighbor 80:24:8f:4f:73:d6 is added on 1/1/152023-10-26T02:39:57.291859+00:00 6100 hpe-mstpd[2370]: Event|2014|LOG_INFO|AMM|1/1|Port 1/1/16 blocked on CIST2023-10-26T02:39:57.290973+00:00 6100 hpe-mstpd[2370]: Event|2011|LOG_INFO|AMM|1/1|Topology Change received on port 1/1/15 for CIST from source: 80:24:8f:4f:73:d62023-10-26T02:39:56.821326+00:00 6100 hpe-mstpd[2370]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 61440: 94:60:d5:4e:aa:e0 to 8192: cc:ed:4d:c6:c7:e42023-10-26T02:39:56.400919+00:00 6100 lldpd[2375]: Event|109|LOG_INFO|AMM|1/1|Configured LLDP tx-delay to 22023-10-26T02:39:55.505926+00:00 6100 hpe-mstpd[2370]: Event|2012|LOG_INFO|AMM|1/1|CIST - Topology Change generated on port 1/1/16 going in to forwarding2023-10-26T02:39:55.505060+00:00 6100 ops_appsdnsclient[2353]: Event|11901|LOG_INFO|||Domain server create event for VRF default2023-10-26T02:39:54.863670+00:00 6100 intfd[581]: Event|403|LOG_INFO|AMM|1/1|Link status for interface 1/1/16 is up2023-10-26T02:39:54.846044+00:00 6100 intfd[581]: Event|403|LOG_INFO|AMM|1/1|Link status for interface 1/1/15 is up2023-10-26T02:39:53.084641+00:00 6100 ndmd[2361]: Event|6131|LOG_INFO|AMM|1/1|Neighbor Discovery daemon started2023-10-26T02:39:49.385150+00:00 6100 hpe-mstpd[2370]: Event|2001|LOG_INFO|AMM|1/1|MSTP Enabled2023-10-26T02:39:48.603506+00:00 6100 lacpd[2368]: Event|1319|LOG_INFO|AMM|1/1|LAG global load balancing mode is set to l3-src-dst-hash2023-10-26T02:39:48.569600+00:00 6100 lacpd[2368]: Event|1307|LOG_INFO|AMM|1/1|LACP system ID set to 94:60:d5:4e:aa:e02023-10-26T02:39:43.647907+00:00 6100 hpe-cardd[519]: Event|3204|LOG_INFO|AMM|1/1|Line module 1/1 is ready2023-10-26T02:39:43.516813+00:00 6100 hpe-cardd[519]: Event|3225|LOG_DEBUG|AMM|1/1|Line module 1/1 ASIC initialization completed2023-10-26T02:39:20.653892+00:00 6100 pmd[591]: Event|3808|LOG_INFO|AMM|1/1|Transceiver 1G-LX inserted in 1/1/16 is supported. An HPE pluggable module that is supported in this interface2023-10-26T02:39:20.559194+00:00 6100 pmd[591]: Event|3808|LOG_INFO|AMM|1/1|Transceiver 1G-LX inserted in 1/1/15 is supported. An HPE pluggable module that is supported in this interface2023-10-26T02:39:20.132182+00:00 6100 pmd[591]: Event|3804|LOG_INFO|AMM|1/1|Transceiver hot-swap insert for interface 1/1/162023-10-26T02:39:20.131186+00:00 6100 pmd[591]: Event|3804|LOG_INFO|AMM|1/1|Transceiver hot-swap insert for interface 1/1/152023-10-26T02:39:18.633080+00:00 6100 hpe-cardd[519]: Event|3224|LOG_INFO|AMM|1/1|Line module 1/1 ASIC initialization started2023-10-26T02:39:12.079474+00:00 6100 hpe-sysmond[582]: Event|6303|LOG_INFO|AMM|1/1|Current system memory usage for module 1/1 is 8%2023-10-26T02:39:10.141753+00:00 6100 crash-tools[2151]: Event|1206|LOG_CRIT|||Module rebooted. Reason : Reboot requested by user, Version: PL.10.10.1040#012, Boot-ID : de795b4fc0c34f5fac7969ca7fe6e8292023-10-26T02:39:07.772767+00:00 6100 hpe-cardd[519]: Event|3233|LOG_INFO|AMM|1/1|Line module 1/1 configured with product number JL679A2023-10-26T02:39:05.641167+00:00 6100 hpe-config-boot[564]: Event|6803|LOG_INFO|AMM|-|Config-Migration:JL679A found in Subsystem data2023-10-26T02:38:48.606740+00:00 6100 powerd[590]: Event|301|LOG_INFO|AMM|1/1|PSU 1/1 changed state to OK
STP and interface onfig:
spanning-treespanning-tree trap new-rootspanning-tree trap loop-guard-inconsistencyspanning-tree priority 15!interface 1/1/15 no shutdown vlan trunk native xx vlan trunk allowed xx,xx,xx spanning-tree loop-guard loop-protect udld udld mode rfc5171 normal!interface 1/1/16 no shutdown vlan trunk native xx vlan trunk allowed xx,xx,xx spanning-tree loop-guard loop-protect udld udld mode rfc5171 normal
According to the default STP port type of 'admin-network' (no port type set), the port should wait 3 seconds before transitioning to the forwarding state. This doesn't seem to be happening if you look at the logs.
If I am reading the logs correctly, the interfaces come up at 2:39:54, and port 1/1/16 transitions to forwarding at 2:39:55. A BPDU is received 2 seconds later, and port 1/1/6 is shut down as it should always have been. I don't know what is happening to port 1/1/15. It's not clear.
I just opened a TAC case, but would love to discuss this openly for the sake of urgency. A 100+ of these reboot at the exact same time (power outage) and caused chaos upstream. 100+ switches looping for a couple of seconds simultaneously isn't pretty.
My questions are:
- When a switch boots and is running modern-day versions of STP, do the ports no longer go into the listening, and learning states before forwarding? Timestamps suggest they don't.
- Is my troubleshooting correct? My switches are forwarding before they receive a BPDU? Loop first, then negotiate STP later - definitely doesn't sound right.
- Am I missing something?
PS - LACP isn't an option in this network segment. That is a long story.
------------------------------
Regards,
Brett V
------------------------------