This occurs regardless of the direction of the transfer between the DL360 and the C3000. We have been monitoring the other interfaces of the 5406r (VSF) and 5412r switches, and there is no other high network usage. The VSF interfaces and the other 10G interfaces to other servers do not exceed 5% utilization. The 10G interface that participates in the LACP trunk between the 5406r and 5412r switches averages 10% utilization. In other tests we conducted, involving other servers, we reached peaks of 30% to 40% on this same interface, and the 5406r VSF switch behaved normally.
During the problem, what I have observed so far is a sharp increase in "Deferred TX" packets on the 10G interface between the 5406r and 5412r switches, but I believe this is relatively normal since the final destination of this communication is the server connected to the C3000 through a gigabit port.
Is there anything else I can test or validate? I even ran a Wireshark capture on the network, and there is no visible difference in the packets when the VSF core switch stops responding.
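For anyone who wants to cross-check utilization percentages like the ones above against raw port counters, here is a minimal sketch in Python. It assumes you have two readings of an interface octet counter taken a known number of seconds apart; the function name, the 64-bit counter width, and the sample values are illustrative assumptions, not values taken from these switches.

```python
def utilization_percent(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Average link utilization (%) between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction tolerates a single counter wrap between samples.
    delta_octets = (octets_end - octets_start) % modulus
    bits_transferred = delta_octets * 8
    return 100.0 * bits_transferred / (interval_s * link_bps)


TEN_GBPS = 10_000_000_000

# Hypothetical samples taken 60 seconds apart on a 10G port.
print(utilization_percent(0, 7_500_000_000, 60, TEN_GBPS))
```

Short sampling intervals matter here: a 60-second average can look low even when the port is driven at line rate for a few seconds at a time.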
Original Message:
Sent: Aug 22, 2024 05:27 AM
From: parnassus
Subject: Switch-Packet loss on switch 5406r
In the case of a very (very) busy network, which in your case would mean North<->South traffic traversing the VSF being high enough to saturate several 10Gbps links to servers and so weigh heavily on the ports of the A modules (8 SFP+ ports) of your VSF, I could imagine "oversubscription" happening at the level of the A modules. The SFP+ ports on those modules are used both for the VSF interlinks and for uplinks to other servers/switches. VSF interlink traffic should normally be very light, but that might not hold in some particular situations: say, the HPE DL360 on the right exchanging heavy traffic with the HPE DL360 server on the left, given that those servers are not dual-homed to the VSF. BUT I believe this is not your case: if I recall correctly, each v3 zl2 module should provide 80Gbps of backplane bandwidth, which is plenty if we are considering just a simple file transfer over 10G -> 1G links or vice versa.
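The arithmetic behind that reasoning can be sketched in a few lines of Python. The 8 ports and the 80Gbps backplane figure are the numbers recalled in the post above, not verified specifications, and the function name is just an illustration.

```python
def oversubscription_ratio(port_count, port_gbps, backplane_gbps):
    """Ratio of total front-panel capacity to the bandwidth behind it.

    A ratio of 1.0 or less means the ports cannot offer more traffic
    than the path behind them can carry.
    """
    if backplane_gbps <= 0:
        raise ValueError("backplane_gbps must be positive")
    return (port_count * port_gbps) / backplane_gbps


# Module A as described above: 8 SFP+ ports at 10G, 80G of backplane (recalled figure).
print(oversubscription_ratio(8, 10, 80))

# The speed mismatch in the file transfer: one 10G sender towards one 1G receiver.
print(oversubscription_ratio(1, 10, 1))
```

With those assumed figures the module itself is not oversubscribed, while the 10G -> 1G step is a 10:1 mismatch that has to be absorbed by buffering or by TCP slowing down the sender.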
Original Message:
Sent: Aug 22, 2024 05:10 AM
From: parnassus
Subject: Switch-Packet loss on switch 5406r
Does it happen when the file transfer goes from the HPE BladeSystem C3000 connected host (1G) to the HPE DL360 connected host (10G), when it goes from the HPE DL360 connected host (10G) to the HPE BladeSystem C3000 connected host (1G), or in both directions?
It would be interesting to see the anonymized port configurations (for the physical and logical ports involved) on both the switch side and the server side. AFAIK a VSF with IP routing duties (as in your case) should not experience IP routing issues if a particular host-to-host flow traversing it (or routed by it) saturates a particular port (say the slowest one, the 1Gbps one)...at least as long as that uplink carries traffic for one particular VLAN only and is not used to carry multi-VLAN traffic (in which case all VLANs traversing that uplink could be impacted if traffic on just one of them saturates the physical link)...I don't know whether that could be your specific case (given the description you gave us, IMHO it's probably not).
In your case the VSF is connected at 10Gbps both to the HPE DL360 servers (directly connected to the VSF with single links) and to an intermediate HPE Aruba 5412R zl2 (probably) through LACP 10Gbps+10Gbps...this makes me think that if there is an issue, it could be on the lower side of your diagram (but it's just a guess).
Original Message:
Sent: Aug 21, 2024 05:17 PM
From: alexandre.link
Subject: Switch-Packet loss on switch 5406r
We conducted tests using iperf on Linux and on other servers with Linux and Windows.
The problem does not occur on other servers; it is only observed when iperf or a file transfer is performed on a RedHat server installed on a physical blade in the C3000, where the blade's switch is a pass-thru.
We evaluated the Linux operating system on this server, and there are no spikes in I/O, bandwidth, memory, processing, etc.
I created a diagram to try to show the network topology, and note that the Aruba 5406r switch, which stops responding on the network, is not the switch physically connected to the blade.
Original Message:
Sent: Aug 21, 2024 02:52 PM
From: parnassus
Subject: Switch-Packet loss on switch 5406r
Hi! To evaluate bottlenecks, one strategy is to measure network throughput between hosts with a tool like iperf3. Say that, having the necessary resources, you are able to set up two hosts (I prefer Linux as the OS...) physically connected directly to your core switch on two of its ports (both copper ports at 1 Gbps? OK. On different modules? OK). With iperf you can then measure bandwidth in both directions and see whether the generated traffic easily saturates those ports. If it does, you can look for the culprits (explaining the noted disconnections) elsewhere and remove the core switch from the equation. You can test A-to-B traffic in both switching and routing scenarios, depending on how you place/address the hosts involved.
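As a rough sketch of how such a measurement could be scripted, the Python below runs the iperf3 client with JSON output (`-J`) and optionally in reverse mode (`-R`, server sends to client), then reads the receiver-side throughput from the report. It assumes iperf3 is installed on both hosts, that `iperf3 -s` is already running on the far end, and that the test is TCP; the address 192.0.2.10 and the helper names are placeholders.

```python
import json
import subprocess


def build_iperf3_command(server, seconds=10, reverse=False):
    """Build the iperf3 client command line with JSON output enabled."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # reverse mode: the server sends, the client receives
    return cmd


def received_mbps(report):
    """Extract receiver-side throughput (Mbit/s) from a parsed TCP report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def measure(server, seconds=10, reverse=False):
    """Run one iperf3 test and return the received throughput in Mbit/s."""
    result = subprocess.run(
        build_iperf3_command(server, seconds, reverse),
        capture_output=True,
        text=True,
        check=True,  # raise if iperf3 exits with an error
    )
    return received_mbps(json.loads(result.stdout))


if __name__ == "__main__":
    target = "192.0.2.10"  # placeholder address of the host running "iperf3 -s"
    print("client -> server:", measure(target), "Mbit/s")
    print("server -> client:", measure(target, reverse=True), "Mbit/s")
```

Running both directions one after the other gives one figure per direction, which makes it easy to see whether the problem depends on which side is sending.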
Original Message:
Sent: 8/21/2024 10:11:00 AM
From: alexandre.link
Subject: Switch-Packet loss on switch 5406r
When performing a file transfer on the network, the 5406 switch, which is in VSF, experiences packet loss on the network.
Since it is the network router, clients lose connection to other servers and other destinations.
The file transfer is performed on a server with a 1G network card, so there is no link overload.
The file transfer occurs between an HP DL360 server and a Blade C3000 with an HP 1Gb Ethernet Pass-Thru Module switch.
The switch is running version KB.16.11.0020.