Wired Intelligent Edge


High Collision or Drop Rate

  • 1.  High Collision or Drop Rate

    Posted Mar 18, 2024 06:19 PM

    Hello, I am experiencing an issue across my entire network: Aruba Central continually generates "High Collision Or Drop Rate" alerts. They are reported on my edge and access switches, which are a combination of 2930M, 5406R and 5412R, all running firmware WC.16.11.0008 across the site. The alerts appear on access ports, which typically have a VoIP phone, a PC, or both attached. I've seen various posts about this issue but can't find a definitive resolution, and a support call with HPE TAC wasn't successful either, as they could not provide a valid fix.

    On further investigation, I see that some of the VoIP phones are 100 Mb/s with 1 Gb/s laptops or PCs daisy-chained behind them, while on some ports only a 1 Gb/s phone is connected. Some posts suggest swapping out the copper cables, but I think the issue goes beyond a bad cable. Another post suggests disabling loop protect on the downlinks from my 5412 core to my access layer switches, but I'm reluctant to try that at this point. Any further suggestions or input would be appreciated.



  • 2.  RE: High Collision or Drop Rate

    Posted Mar 19, 2024 04:26 AM

    Hi, I've only ever seen this counter go up when there is too much traffic trying to fit into a small pipe. In your case that could be 1 Gb/s of traffic trying to go down a 100 Mb/s connection. The same can happen on a 1 Gb/s port if 2 Gb/s of traffic from upstream is trying to be transmitted onto it. Either could account for the majority of the logs you posted.
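
    To put rough numbers on the "small pipe" point, here is a minimal Python sketch (not from the original post) that estimates how quickly a port's egress buffer fills when traffic arrives faster than the port can send it. The 1 MB buffer is a hypothetical figure; real buffering depends on the switch model and its QoS settings, so treat the result as an order-of-magnitude illustration only.

```python
def time_to_fill_buffer(ingress_bps: float, egress_bps: float, buffer_bytes: int) -> float:
    """Seconds until an egress buffer overflows when traffic arrives faster
    than the port can send it. Returns float('inf') if there is no excess."""
    excess_bps = ingress_bps - egress_bps
    if excess_bps <= 0:
        return float("inf")  # the port keeps up, so the queue never grows
    return buffer_bytes * 8 / excess_bps  # bytes -> bits, divided by the excess rate


if __name__ == "__main__":
    buf = 1_000_000  # hypothetical 1 MB of buffer available to the port
    # A download arriving at 1 Gb/s for a PC behind a 100 Mb/s phone port.
    print(f"1G into 100M: {time_to_fill_buffer(1e9, 100e6, buf) * 1000:.1f} ms")
    # 2 Gb/s of upstream traffic converging onto a single 1 Gb/s access port.
    print(f"2G into 1G:   {time_to_fill_buffer(2e9, 1e9, buf) * 1000:.1f} ms")
```

    Under that assumption both cases overflow in under 10 ms, which is why a short burst of downloads is enough to produce a burst of Tx drops.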

    The CLI gives you more detail (this is the output of show interfaces A1):

     Status and Counters - Port Counters for port A1

      Name  :
      MAC Address      : f06281-7c5bff
      Link Status      : Up
      Totals (Since boot or last clear) :
       Bytes Rx        : 4,088,285,858        Bytes Tx        : 3,711,504,856
       Unicast Rx      : 375,444,537          Unicast Tx      : 566,324,956
       Bcast/Mcast Rx  : 3,725,089            Bcast/Mcast Tx  : 1,503,832,170
      Errors (Since boot or last clear) :
       FCS Rx          : 0                    Drops Tx        : 857,942
       Alignment Rx    : 0                    Collisions Tx   : 0
       Runts Rx        : 0                    Late Colln Tx   : 0
       Giants Rx       : 0                    Excessive Colln : 0
       Total Rx Errors : 0                    Deferred Tx     : 0
      Others (Since boot or last clear) :
       Discard Rx      : 0                    Out Queue Len   : 0
       Unknown Protos  : 0
      Rates (5 minute weighted average) :
       Total Rx (bps) : 3,704                 Total Tx (bps) : 10,272
       Unicast Rx (Pkts/sec) : 2              Unicast Tx (Pkts/sec) : 2
       B/Mcast Rx (Pkts/sec) : 0              B/Mcast Tx (Pkts/sec) : 10
       Utilization Rx  :     0 %              Utilization Tx  :     0 %

    First, this output shows whether you have high collisions or high drops, and knowing which counter is incrementing is useful. In this example it is Drops Tx; all the collision counters are zero.
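
    If you want to check which counter is moving across many ports, a small parser helps. The sketch below is an illustration only: it assumes the two-column "Label : value" layout shown above, pulls the numeric counters into a dictionary, and works out the share of egress packets dropped since boot, treating Unicast Tx and Bcast/Mcast Tx as packet counts. The function names are hypothetical.

```python
import re
from typing import Dict

# One "Label : 1,234" pair; each output line may contain two of them.
# Labels with parentheses (the rate lines) and non-numeric values
# (MAC address, link status) deliberately do not match.
PAIR_RE = re.compile(r"([A-Za-z][A-Za-z/ ]*?)\s*:\s*([\d,]+)(?=\s|$)")


def parse_counters(text: str) -> Dict[str, int]:
    """Collect numeric counters from port-counter output into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        for label, value in PAIR_RE.findall(line):
            counters[label.strip()] = int(value.replace(",", ""))
    return counters


def tx_drop_ratio(c: Dict[str, int]) -> float:
    """Share of egress packets dropped, assuming the Tx counters count packets."""
    sent = c["Unicast Tx"] + c["Bcast/Mcast Tx"]
    dropped = c["Drops Tx"]
    total = sent + dropped
    return dropped / total if total else 0.0


# Excerpt of the port A1 output from this post.
SAMPLE = """\
   Unicast Rx      : 375,444,537          Unicast Tx      : 566,324,956
   Bcast/Mcast Rx  : 3,725,089            Bcast/Mcast Tx  : 1,503,832,170
   FCS Rx          : 0                    Drops Tx        : 857,942
   Alignment Rx    : 0                    Collisions Tx   : 0
"""

if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    print("Collisions Tx:", counters["Collisions Tx"], "Drops Tx:", counters["Drops Tx"])
    print(f"Tx drop ratio since boot: {tx_drop_ratio(counters):.3%}")
```

    On the port A1 figures that works out to roughly 0.04% of egress packets, accumulated over a long uptime.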

    Next, the switch only logs messages that correspond to the alerts you see in Central when these counters increase at a high rate. In my example above, the switch hasn't been rebooted in a very long time; the counter is historical and isn't going up. If it only trickles up, there is no log message. So an alert points to an event rather than a general issue.
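
    Because it is the rate that matters, a single reading of the counters tells you little. A minimal sketch of comparing two readings of the same port follows; the 100 drops per second threshold is a hypothetical value, not the threshold that Central or the switch actually uses.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    seconds: float  # when the counter was read, e.g. from time.monotonic()
    drops_tx: int   # value of the "Drops Tx" counter at that moment


def drops_per_second(earlier: Sample, later: Sample) -> float:
    """Average drop rate between two readings of the same port."""
    elapsed = later.seconds - earlier.seconds
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = later.drops_tx - earlier.drops_tx
    if delta < 0:
        # Counter went backwards: the switch rebooted or the counters were
        # cleared, so only the later value is known to be new.
        delta = later.drops_tx
    return delta / elapsed


def is_burst(earlier: Sample, later: Sample, threshold_per_s: float = 100.0) -> bool:
    """True when drops are rising faster than the (hypothetical) threshold."""
    return drops_per_second(earlier, later) > threshold_per_s


if __name__ == "__main__":
    old = Sample(0.0, 857_942)
    print(is_burst(old, Sample(300.0, 857_942)))  # historical count only
    print(is_burst(old, Sample(300.0, 917_942)))  # 60,000 new drops in 5 minutes
```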

    In my experience, CRC errors (the FCS Rx counter) only occur with physical issues. You'll need to resolve those with a physical-layer fix (patch cable, wall socket, NIC).

    I note your logs appear to occur during the daytime. If they also fall on working days (Mon-Fri where I am), that points further to human activity rather than automated traffic.
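
    One quick way to test the working-hours theory is to export the alert timestamps and see what share of them falls Monday to Friday during office hours. This sketch assumes 08:00 to 18:00 as working hours and uses made-up timestamps.

```python
from datetime import datetime
from typing import Sequence


def working_hours_share(timestamps: Sequence[datetime],
                        start_hour: int = 8, end_hour: int = 18) -> float:
    """Fraction of alerts raised Monday-Friday between start_hour (inclusive)
    and end_hour (exclusive), in whatever time zone the timestamps use."""
    if not timestamps:
        return 0.0
    in_hours = sum(
        1 for ts in timestamps
        if ts.weekday() < 5 and start_hour <= ts.hour < end_hour  # 0-4 = Mon-Fri
    )
    return in_hours / len(timestamps)


if __name__ == "__main__":
    alerts = [
        datetime(2024, 3, 18, 9, 15),   # Monday morning
        datetime(2024, 3, 18, 14, 2),   # Monday afternoon
        datetime(2024, 3, 16, 3, 40),   # early Saturday morning
    ]
    print(f"{working_hours_share(alerts):.0%} of alerts in working hours")
```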

    Excluding the CRC error messages, I would take a couple of examples and check how many are devices connected behind a phone, since those have a high chance of "demanding" more traffic than the wire can handle. One department where I work suddenly had lots of these, and it looked like a switch fault. After investigation, it turned out the department had a new database-type application that caused large download spikes whenever a page was clicked on: a 1 Gb/s PC behind a 100 Mb/s phone. Testing with one PC connected directly to the switch port resolved it (and created a demand for either new phones or drilling into walls). We concluded that modern working practices and applications are not compatible with daisy-chaining through 100 Mb/s phones.
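
    The check described above can be sketched as a simple filter over a port inventory. The AccessPort record and the sample data are hypothetical; in practice you would fill them in from the negotiated speed shown in the switch's port status and from what is actually plugged in behind each phone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessPort:
    port: str                   # switch port, e.g. "A1"
    link_mbps: int              # speed the switch port negotiated with the phone
    pc_nic_mbps: Optional[int]  # NIC speed of the PC behind the phone; None if no PC


def daisy_chain_suspects(ports: List[AccessPort]) -> List[str]:
    """Ports where the PC behind the phone can demand more than the switch link carries."""
    return [
        p.port for p in ports
        if p.pc_nic_mbps is not None and p.pc_nic_mbps > p.link_mbps
    ]


if __name__ == "__main__":
    inventory = [
        AccessPort("A1", 100, 1000),   # 100 Mb/s phone with a 1 Gb/s PC behind it
        AccessPort("A2", 1000, 1000),  # 1 Gb/s phone, 1 Gb/s PC
        AccessPort("A3", 100, None),   # phone only
    ]
    print(daisy_chain_suspects(inventory))
```

    Ports that this flags are the first candidates for the "connect the PC directly" test.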