Wired Intelligent Edge

AOS-CX 8360: Spurious ifOutDiscards

  • 1.  AOS-CX 8360: Spurious ifOutDiscards

    Posted Oct 19, 2021 11:51 AM

    Hi,

    we're seeing occasional ifOutDiscards (interface TX Drops) on 100G interfaces, even though there shouldn't be any. Closer investigation showed that there really aren't any, but the ifOutDiscards SNMP counters sometimes flare up with values that are gone again on the next query. The values that show up are often the same as in a previous cycle, so it appears as if they are stuck somewhere and occasionally manage to become visible. Like so:

    [dc-k1-leaf1 14:18:47]
    IF-MIB::ifOutDiscards.33 = Counter32: 0
    IF-MIB::ifOutDiscards.34 = Counter32: 0

    [dc-k1-leaf1 14:18:58]
    IF-MIB::ifOutDiscards.33 = Counter32: 5
    IF-MIB::ifOutDiscards.34 = Counter32: 1

    [dc-k1-leaf1 14:19:09]
    IF-MIB::ifOutDiscards.33 = Counter32: 0
    IF-MIB::ifOutDiscards.34 = Counter32: 0

    [... staying at 0 for a while ...]

    [dc-k1-leaf1 14:22:32]
    IF-MIB::ifOutDiscards.33 = Counter32: 5
    IF-MIB::ifOutDiscards.34 = Counter32: 1

    (This was just polling ifOutDiscards.33 and ifOutDiscards.34 every 10s in a loop using snmpget, targeting 1/1/33 and 1/1/34 on an 8360-32Y4C, where these are the first two of the four 100G interfaces.)
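For anyone who wants to post-process the same kind of log, here is a minimal sketch that turns one line of the snmpget output shown above into an (ifIndex, value) pair. It assumes the exact textual format of the samples in this post; the helper name `parse_sample` is made up for illustration.

```python
import re

# Matches lines such as "IF-MIB::ifOutDiscards.33 = Counter32: 5".
# The pattern is an assumption based on the log excerpt in this post.
LINE_RE = re.compile(r"^IF-MIB::ifOutDiscards\.(\d+) = Counter32: (\d+)$")


def parse_sample(line):
    """Return (ifIndex, value) for a counter line, or None for anything else."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # timestamp headers, blank lines, other OIDs
    return int(match.group(1)), int(match.group(2))
```

Lines that are not counter samples (such as the bracketed timestamp headers) simply yield `None` and can be skipped by the caller.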

    Given that these are Counter32 objects, which should only ever wrap on overflow or reset when ifCounterDiscontinuityTime says so, the return to zero appears to be a sign of a problem. This is consistent with the CLI output showing these counters at zero (down to the per-queue counters), and with our expectation that some noise in the 100 Mbps range should not actually trigger TX Drops on a 100G interface. Needless to say, the above creates funny peaks in graphs monitoring the interfaces in question. It appears to be limited to TX Drops; other counters don't show these flashes of wrong values.
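The reasoning about Counter32 behaviour can be sketched as a small check: a decrease between two polls is either a genuine 32-bit wrap (small forward delta modulo 2^32) or a regression that a healthy counter should not produce. This is only an illustration; `classify_step` and the `max_plausible_delta` threshold are hypothetical, and the sketch does not consult ifCounterDiscontinuityTime, which a real check would also need to do.

```python
COUNTER32_MOD = 2 ** 32  # Counter32 values wrap at 2^32


def classify_step(prev, curr, max_plausible_delta=10 ** 6):
    """Classify the transition between two consecutive Counter32 samples.

    max_plausible_delta is an assumed upper bound on how many discards
    could realistically accumulate in one polling interval.
    """
    if curr >= prev:
        return "ok"
    # The counter went down: compute the forward distance assuming one wrap.
    wrapped_delta = (curr - prev) % COUNTER32_MOD
    if wrapped_delta <= max_plausible_delta:
        return "wrap"
    return "regression"
```

With the values from the log above, the step from 5 back to 0 would need roughly 2^32 discards in ten seconds to be a real wrap, so it is classified as a regression.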

    The interfaces I'm focusing on are L3 (IPv4 transit links between leaves and spines, carrying a /31 IPv4). The issue disappears for a while after "clear int statistics global", but then comes back. The same happens after a full switch reboot.

    Is this a known phenomenon on this platform? Anybody else seeing this?

    Thanks,
    Andre.



    ------------------------------
    Andre Beck
    ------------------------------


  • 2.  RE: AOS-CX 8360: Spurious ifOutDiscards

    MVP GURU
    Posted Oct 19, 2021 02:45 PM
    Hi, out of curiosity... is your Aruba 8360 already running the latest available AOS-CX 10.7 or 10.8 software build? I guess it is... given your description, those values look like "artifacts".





  • 3.  RE: AOS-CX 8360: Spurious ifOutDiscards

    Posted Oct 21, 2021 01:21 PM
    Hi,

    yup, all devices are on 10.08.1010, but I'm somewhat confident we had these glitches in monitoring graphs before, when running 10.07.latest. I actually became aware of these flaring counters through the fault_finder_monitor NAE script pushing warnings IIRC, but later focused on the SNMP counters. Other interesting factoids:

    • It's only happening on our leaves, not the spines. The spines are simple underlay routers (OSPF as IGP on loopbacks and /31 IPv4 transits, iBGP RR to leaves), while the leaves do L2-only EVPN/VXLAN overlay by the book. The one special thing about them is being VSX pairs.
    • Seemingly the times when these counters flare up are synchronized chassis-wide, meaning they are not independent per interface, but come and go on all the affected interfaces at the same time. They cluster in time, so to say.
    • Actual true TX Drops stay at their value and don't drop back to 0. We did some isolated storm testing (looping with 100G really gets you enormous packet rates...) and the legitimate drops produced there didn't disappear.
    • Our leaves use rate-limit storm controls on L2 access ports (10kpps for all the letters in BUM).
    So yes, it looks very artificial - like some racy counter readout occasionally "getting it wrong".
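The observations above (values that later fall back, and flare-ups that coincide across interfaces) can be checked offline with a sketch like the following. It flags every sample that is larger than some later sample of the same counter, which a monotonic counter should never produce, and then intersects the flagged poll indices across interfaces. The function names are made up for illustration, the data is the four polls quoted in the first post, and Counter32 wrap is deliberately ignored because the values involved are tiny.

```python
def suspect_indices(samples):
    """Indices of samples that exceed a later sample of the same counter."""
    suspects = []
    later_min = None  # smallest value seen so far while walking backwards
    for i in range(len(samples) - 1, -1, -1):
        if later_min is not None and samples[i] > later_min:
            suspects.append(i)
        later_min = samples[i] if later_min is None else min(later_min, samples[i])
    return sorted(suspects)


def synchronized_suspects(series_by_ifindex):
    """Poll indices that are suspect on every interface at the same time."""
    per_interface = [set(suspect_indices(s)) for s in series_by_ifindex.values()]
    return sorted(set.intersection(*per_interface)) if per_interface else []


# Polls at 14:18:47, 14:18:58, 14:19:09 and 14:22:32 from the first post.
polls = {
    33: [0, 5, 0, 5],
    34: [0, 1, 0, 1],
}
common = synchronized_suspects(polls)
```

Note that the last poll cannot be judged yet, since no later sample exists to contradict it; it only becomes suspect once the counter falls back again.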

    Thanks & Greetings,
    Andre.