Wired Intelligent Edge

  • 1.  Arp Cache Age

    Posted Aug 11, 2015 04:46 PM

    I've run into an issue where incomplete entries in the ARP cache don't seem to expire on our 5400zl, which renders hosts unreachable across the router. When a host goes offline for a long period and gets pinged while offline, its address in the ARP table gets stuck at 000000-000000. Our arp-age is set to 20 minutes, so I'd assume the switch would try to resolve the address again after that amount of time. I can manually clear the entry, but I'm not sure how to prevent a situation like this in the first place.
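
    For anyone wanting to spot these stuck entries without reading the table by eye, here is a minimal sketch that scans a saved copy of the ARP table for all-zero MACs. The column layout in SAMPLE and the 192.168.1.10 / 001122-334455 row are assumptions made up for illustration, not output taken from this switch, so the parser relies only on the shape of the IP and MAC tokens.

```python
import re

# Assumed layout of a saved ARP table capture; spacing on a real switch may
# differ. The first data row is hypothetical, the second mirrors the thread.
SAMPLE = """\
  IP Address       MAC Address       Type    Port
  ---------------  ----------------- ------- ----
  192.168.1.10     001122-334455     dynamic A1
  192.168.1.206    000000-000000     dynamic
"""

# An IPv4 address followed by a MAC written as two groups of six hex digits.
ENTRY = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+([0-9a-fA-F]{6}-[0-9a-fA-F]{6})\b"
)

INCOMPLETE_MAC = "000000-000000"


def incomplete_entries(show_arp_text):
    """Return the IP addresses whose MAC column is all zeros."""
    stuck = []
    for line in show_arp_text.splitlines():
        match = ENTRY.match(line)
        if match and match.group(2) == INCOMPLETE_MAC:
            stuck.append(match.group(1))
    return stuck


print(incomplete_entries(SAMPLE))
```

    Header and separator lines are skipped because they do not start with an IP address, so only real entries are considered.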



  • 2.  RE: Arp Cache Age

    Posted Aug 12, 2015 10:35 AM

    Could you attach the config?  Also, what code is this?

     

    Usually all 0's is an unresolvable next hop route, but that does not appear to be the case here.  Does the router send the ARP request out to the host, and does the host respond?

     

     



  • 3.  RE: Arp Cache Age

    Posted Aug 13, 2015 02:26 PM

    When a device goes offline for a time and something tries to access its IP directly, the entry gets stuck at all 0s and the router does not send an ARP request. Once I ensure the device is back on the network and clear the cache entry, the router sends the ARP request to the appropriate subnet and the problem is resolved. I'm leaning towards a firmware bug at this point, but I'm hunting for any configuration problems before we find the time to bring it up to the current version.

     

    I've attached the switch config. This switch has been managed by people of various levels of networking experience over the years, so an odd routing setting wouldn't surprise me, but nothing jumps out at me. In the recent problems I've had, devices on vlans 150 and 11 have become unreachable from vlans 14 & 10.




  • 4.  RE: Arp Cache Age

    Posted Aug 13, 2015 02:27 PM

    I should clarify that first sentence: I'd assume an initial ARP request goes out. But once the entry is stuck at all 0s, I'm not seeing an ARP request for future attempts to access the device.



  • 5.  RE: Arp Cache Age

    Posted Aug 13, 2015 11:50 PM

    Nothing suspect in the config.  When you say "devices on  vlans 150 and 11 have become unreachable from vlans 14 & 10" does that mean the hosts with all 0's ARP are located on vlans 150 & 11?  

     

    What code is running?  Someone else had a similar post last week so I'll try and set this up in the lab.

     

     



  • 6.  RE: Arp Cache Age

    Posted Aug 14, 2015 12:54 PM

    As a concrete example, yesterday I hooked up a device as 192.168.1.206. I tried to access it from 192.168.7.70 but discovered I had forgotten something. I don't recall exactly whether it was unplugged or on the wrong VLAN, if that matters. I had to clear the ARP cache entry to get the connection working. In that case I'm not sure if waiting for the entry to expire would have resolved the problem, but the device was accessible from devices on the same subnet. I can run some more controlled tests if need be.

     

    The switch is:

    ProCurve Switch 5406zl(J8697A)      K.15.02.0005,ROMK.15.10

     

    It's definitely on older firmware so we can find time to upgrade it. It's just a critical component for us so we've avoided any unnecessary updates or downtime.



  • 7.  RE: Arp Cache Age

    Posted Aug 17, 2015 10:10 PM

    Well, I loaded up your config with that build and I don't see anything blatantly obvious or broken.  That doesn't mean there isn't something going on though.  

     

    If your destination client moved to another port or VLAN without that port going down (like if it was on another switch), and that client did not source another packet to cause the MAC to move, then the 5400 would continue to send traffic to the original destination port and VLAN.  Flushing the ARP table and causing the 5400 to ARP may be forcing the client to source a packet, thereby resolving the situation.  Maybe.

     

    The all 0's MAC for unreachable clients is still a little perplexing.  You should really only see that for static hosts, like a next hop for an IP route.  

     

    You might try lowering the ARP timer and see if that helps.  Upgrading the code might be worth a try also.

     

     

     



  • 8.  RE: Arp Cache Age

    Posted Aug 17, 2015 10:45 PM

    The devices are attached to another switch, and the problematic devices that led me to start poking around have mostly been fairly "dumb" devices like video projectors and switchers that don't tend to do any broadcasting. But with the 0 address there wasn't any port associated with it. I tried plugging my laptop directly into the 5400 and communicating with the devices across the same VLAN to see if the switch would pick up the traffic at layer 2 and update the entry, but no luck.

     

    Is there a good document that explains some of the finer points of how that table behaves and updates?

    Some of my questions are:
    Can the timers on the table be printed? Does anything trigger them to reset back to the default age or is the countdown absolute once it's added? I'd imagine any layer 3 traffic to a given IP would reset the timer for an entry, unless the route is 0s(?)

    I'm also curious if traffic destined for an incomplete entry would end up getting sent to the default route.

     

    This isn't a high priority for us, so it'll be a few days until I can look into it more. But there are a few devices I can think of where this should have been an issue but wasn't, which may provide some clues. My best guess though is that they're simply announcing themselves on the network and resetting the entry.

     

     



  • 9.  RE: Arp Cache Age

    Posted Aug 21, 2015 12:13 AM

    I was able to reproduce the all 0's MAC address in the ARP table.  After sending traffic to the destination host to populate the ARP/route entry, I then prevented the destination client from responding to ARP requests.  After the ARP timer expired I was left with all 0's for the ARP entry.  The 5400 continued to send ARP requests for the destination MAC and was able to reestablish communication once I allowed the ARP response through.
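
    The sequence observed in that lab test can be written down as a toy model. This is only a sketch of the reported behaviour, not the switch's actual implementation: the ArpEntry class, the one-request-per-tick retry pacing, and the 001122-334455 MAC are all assumptions made for illustration. The 20-minute age limit is the arp-age mentioned earlier in the thread.

```python
INCOMPLETE_MAC = "000000-000000"


class ArpEntry:
    """Toy model of one ARP cache entry (hypothetical, for illustration)."""

    def __init__(self, ip, mac, age_limit_min=20):
        self.ip = ip
        self.mac = mac
        self.age_limit_min = age_limit_min
        self.age_min = 0
        self.requests_sent = 0

    def tick(self, minutes, host_replies_with=None):
        """Advance time by `minutes`.

        host_replies_with is the MAC the host answers with, or None if the
        host stays silent when the switch sends an ARP request.
        """
        self.age_min += minutes
        if self.mac != INCOMPLETE_MAC and self.age_min < self.age_limit_min:
            return  # entry is resolved and still fresh, nothing to do
        # Entry has aged out or is incomplete: send an ARP request.
        # Assumption: one request per tick; real retry pacing is not stated.
        self.requests_sent += 1
        if host_replies_with is None:
            self.mac = INCOMPLETE_MAC
        else:
            self.mac = host_replies_with
            self.age_min = 0


entry = ArpEntry("192.168.1.206", "001122-334455")
entry.tick(20)                                    # timer expires, host silent
print(entry.mac, entry.requests_sent)
entry.tick(1)                                     # still silent, keeps asking
entry.tick(1, host_replies_with="001122-334455")  # reply allowed through
print(entry.mac, entry.requests_sent)
```

    In this model the all-zero entry is not a dead end: each later tick still produces a request, which matches what was seen in the lab but not what the original poster reports from production.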

     

    This was the only way I could reproduce your symptom.  Clear ARP would not fix this condition, obviously.  Unless something is really amiss, I'm leaning towards the client not responding to the ARP requests.  It would be good to mirror the port that the client resides on and confirm.
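
    When checking a mirrored-port capture, the question is whether ARP requests for the host appear and whether any reply follows. As a rough sketch, the function below classifies a raw Ethernet frame using the standard ARP packet layout. It assumes untagged frames (a mirrored trunk port may carry 802.1Q tags, which this does not handle), and the sender MAC and 192.168.1.1 router address in the demo are hypothetical.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_OPS = {1: "request", 2: "reply"}
ARP_FORMAT = "!HHBBH6s4s6s4s"  # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa


def describe_arp(frame):
    """Return (operation, sender_ip, target_ip) for an untagged ARP frame, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    fields = struct.unpack(ARP_FORMAT, frame[14:42])
    oper, spa, tpa = fields[4], fields[6], fields[8]
    return ARP_OPS.get(oper, "other"), socket.inet_ntoa(spa), socket.inet_ntoa(tpa)


def build_request(sender_mac, sender_ip, target_ip):
    """Build a broadcast ARP request frame, used here only as demo input."""
    eth = b"\xff" * 6 + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, 1,
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )
    return eth + arp


# Hypothetical router MAC and interface address; the target is from the thread.
frame = build_request(bytes.fromhex("001122334455"), "192.168.1.1", "192.168.1.206")
print(describe_arp(frame))
```

    Seeing "request" frames for the target with no matching "reply" would point at the client; seeing no requests at all would point back at the switch.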

     

    As for your questions there isn't really any documentation that describes how the ARP table operates, unfortunately.  As for the others:

       - ARP timer show - nothing in the CLI unfortunately

       - Timer refresh - the hardware entries stay refreshed as long as traffic is running.  The table can be fully or partially flushed by way of:

          - ARP timer expires

          - STP topology change

          - Port/VLAN down 

          - "clear arp" or "clear ipv6 neighbor" (for IPv6 hosts)

       - Default route - traffic in this state will not use the default route, because a connected best-match route exists for this host.  For connected networks the switch will attempt to resolve the MAC address for the destination IP and create a local hardware route for it.
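
    The point about the default route follows from longest-prefix matching: a connected subnet is always a more specific match than 0.0.0.0/0. A minimal sketch, assuming the 192.168.1.206 host from earlier in the thread sits on a /24 connected network (the mask is not stated in the thread, and best_route is a made-up helper, not a switch feature):

```python
import ipaddress


def best_route(destination, routes):
    """Pick the matching route with the longest prefix, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, kind) for net, kind in routes if dest in net]
    return max(matches, key=lambda item: item[0].prefixlen, default=None)


# Assumed routing table: a default route plus one connected /24.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default"),
    (ipaddress.ip_network("192.168.1.0/24"), "connected"),
]

net, kind = best_route("192.168.1.206", ROUTES)
print(net, kind)
```

    Because the connected route wins, the switch has to ARP for the host itself, and an unresolved entry means the traffic has nowhere to go rather than falling back to the default gateway.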