150 AP93s offline for some reason, reboot controller, they come back up

  • 1.  150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 10:09 AM

    Hi,

     

    One of our 3600 controllers experienced some weirdness early this morning: 150 of the 500 APs associated with the controller went offline. A reboot of the controller brought them all back up.

     

    Any particular issue that anyone can think of that can cause this?

     

    Our graveyard guy saved the logs, but looking at them, I do not see anything related to this.

     

    Calling Aruba TAC now that the issue is resolved probably won't do much, since they will want to see the issue live, and so would I.

     

    Any thoughts?

     

     

    Firmware: 6.1.3.0_32142

     


    #3600


  • 2.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 10:19 AM

    Since you have 500 APs on a 3600, I assume these are all RAPs, right?  Could it have been an issue with a specific ISP?  Are all 150 that bounced in a certain geographic location?  When they came back up, did you get a log message saying why they bounced?  If so, can you post it for one or two of them?



  • 3.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 10:28 AM

    They are all RAPs in bridged mode.

    They are spread all over Canada.

     

    I do not think it was an ISP issue.

     

    Unfortunately, all I have is what my agent saved as logs. He rebooted the controller, so all of the logs from before the reboot are gone.

    I can't seem to attach it to my post.

     

    Any commands I could run currently to validate?

    I ran show ap debug system-status ap-name <>

     

     

     

     

    Attachment: log.zip (717 KB)


  • 4.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 10:38 AM

    Unfortunately, nothing jumps out of the logs (at least to me).

     

    When you ran "show ap debug system-status ap-name <name of a RAP that booted>", what did you see for the reboots/bootstraps?



  • 5.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 10:50 AM

    That is what I saw. Weird that the date/time is WAY off.

    The clock on the controller is valid.

     
    Reboot Information
    ------------------
    AP rebooted Mon Jul 23 01:21:56 UTC 2012; Process /usr/sbin/dnsmasq has too many open files (770)
    -------------------------------------------------------------------------------------------------

    Rebootstrap Information
    -----------------------
    Date       Time     Reason (Latest 10)
    --------------------------------------
    1999-12-31 16:02:03 Changing to LMS #0 (38.108.87.62)
    2012-07-23 02:01:17 Missed heartbeats
    2012-07-23 02:05:32 Changing to LMS #0 (38.108.87.62)
    2012-07-23 04:01:31 Missed heartbeats
    2012-07-23 04:05:12 Changing to LMS #0 (38.108.87.62)


    Rebootstrap LMS
    ---------------
    2012-07-23 04:05:12 Changing to LMS #0 (38.108.87.62)
    -----------------------------------------------------

     

    edit: sorry pasted wrong information
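
    For anyone comparing several RAPs, here is a rough Python sketch (not an Aruba tool) that pairs each "Missed heartbeats" row in the rebootstrap table above with the next "Changing to LMS" row and reports how long the AP took to come back. The row format is assumed from the output pasted in this post; the function names are made up for illustration.

```python
from datetime import datetime

# Rows copied from the "Rebootstrap Information" table in this post.
ROWS = [
    "1999-12-31 16:02:03 Changing to LMS #0 (38.108.87.62)",
    "2012-07-23 02:01:17 Missed heartbeats",
    "2012-07-23 02:05:32 Changing to LMS #0 (38.108.87.62)",
    "2012-07-23 04:01:31 Missed heartbeats",
    "2012-07-23 04:05:12 Changing to LMS #0 (38.108.87.62)",
]


def parse_row(row):
    """Split one table row into (timestamp, reason). Assumes 'DATE TIME REASON'."""
    date, time, reason = row.split(" ", 2)
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S"), reason


def recovery_gaps(rows):
    """Pair each 'Missed heartbeats' row with the next 'Changing to LMS' row."""
    gaps = []
    missed_at = None
    for stamp, reason in map(parse_row, rows):
        if reason.startswith("Missed heartbeats"):
            missed_at = stamp
        elif reason.startswith("Changing to LMS") and missed_at is not None:
            gaps.append((missed_at, stamp - missed_at))
            missed_at = None
    return gaps


gaps = recovery_gaps(ROWS)
for missed_at, gap in gaps:
    print(f"heartbeats missed at {missed_at}, back on the LMS after {gap}")
```

    The first row (dated 1999) has no preceding "Missed heartbeats" entry, so it is skipped without needing any special handling.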



  • 6.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 11:20 AM

    When APs reboot, they report their time like that.  There is no battery for the clock, so when they power off, they "forget" the time and the first log message or two will carry the factory default time.  As soon as they sync with a controller, they update their time, though.
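
    If you are scripting against these tables, a small Python sketch like the one below can separate the entries logged before the clock synced from the real ones. The cutoff date is an arbitrary assumption on my part (anything older than the hardware is treated as an unsynced default clock), and the helper name is hypothetical.

```python
from datetime import datetime

# Rows copied from the "Rebootstrap Information" output earlier in this thread.
TABLE = """\
1999-12-31 16:02:03 Changing to LMS #0 (38.108.87.62)
2012-07-23 02:01:17 Missed heartbeats
2012-07-23 02:05:32 Changing to LMS #0 (38.108.87.62)
2012-07-23 04:01:31 Missed heartbeats
2012-07-23 04:05:12 Changing to LMS #0 (38.108.87.62)
"""

# Assumption: timestamps before this date come from an AP clock that has not
# synced with the controller yet. Pick whatever cutoff suits your deployment.
CUTOFF = datetime(2005, 1, 1)


def split_by_clock_state(text, cutoff=CUTOFF):
    """Return (synced, unsynced) lists of (timestamp, reason) tuples."""
    synced, unsynced = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        date, time, reason = line.split(" ", 2)
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        (unsynced if stamp < cutoff else synced).append((stamp, reason))
    return synced, unsynced


synced, unsynced = split_by_clock_state(TABLE)
print("logged before clock sync:", unsynced)
print("logged after clock sync:", len(synced), "entries")
```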

     

    I would open a TAC case on this.  The message below is the culprit:

     

    Reboot Information
    ------------------
    AP rebooted Mon Jul 23 01:21:56 UTC 2012; Process /usr/sbin/dnsmasq has too many open files (770)
    -------------------------------------------------------------------------------------------------

     

    TAC may be able to get the crash information from the AP and open a bug (or add your case to an existing bug to get it fixed quicker).
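
    If you end up collecting this output from many APs before sending it to TAC, a sketch along these lines can pull the reboot time, process name, and open-file count out of the "Reboot Information" line. The regular expression is only fitted to the single line quoted above, so treat the pattern as an assumption; other reboot reasons will simply return None.

```python
import re
from datetime import datetime

# The line quoted above, verbatim.
LINE = ("AP rebooted Mon Jul 23 01:21:56 UTC 2012; "
        "Process /usr/sbin/dnsmasq has too many open files (770)")

# Assumption: the format is inferred from this one sample only.
PATTERN = re.compile(
    r"AP rebooted (?P<when>.+?) UTC (?P<year>\d{4}); "
    r"Process (?P<process>\S+) has too many open files \((?P<count>\d+)\)"
)


def parse_reboot_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    when = datetime.strptime(
        f"{match['when']} {match['year']}", "%a %b %d %H:%M:%S %Y"
    )
    return {
        "when_utc": when,
        "process": match["process"],
        "open_files": int(match["count"]),
    }


info = parse_reboot_line(LINE)
print(info)
```

    Grouping the parsed results by process across the affected APs would show whether they all rebooted for the same reason, which is useful context for the case.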



  • 7.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 11:59 AM

    Perfect, thanks.

     

    Any idea what the reboot msg means?



  • 8.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 12:32 PM

    Seems to be a bug, but I am not 100% sure.  TAC can dig into it further and verify it's a known bug or open a new one if they haven't seen it before.



  • 9.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 12:35 PM

    On the phone currently opening a case.

     

    Thanks for all your help.



  • 10.  RE: 150 AP93s offline for some reason, reboot controller, they come back up

    Posted Jul 23, 2012 01:19 PM

    Shipped off the logs, the crash info, and the output of 'show ap debug system-status ap-name <ap-name>' for 5 APs.