Wireless Access


Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

  • 1.  Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jun 30, 2019 02:11 PM

    Recently I got a pair of Aruba S2500 switches and configured them as a stack with two stack interconnections. Assume nothing is plugged in client-wise, as this issue occurs regardless of what's connected. My issue is that after a day or two the SFP+ stack ports between the switches just shut down. The ports are not admin shut down; the LED just turns off and on in about 5-second increments and the link drops. The stack does not fail over to the secondary connection; it just hangs. This happens on both stack interfaces in the same way. It also happens when I drop to just 1 stack interconnection. The only way to recover is to unseat and reseat the SFP module in the switch. The other 10G ports are not affected when connected to, say, a PC. None of the copper ports have this issue. The issue also happens when I drop the stack config and just use traditional trunking ports. Lastly, this seems to only happen between 11am and 11:30am Central; it never fails outside of that time.

     

    The ports do show some octet errors, but that's it, and I am not sure whether they are the result of the problem or the cause. This seems like a straightforward issue with a bad SFP or fiber, but please consider what I have tried to do to fix it:

    • Replaced fiber cable (OM3 fiber)
    • Tried different SFP+ port (even swapped with port that had no issues with my PC using a standard trunk)
    • Tried 5 different brands of SR 10G SFP+ modules, including authentic Aruba optics.
    • Replaced affected optics with LR (single mode) 10G optics on both sides and replaced with single mode cables.
    • Tried both a standard trunk, access, as well as stacking
    • Issue occurs on both switches, even when split and independent
    • Both switches are on stable and UPS conditioned power.
    • Firmware is patched to latest (Aruba OS v7.4.0.6), but I also tried an older version (v7.4.0.4)

    I have no idea what else to try. Despite everything above, the issue still occurs. It usually happens every 24-48 hours. As noted above, I see octet errors on the affected interface but that's it, and pulling the SFP and reseating it clears the issue.

     

    Here are the errors I get:

     

    Logs and info:

    Aruba Operating System Software.
    ArubaOS (MODEL: ArubaS2500-24P-US), Version 7.4.0.6
    Website: http://www.arubanetworks.com
    Copyright (c) 2016 Aruba, a Hewlett Packard Enterprise company.
    Compiled on 2018-01-11 at 00:15:44 PST (build 63167) by p4build
    ROM: System Bootstrap, Version CPBoot 1.0.34.0 (build 32670)
    Built: 2012-03-06 02:43:38
    Built by: p4build@re_client_32670
    Switch uptime is 1 days 19 hours 51 minutes 4 seconds
    Reboot Cause: User reboot (0x86:0x78:0x402b)
    Processor XLS 208 (revision A1) with 1023M bytes of memory.
    955M bytes of System flash
    Activation Key: LZQWURUG

    Errorlog Snippet as issue starts (full logs below):

     

    Jun 30 10:14:18  aaa_proxy[1404]: <341312> <ERRS> |aaa_proxy|  Unable to connect to MASTER AAA proxy socket:Operation now in progress
    Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/profmgr, pid 4249 
    Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/udbserver, pid 4251 
    Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/ntpwrap, pid 4252 
    Jun 30 10:14:18  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_recv_bytes line 65 error recv returned 0, expecting 5.
    Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto HTTPD Manager failed: Connection refused Message Code 5004 Sequence Num is 208 
    Jun 30 10:14:18  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ncfg_gcore.c function ncfg_profmgr_task_based_error_handler line 165 error ncfg_profmgr_task_based_error_handler:Profile manager most probably died, context:0x100006b8. Handling error for task-based app and resyncing with profile-manager.
    Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto Interface Manager failed: Connection refused Message Code 5004 Sequence Num is 210 
    Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto DHCP Daemon failed: No such file or directory Message Code 5004 Sequence Num is 211 
    Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 5004 Sequence Num is 212 
    Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto IKE failed: Connection refused Message Code 5004 Sequence Num is 213 
    Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto Authentication failed: Connection refused Message Code 5004 Sequence Num is 214 
    Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto User Database Server failed: Connection refused Message Code 5004 Sequence Num is 215 
    Jun 30 10:14:18  cfgm[1437]: <307228> <ERRS> |cfgm|  Error Accepting a connection to the Master Config socket:Resource temporarily unavailable
    Jun 30 10:14:19  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_setup_prof_socket line 556 error Unable to connect to profmgr: Connection refused.
    Jun 30 10:14:19  im[6606]: <330006> <ERRS> |im|  Connection failure with profile manager, Unable to connect to profmgr
    Jun 30 10:14:19  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ncfg_gcore.c function ncfg_gated_profmgr_connect line 86 error ncfg_gated_profmgr_connect:ERROR: ncfg_init failed:14.
    Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto HTTPD Manager failed: Connection refused Message Code 5004 Sequence Num is 216 
    Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto DHCP Daemon failed: No such file or directory Message Code 5004 Sequence Num is 219 
    Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 5004 Sequence Num is 220 
    Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto IKE failed: Connection refused Message Code 5004 Sequence Num is 221 
    Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto Authentication failed: Connection refused Message Code 5004 Sequence Num is 222 
    Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto User Database Server failed: Connection refused Message Code 5004 Sequence Num is 223 
    Jun 30 10:14:21  activate[6641]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 0 Sequence Num is 2 
    Jun 30 10:14:21  authmgr[6610]: <199802> <ERRS> |authmgr|  main.c, main:510: Auth started/restarted or switchover happened
    Jun 30 10:14:22  qosmgr[6633]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 0 Sequence Num is 2 
    Jun 30 10:14:22  rmon[6636]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 0 Sequence Num is 2 
    Jun 30 10:14:23  cmica[1461]: task_close: close Chassis Agent socket.-1: Bad file descriptor
    Jun 30 10:14:23  aaa[6605]: PAPI_Send: sendto User Database Server failed: Connection refused Message Code 0 Sequence Num is 3 
    Jun 30 10:14:23  mon_ssm[6659]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
    Jun 30 10:14:23  mon_ssm[6663]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
    Jun 30 10:14:23  mon_ssm[6664]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
    Jun 30 10:14:23  mon_ssm[6662]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
    Jun 30 10:14:23  mon_ssm[6661]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
    Jun 30 10:14:23  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_setup_prof_socket line 556 error Unable to connect to profmgr: Connection refused.
    Jun 30 10:14:23  stackmgr[1467]: <330006> <ERRS> |stackmgr|  Connection failure with profile manager, Unable to connect to profmgr
    Jun 30 10:14:23  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ncfg_gcore.c function ncfg_task_based_profmgr_reconnect line 131 error ncfg_task_based_profmgr_reconnect:Unable to reconnect with profile-manager. Setting task timer to 5.
    Jun 30 10:14:23  ChassisManager[6629]: <335309> <ALRT> |ChassisManager|  Power supply  detected on slot 0
    Jun 30 10:14:23  ChassisManager[6629]: <335308> <ALRT> |ChassisManager|  Module 1 2010067 (4-Port) detected on slot 0
    Jun 30 10:14:23  ChassisManager[6629]: <335309> <ALRT> |ChassisManager|  Power supply  detected on slot 1
    Jun 30 10:14:23  ChassisManager[6629]: <335308> <ALRT> |ChassisManager|  Module 1 2010067 (4-Port) detected on slot 1
    Jun 30 10:14:25  cmica[1461]: task_close: close Chassis Agent socket.-1: Bad file descriptor
    Jun 30 10:14:25  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
    Jun 30 10:14:25  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_setup_prof_socket line 556 error Unable to connect to profmgr: Connection refused.
    Jun 30 10:14:25  im[6606]: <330006> <ERRS> |im|  Connection failure with profile manager, Unable to connect to profmgr
    Jun 30 10:14:25  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ncfg_gcore.c function ncfg_gated_profmgr_connect line 86 error ncfg_gated_profmgr_connect:ERROR: ncfg_init failed:14.
    Jun 30 10:14:25  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
    Jun 30 10:14:25  publisher[1436]: PAPI_Send: sendto Auth Survival Server failed: Connection refused Message Code 0 Sequence Num is 61 
    Jun 30 10:14:26  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
    Jun 30 10:14:26  profmgr[6609]: <399803> <ERRS> |profmgr|  An internal system error has occurred at file profmgr_stk.c function profmgr_nl_sub_cb line 152 error Our role is MASTER, re-syncing with certificate-manager .
    Jun 30 10:14:28  authmgr[6610]: PAPI_AddSibyteOpcode: ReRegistering SAME call back function for opcode 0x004b sock = 15 
    Jun 30 10:14:28  authmgr[6610]: PAPI_AddSibyteOpcode: ReRegistering SAME call back function for opcode 0x009e sock = 16 
    Jun 30 10:14:28  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ncfg_gcore.c function ncfg_task_based_profmgr_reconnect line 141 error ncfg_task_based_profmgr_reconnect:Connection to profile-manager established, setting socket..
    Jun 30 10:14:30  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
    Jun 30 10:14:30  profmgr[6609]: <399803> <ERRS> |profmgr|  An internal system error has occurred at file profmgr_stk.c function profmgr_nl_sub_cb line 152 error Our role is MASTER, re-syncing with certificate-manager .
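Since the snippet above is long and repetitive, a small script can help show which process complains first and how the errors cluster in time. The following is only a minimal sketch: the line format is assumed from the entries pasted above, and `summarize` and `LINE_RE` are hypothetical names used for illustration, not part of ArubaOS or any Aruba tool. It may also be worth comparing the first timestamp it reports with the wall-clock time of the failure, in case the switch clock or time zone differs from local time.

```python
import re
from collections import Counter

# Assumed line shape, taken from the pasted errorlog, e.g.:
#   Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process ...
LINE_RE = re.compile(
    r"^\s*(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Return (first timestamp seen, entries per process, entries per HH:MM)."""
    per_process = Counter()
    per_minute = Counter()
    first_seen = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not a log entry
        if first_seen is None:
            first_seen = f"{match['month']} {match['day']} {match['time']}"
        per_process[match["proc"]] += 1
        per_minute[match["time"][:5]] += 1  # bucket by hour:minute
    return first_seen, per_process, per_minute


# A few entries copied from the snippet above (trailing text shortened).
sample = [
    "Jun 30 10:14:18  aaa_proxy[1404]: <341312> <ERRS> |aaa_proxy|  Unable to connect to MASTER AAA proxy socket",
    "Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/profmgr, pid 4249",
    "Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto HTTPD Manager failed: Connection refused",
    "Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto IKE failed: Connection refused",
]

first_seen, per_process, per_minute = summarize(sample)
print("first entry:", first_seen)
for proc, count in per_process.most_common():
    print(f"{proc}: {count}")
```

To run it against a saved log instead of the sample, pass the open file to `summarize`, for example `summarize(open("errorlog.txt"))`, where `errorlog.txt` is a placeholder file name.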

     

    Full log dump and config:

    https://1drv.ms/u/s!Ar4F2XsjCm_BhLonTrrg411sLGmyvg?e=YhKyeO

     

    I appreciate any help with this as I am at my wits end.   Let me know if I can provide any further details.



  • 2.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jun 30, 2019 09:13 PM

    The firmware for that model is up to 7.4.1.10.  Upgrade to it and see if the issue persists.



  • 3.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jul 01, 2019 06:06 PM

    I ran the following commands and only firmware 7.4.0.6 shows as current. I did the update anyway and it still shows 7.4.0.6. Googling doesn't turn up version 7.4.1.10.

     

    #activate firmware check
    #activate firmware upgrade
    #reload
    #show version
    
    Aruba Operating System Software.
    ArubaOS (MODEL: ArubaS2500-24P-US), Version 7.4.0.6
    Website: http://www.arubanetworks.com
    Copyright (c) 2016 Aruba, a Hewlett Packard Enterprise company.
    Compiled on 2018-01-11 at 00:15:44 PST (build 63167) by p4build
    ROM: System Bootstrap, Version CPBoot 1.0.34.0 (build 32670)
    Built: 2012-03-06 02:43:38
    Built by: p4build@re_client_32670
    Switch uptime is 3 minutes 8 seconds
    Reboot Cause: User reboot (0x86:0x78:0x402b)
    Processor XLS 208 (revision A1) with 1023M bytes of memory.
    955M bytes of System flash
    Activation Key: Not available or unable to contact Activate

    Is this on a beta channel or something?  Let me know. Thanks.



  • 4.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jul 01, 2019 06:14 PM

    You can go to support.arubanetworks.com and find the firmware there.  7.4.1.10 is not a beta.

     

     



  • 5.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jul 02, 2019 02:17 PM

    OK, I can see why I couldn't find it: I do not have a support contract, as I bought these from an auction. It is my understanding that I should have 5 years on hardware and lifetime on firmware. How do I go about getting access to the firmware library? Do I just contact support, explain the situation, and provide serial numbers?



  • 6.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    EMPLOYEE
    Posted Jul 02, 2019 02:23 PM

    Message me your email address and I will send it to you.



  • 7.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jul 02, 2019 02:29 PM

    I got your email and am downloading the file now. Thank you so much.



  • 8.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jul 04, 2019 01:02 PM

    So far so good with the new firmware. I am going to give it another 48 hours and then I will mark this as solved.



  • 9.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jul 05, 2019 12:39 PM

    Unfortunately this issue happened again today (right around 11:12 am as usual).  Any other suggestions?



  • 10.  RE: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

    Posted Jul 12, 2019 12:12 PM

    Any other suggestions?  Or is this something I need to get a replacement switch for?