Occasional Contributor I

Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

Recently I got a pair of Aruba S2500 switches and configured them as a stack with two stack interconnections. Assume nothing client-side is plugged in, as this issue occurs regardless of what's connected. My issue is that after a day or two, the SFP+ stack ports between the switches just shut down. The ports are not admin-shut; the link LED turns off and on in roughly 5-second intervals and the link drops. The stack does not fail over to the secondary connection; it just hangs. This happens on both stack interfaces in the same way, and it also happens when I drop to a single stack interconnection. The only way to recover is to unseat and reseat the SFP module in the switch.

The other 10G ports are not affected when connected to, say, a PC, and none of the copper ports have this issue. The issue also happens when I remove the stack configuration and just use traditional trunk ports. Lastly, this issue seems to happen only between 11:00 and 11:30 am Central; it never fails outside that window.

The ports do show some octet errors, but that's it, and I am not sure whether they are a result of the problem or its cause. This looks like a straightforward bad SFP or fiber issue, but please consider what I have already tried:

  • Replaced the fiber cable (OM3 fiber)
  • Tried a different SFP+ port (even swapped with a port that had no issues with my PC over a standard trunk)
  • Tried 5 different brands of 10G SR SFP+ modules, including authentic Aruba optics
  • Replaced the affected optics with 10G LR (single-mode) optics on both sides, along with single-mode cables
  • Tried standard trunk, access, and stacking configurations
  • Issue occurs on both switches, even when split and independent
  • Both switches are on stable, UPS-conditioned power
  • Firmware is patched to the latest (ArubaOS v7.4.0.6), but I also tried an older version (v7.4.0.4)

I have no idea what else to try. Despite everything above, the issue still occurs, usually every 24-48 hours. As noted above, I see octet errors on the affected interface but nothing else, and pulling and reseating the SFP clears the issue.

Here are the errors I get.

Logs and info:

Aruba Operating System Software.
ArubaOS (MODEL: ArubaS2500-24P-US), Version 7.4.0.6
Website: http://www.arubanetworks.com
Copyright (c) 2016 Aruba, a Hewlett Packard Enterprise company.
Compiled on 2018-01-11 at 00:15:44 PST (build 63167) by p4build
ROM: System Bootstrap, Version CPBoot 1.0.34.0 (build 32670)
Built: 2012-03-06 02:43:38
Built by: p4build@re_client_32670
Switch uptime is 1 days 19 hours 51 minutes 4 seconds
Reboot Cause: User reboot (0x86:0x78:0x402b)
Processor XLS 208 (revision A1) with 1023M bytes of memory.
955M bytes of System flash
Activation Key: LZQWURUG

Error log snippet as the issue starts (full logs linked below):

Jun 30 10:14:18  aaa_proxy[1404]: <341312> <ERRS> |aaa_proxy|  Unable to connect to MASTER AAA proxy socket:Operation now in progress
Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/profmgr, pid 4249 
Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/udbserver, pid 4251 
Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/ntpwrap, pid 4252 
Jun 30 10:14:18  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_recv_bytes line 65 error recv returned 0, expecting 5.
Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto HTTPD Manager failed: Connection refused Message Code 5004 Sequence Num is 208 
Jun 30 10:14:18  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ncfg_gcore.c function ncfg_profmgr_task_based_error_handler line 165 error ncfg_profmgr_task_based_error_handler:Profile manager most probably died, context:0x100006b8. Handling error for task-based app and resyncing with profile-manager.
Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto Interface Manager failed: Connection refused Message Code 5004 Sequence Num is 210 
Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto DHCP Daemon failed: No such file or directory Message Code 5004 Sequence Num is 211 
Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 5004 Sequence Num is 212 
Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto IKE failed: Connection refused Message Code 5004 Sequence Num is 213 
Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto Authentication failed: Connection refused Message Code 5004 Sequence Num is 214 
Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto User Database Server failed: Connection refused Message Code 5004 Sequence Num is 215 
Jun 30 10:14:18  cfgm[1437]: <307228> <ERRS> |cfgm|  Error Accepting a connection to the Master Config socket:Resource temporarily unavailable
Jun 30 10:14:19  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_setup_prof_socket line 556 error Unable to connect to profmgr: Connection refused.
Jun 30 10:14:19  im[6606]: <330006> <ERRS> |im|  Connection failure with profile manager, Unable to connect to profmgr
Jun 30 10:14:19  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ncfg_gcore.c function ncfg_gated_profmgr_connect line 86 error ncfg_gated_profmgr_connect:ERROR: ncfg_init failed:14.
Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto HTTPD Manager failed: Connection refused Message Code 5004 Sequence Num is 216 
Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto DHCP Daemon failed: No such file or directory Message Code 5004 Sequence Num is 219 
Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 5004 Sequence Num is 220 
Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto IKE failed: Connection refused Message Code 5004 Sequence Num is 221 
Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto Authentication failed: Connection refused Message Code 5004 Sequence Num is 222 
Jun 30 10:14:19  cfgm[1437]: PAPI_Send: sendto User Database Server failed: Connection refused Message Code 5004 Sequence Num is 223 
Jun 30 10:14:21  activate[6641]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 0 Sequence Num is 2 
Jun 30 10:14:21  authmgr[6610]: <199802> <ERRS> |authmgr|  main.c, main:510: Auth started/restarted or switchover happened
Jun 30 10:14:22  qosmgr[6633]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 0 Sequence Num is 2 
Jun 30 10:14:22  rmon[6636]: PAPI_Send: sendto Profile Manager failed: Connection refused Message Code 0 Sequence Num is 2 
Jun 30 10:14:23  cmica[1461]: task_close: close Chassis Agent socket.-1: Bad file descriptor
Jun 30 10:14:23  aaa[6605]: PAPI_Send: sendto User Database Server failed: Connection refused Message Code 0 Sequence Num is 3 
Jun 30 10:14:23  mon_ssm[6659]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
Jun 30 10:14:23  mon_ssm[6663]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
Jun 30 10:14:23  mon_ssm[6664]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
Jun 30 10:14:23  mon_ssm[6662]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
Jun 30 10:14:23  mon_ssm[6661]: PAPI_Init: timeout of 0 specified set to default 100 millisec. 
Jun 30 10:14:23  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_setup_prof_socket line 556 error Unable to connect to profmgr: Connection refused.
Jun 30 10:14:23  stackmgr[1467]: <330006> <ERRS> |stackmgr|  Connection failure with profile manager, Unable to connect to profmgr
Jun 30 10:14:23  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ncfg_gcore.c function ncfg_task_based_profmgr_reconnect line 131 error ncfg_task_based_profmgr_reconnect:Unable to reconnect with profile-manager. Setting task timer to 5.
Jun 30 10:14:23  ChassisManager[6629]: <335309> <ALRT> |ChassisManager|  Power supply  detected on slot 0
Jun 30 10:14:23  ChassisManager[6629]: <335308> <ALRT> |ChassisManager|  Module 1 2010067 (4-Port) detected on slot 0
Jun 30 10:14:23  ChassisManager[6629]: <335309> <ALRT> |ChassisManager|  Power supply  detected on slot 1
Jun 30 10:14:23  ChassisManager[6629]: <335308> <ALRT> |ChassisManager|  Module 1 2010067 (4-Port) detected on slot 1
Jun 30 10:14:25  cmica[1461]: task_close: close Chassis Agent socket.-1: Bad file descriptor
Jun 30 10:14:25  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
Jun 30 10:14:25  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ../ncfg_profmgr.c function ncfg_profmgr_setup_prof_socket line 556 error Unable to connect to profmgr: Connection refused.
Jun 30 10:14:25  im[6606]: <330006> <ERRS> |im|  Connection failure with profile manager, Unable to connect to profmgr
Jun 30 10:14:25  im[6606]: <399803> <ERRS> |im|  An internal system error has occurred at file ncfg_gcore.c function ncfg_gated_profmgr_connect line 86 error ncfg_gated_profmgr_connect:ERROR: ncfg_init failed:14.
Jun 30 10:14:25  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
Jun 30 10:14:25  publisher[1436]: PAPI_Send: sendto Auth Survival Server failed: Connection refused Message Code 0 Sequence Num is 61 
Jun 30 10:14:26  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
Jun 30 10:14:26  profmgr[6609]: <399803> <ERRS> |profmgr|  An internal system error has occurred at file profmgr_stk.c function profmgr_nl_sub_cb line 152 error Our role is MASTER, re-syncing with certificate-manager .
Jun 30 10:14:28  authmgr[6610]: PAPI_AddSibyteOpcode: ReRegistering SAME call back function for opcode 0x004b sock = 15 
Jun 30 10:14:28  authmgr[6610]: PAPI_AddSibyteOpcode: ReRegistering SAME call back function for opcode 0x009e sock = 16 
Jun 30 10:14:28  stackmgr[1467]: <399803> <ERRS> |stackmgr|  An internal system error has occurred at file ncfg_gcore.c function ncfg_task_based_profmgr_reconnect line 141 error ncfg_task_based_profmgr_reconnect:Connection to profile-manager established, setting socket..
Jun 30 10:14:30  certmgr[6604]: <118004> <ERRS> |certmgr|  Received unknown message
Jun 30 10:14:30  profmgr[6609]: <399803> <ERRS> |profmgr|  An internal system error has occurred at file profmgr_stk.c function profmgr_nl_sub_cb line 152 error Our role is MASTER, re-syncing with certificate-manager .
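
Since the failures seem tied to a time of day, one way to check that pattern across the full log dump is to bucket each log line by minute of day and tally the tagged errors per process. This is a hypothetical Python helper, not an Aruba tool: it assumes the syslog-style format shown in the snippet (`Mon DD HH:MM:SS  process[pid]: ...`, with optional tags such as `<ERRS>` or `<ALRT>`), and the names `parse_line` and `summarize` and the 11:00 to 11:30 default window are illustrative choices. The snippet itself is timestamped around 10:14, so it may be worth confirming the switch's clock and time zone before comparing against a window in Central time.

```python
import re
from collections import Counter

# Hypothetical parser for syslog-style lines as they appear in the snippet, e.g.
# "Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process ..."
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) "
    r"(?P<hh>\d{2}):(?P<mm>\d{2}):(?P<ss>\d{2})\s+"
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:\s*(?P<msg>.*)$"
)
# Four-letter severity tags such as <ERRS> or <ALRT>; numeric IDs like <399816> do not match.
SEVERITY_RE = re.compile(r"<([A-Z]{4})>")


def parse_line(line):
    """Return a dict for a matching log line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    sev = SEVERITY_RE.search(m.group("msg"))
    return {
        "minute_of_day": int(m.group("hh")) * 60 + int(m.group("mm")),
        "proc": m.group("proc"),
        "severity": sev.group(1) if sev else None,  # PAPI_Send lines carry no tag
        "msg": m.group("msg"),
    }


def summarize(lines, window=(11 * 60, 11 * 60 + 30)):
    """Count parsed lines, lines inside [start, end) minutes of the day,
    and tagged events per (process, severity)."""
    parsed = 0
    in_window = 0
    per_proc = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip banners, blank lines, wrapped text
        parsed += 1
        if window[0] <= rec["minute_of_day"] < window[1]:
            in_window += 1
        if rec["severity"] is not None:
            per_proc[(rec["proc"], rec["severity"])] += 1
    return parsed, in_window, per_proc


if __name__ == "__main__":
    sample = [
        "Jun 30 10:14:18  nanny[1370]: <399816> <ERRS> |nanny|  Terminating process /mswitch/bin/profmgr, pid 4249 ",
        "Jun 30 10:14:18  cfgm[1437]: PAPI_Send: sendto IKE failed: Connection refused Message Code 5004 Sequence Num is 213 ",
        "Jun 30 10:14:23  ChassisManager[6629]: <335309> <ALRT> |ChassisManager|  Power supply  detected on slot 0",
    ]
    parsed, in_window, per_proc = summarize(sample)
    print(parsed, in_window)  # 3 0
    print(summarize(sample, window=(10 * 60, 10 * 60 + 30))[1])  # 3
    for (proc, sev), count in per_proc.most_common():
        print(f"{proc:15} {sev} {count}")
```

Running it over the full dump and checking whether `in_window` accounts for most of the error lines would show whether the failures really cluster in that window on the switch's own clock.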

 

Full log dump and config:

https://1drv.ms/u/s!Ar4F2XsjCm_BhLonTrrg411sLGmyvg?e=YhKyeO

 

I appreciate any help with this, as I am at my wit's end. Let me know if I can provide any further details.

New Contributor

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

The latest firmware for that model is 7.4.1.10. Upgrade to it and see if the issue persists.

Occasional Contributor I

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

I ran the following commands, and only firmware 7.4.0.6 shows as current. I ran the upgrade anyway, and it still shows 7.4.0.6. Googling doesn't turn up that version either.


#activate firmware check
#activate firmware upgrade
#reload
#show version

Aruba Operating System Software.
ArubaOS (MODEL: ArubaS2500-24P-US), Version 7.4.0.6
Website: http://www.arubanetworks.com
Copyright (c) 2016 Aruba, a Hewlett Packard Enterprise company.
Compiled on 2018-01-11 at 00:15:44 PST (build 63167) by p4build
ROM: System Bootstrap, Version CPBoot 1.0.34.0 (build 32670)
Built: 2012-03-06 02:43:38
Built by: p4build@re_client_32670
Switch uptime is 3 minutes 8 seconds
Reboot Cause: User reboot (0x86:0x78:0x402b)
Processor XLS 208 (revision A1) with 1023M bytes of memory.
955M bytes of System flash
Activation Key: Not available or unable to contact Activate

Is this on a beta channel or something? Let me know. Thanks.

New Contributor

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

You can go to support.arubanetworks.com and find the firmware there.  7.4.1.10 is not a beta.

Occasional Contributor I

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

OK, I can see why I couldn't find it: I don't have a support contract, since I bought these at an auction. My understanding is that I should have 5 years on hardware and lifetime on firmware. How do I go about getting access to the firmware library? Do I just contact support, explain the situation, and provide the serial numbers?

Guru Elite

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

Message me your email address and I will send it to you.


*Answers and views expressed by me on this forum are my own and not necessarily the position of Aruba Networks or Hewlett Packard Enterprise.*
Occasional Contributor I

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

I got your email and am downloading the file now. Thank you so much.

Occasional Contributor I

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

So far so good with the new firmware. I am going to give it another 48 hours, and then I will mark this as solved.

Occasional Contributor I

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

Unfortunately this issue happened again today (right around 11:12 am as usual).  Any other suggestions?

Occasional Contributor I

Re: Strange issue on 2 Member S2500-24 Stack - Stack Disconnecting ~11am

Any other suggestions?  Or is this something I need to get a replacement switch for?
