I'm not an expert, but this feels like an expiring key. I can time it down to the minute: 50 minutes every time, for both controllers.
Original Message:
Sent: Jun 20, 2024 12:36 PM
From: nkuhl30
Subject: MM loses connection to controllers every ~50 minutes
So the IPSec tunnel from the MM to either controller is dropping roughly every 50 minutes.
From the MM:
Jun 20 12:29:15 2024 isakmpd[9173]: <103103> <9173> <WARN> |ike| DPD PEER DEAD: peer 10.40.0.13 port:4500
Jun 20 12:29:15 2024 isakmpd[9173]: <103103> <9173> <WARN> |ike| IPSec SA Deletion: IPSEC_delSa SPI:2fe5bf00 OppSPI:9cc82500 Dst:10.40.0.13 Src:10.0.0.100 flags:1001 dstPort:0 srcPort:0
Jun 20 12:29:15 2024 isakmpd[9173]: <103103> <9173> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:10.40.0.13:4500 id:3116891708 errcode:ERR_IKESA_CLEARED saflags:0x10000071 arflags:0x240
From one of the MDs:
Jun 20 12:29:12 2024 cfgm[3555]: <399838> <3555> <WARN> |cfgm| LmsHeartBeatResultAction: State(READY:UPDATE SUCCESSFUL:CFGID-626:PEND-626:INITCFGID:0) FD=33:Cannot heartbeat with the master.
Jun 20 12:29:15 2024 fpapps[3591]: <399838> <3594> <WARN> |fpapps| Received TUN_DOWN from IKE for default-local-master-ipsecmap
Jun 20 12:29:15 2024 isakmpd[3631]: <103103> <3631> <WARN> |ike| 10.0.0.100:4500-> IPSec SA Deletion: IPSEC_delSa SPI:9cc82500 OppSPI:2fe5bf00 Dst:10.0.0.100 Src:10.40.0.13 flags:19 dstPort:0 srcPort:0
Jun 20 12:29:16 2024 isakmpd[3631]: <103103> <3631> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:10.0.0.100:4500 id:2913920530 errcode:OK saflags:0x7110001d arflags:0x0
Jun 20 12:29:22 2024 cfgm[3555]: <399838> <3555> <WARN> |cfgm| LmsHeartBeatResultAction: State(READY:UPDATE SUCCESSFUL:CFGID-626:PEND-626:INITCFGID:0) FD=33:Cannot heartbeat with the master.
Jun 20 12:29:23 2024 IAP manager Process[4020]: PAPI RxPacket: Too small a packet of size 40 received.
Jun 20 12:29:23 2024 IAP manager Process[4020]: PAPI RxPacket: Should be >= 76
Jun 20 12:29:23 2024 IAP manager Process[4020]: PAPI RxPacket: Too small a packet of size 40 received.
Jun 20 12:29:23 2024 IAP manager Process[4020]: PAPI RxPacket: Should be >= 76
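To see the order of events across both boxes, the two logs can be merged into one timeline. This is only a rough sketch: it uses three of the lines pasted above, the `parse` helper and the `MM`/`MD` labels are my own names, and it assumes the MM and MD clocks are in sync (e.g. via NTP), which I have not verified.

```python
from datetime import datetime

# Lines copied verbatim from the MM and MD logs above.
MM_LOG = [
    "Jun 20 12:29:15 2024 isakmpd[9173]: <103103> <9173> <WARN> |ike| DPD PEER DEAD: peer 10.40.0.13 port:4500",
]
MD_LOG = [
    "Jun 20 12:29:12 2024 cfgm[3555]: <399838> <3555> <WARN> |cfgm| LmsHeartBeatResultAction: State(READY:UPDATE SUCCESSFUL:CFGID-626:PEND-626:INITCFGID:0) FD=33:Cannot heartbeat with the master.",
    "Jun 20 12:29:15 2024 fpapps[3591]: <399838> <3594> <WARN> |fpapps| Received TUN_DOWN from IKE for default-local-master-ipsecmap",
]


def parse(source, line):
    """Return (timestamp, source label, message text) for one log line."""
    # The first four whitespace-separated fields form the timestamp.
    fields = line.split(None, 4)
    stamp = datetime.strptime(" ".join(fields[:4]), "%b %d %H:%M:%S %Y")
    # The message is whatever follows the "|module| " tag.
    message = line.rsplit("| ", 1)[-1]
    return stamp, source, message


events = [parse("MM", l) for l in MM_LOG] + [parse("MD", l) for l in MD_LOG]
# Stable sort by timestamp only, so same-second events keep their input order.
timeline = sorted(events, key=lambda e: e[0])
# Seconds elapsed since the first event in the merged timeline.
offsets = [int((e[0] - timeline[0][0]).total_seconds()) for e in timeline]

for (stamp, source, message), offset in zip(timeline, offsets):
    print(f"+{offset:>2}s {source} {message[:60]}")
```

If the clocks really are aligned, the MD logs its first missed heartbeat about 3 seconds before either side tears the tunnel down.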
Original Message:
Sent: Jun 20, 2024 12:28 PM
From: DB86
Subject: MM loses connection to controllers every ~50 minutes
What do you see when you run "show log system all" and "show log security all" on the MD and the MM? Is there anything relevant in the logs?
------------------------------
Dustin Burns
Lead Mobility Engineer @Worldcom Exchange, Inc.
ACCX 1271| ACMX 509| ACSP | ACDA | MVP Guru 2022-2023
Original Message:
Sent: Jun 20, 2024 06:57 AM
From: nkuhl30
Subject: MM loses connection to controllers every ~50 minutes
Hello everyone,
I'm currently working with TAC on this, but I wanted to reach out in case anyone has any ideas. We replaced our core switch last week (Aruba CX 6405) and, since then, our Aruba wireless cluster has been having problems. We have one MM and two controllers in a cluster.
The MM is briefly losing connectivity (dropping heartbeats) with both of the controllers, about once per hour. Each loss of connectivity lasts about 10 seconds. It happens with both controllers, but at separate times throughout the hour, not simultaneously. Neither controller is dropping off the network. The MM is just losing its connection, which, if it lasts long enough, can cause HA to kick in, with APs moving, etc. The uptime for all controllers and APs is normal and doesn't show any issues. I dug into the Aruba MM logs last night and immediately saw evidence of the MM losing connectivity to both controllers:
Jun 18 18:36:25 2024 cfgdist[9035]: <357002> <10388> <WARN> |cfgdist| freelc_node:355 (TID:10388) Status of 10.40.0.11(00:1a:1e:00:35:58) is now DOWN
Jun 18 18:36:33 2024 cfgdist[9035]: <357002> <10390> <WARN> |cfgdist| handle_read:702 (TID:10390) Status of ::ffff:10.40.0.11(00:1a:1e:00:35:58) is now UP
Jun 18 18:57:55 2024 cfgdist[9035]: <357002> <10389> <WARN> |cfgdist| freelc_node:355 (TID:10389) Status of 10.40.0.13(00:1a:1e:07:6a:d0) is now DOWN
Jun 18 18:58:02 2024 cfgdist[9035]: <357002> <10391> <WARN> |cfgdist| handle_read:702 (TID:10391) Status of ::ffff:10.40.0.13(00:1a:1e:07:6a:d0) is now UP
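For anyone who wants to measure these drops over a longer log, here is a minimal sketch that pairs each DOWN line with the next UP line for the same controller. The regex and the `outages` function are my own, written against the four lines above only, so other cfgdist message variants may not match.

```python
import re
from datetime import datetime

# The four cfgdist lines from the MM log above, copied verbatim.
LOG = """\
Jun 18 18:36:25 2024 cfgdist[9035]: <357002> <10388> <WARN> |cfgdist| freelc_node:355 (TID:10388) Status of 10.40.0.11(00:1a:1e:00:35:58) is now DOWN
Jun 18 18:36:33 2024 cfgdist[9035]: <357002> <10390> <WARN> |cfgdist| handle_read:702 (TID:10390) Status of ::ffff:10.40.0.11(00:1a:1e:00:35:58) is now UP
Jun 18 18:57:55 2024 cfgdist[9035]: <357002> <10389> <WARN> |cfgdist| freelc_node:355 (TID:10389) Status of 10.40.0.13(00:1a:1e:07:6a:d0) is now DOWN
Jun 18 18:58:02 2024 cfgdist[9035]: <357002> <10391> <WARN> |cfgdist| handle_read:702 (TID:10391) Status of ::ffff:10.40.0.13(00:1a:1e:07:6a:d0) is now UP
"""

# Captures the timestamp, the peer IPv4 address (with or without the
# ::ffff: IPv4-mapped prefix), and the new state.
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) cfgdist\[\d+\]:.*"
    r"Status of (?:::ffff:)?([\d.]+)\(.*?\) is now (UP|DOWN)"
)


def outages(log_text):
    """Return (peer, down_time, up_time, seconds) for each DOWN/UP pair."""
    down_at = {}
    result = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%b %d %H:%M:%S %Y")
        peer, state = match.group(2), match.group(3)
        if state == "DOWN":
            down_at[peer] = stamp
        elif peer in down_at:
            start = down_at.pop(peer)
            result.append((peer, start, stamp, (stamp - start).total_seconds()))
    return result


for peer, start, end, seconds in outages(LOG):
    print(f"{peer}: down {start:%H:%M:%S}, up {end:%H:%M:%S}, {seconds:.0f} s")
```

On the sample above this gives 8 seconds for 10.40.0.11 and 7 seconds for 10.40.0.13. Fed a full day of logs, the gap between successive DOWN times per controller would show whether the drops follow a fixed interval.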
What's causing this is yet to be determined. However, I've verified that the controllers are not dropping off the network: I had multiple computers running continuous pings to them while the MM lost connectivity, and the pings never dropped and stayed very steady. My only guess at this point is the IPSec tunnels. Maybe something became corrupted and they need to be deleted and reconfigured?
At this point, I'm pretty beat from staring at this stuff for several days now. End-user access really isn't affected, since the drop is so quick and there isn't a lot of activity on campus at the moment. However, I'm sure this could develop into a huge issue if not remedied.
Any ideas would be greatly appreciated.