It is better to check the interface and port-channel configuration via the CLI on the controllers themselves for mismatches.
I have corrected the mismatches via the CLI on the MM/MC and upgraded again to 8.10.0.9.
After 20 minutes all APs were back, and they seem stable now.
Original Message:
Sent: Apr 09, 2024 05:59 AM
From: mniedzwiecki
Subject: APs flapping, heartbeat timeout
I have a similar issue running 8.10.0.8 LTR. Looking at my config, can you tell me if this is what you guys are referring to?
Original Message:
Sent: Apr 04, 2024 09:24 AM
From: dannybosman
Subject: APs flapping, heartbeat timeout
Do you use a port-channel? If so, check the "trusted vlan" command.
In our config it was present on the Gi interfaces, but not on the port-channel. This caused no problem in 8.6, but a lot of issues immediately after migration to 8.10.
Add "trusted vlan 1-4094" to the port-channel (reboot needed).
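A quick way to spot this kind of mismatch in a saved running-config is sketched below in Python. It is only a sketch: it assumes the config uses "interface ..." stanzas terminated by "!" and that member ports join the port-channel with an "lacp group <id>" line. The function name and the sample config are hypothetical, and statically added member ports are not handled.

```python
import re

def untrusted_port_channels(config_text):
    """Return IDs of port-channels lacking 'trusted vlan' although a member port has it."""
    stanzas = {}      # interface name -> list of stripped config lines
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line[len("interface "):].strip()
            stanzas[current] = []
        elif line == "!":
            current = None            # end of the current stanza
        elif current is not None and line:
            stanzas[current].append(line)

    def has_trusted_vlan(lines):
        return any(l.startswith("trusted vlan") for l in lines)

    # For each LACP group, record whether any member port carries 'trusted vlan'.
    member_trusted = {}
    for name, lines in stanzas.items():
        if not name.startswith("gigabitethernet"):
            continue
        for l in lines:
            m = re.match(r"lacp group (\d+)", l)
            if m:
                gid = m.group(1)
                member_trusted[gid] = member_trusted.get(gid, False) or has_trusted_vlan(lines)

    missing = []
    for name, lines in stanzas.items():
        m = re.match(r"port-channel (\d+)$", name)
        if m and member_trusted.get(m.group(1)) and not has_trusted_vlan(lines):
            missing.append(m.group(1))
    return sorted(missing)

# Hypothetical config excerpt (assumed layout, not copied from a real controller)
SAMPLE = """\
interface gigabitethernet 0/0/0
    trusted
    trusted vlan 1-4094
    lacp group 0 mode active
!
interface port-channel 0
    trusted
    switchport mode trunk
!
"""

print(untrusted_port_channels(SAMPLE))
```

Any ID it reports is a port-channel worth checking by hand on the controller CLI before the upgrade.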
------------------------------
Danny Bosman
KBC Group - Belgium
Original Message:
Sent: Apr 04, 2024 07:45 AM
From: christian.chautems@swisscom.com
Subject: APs flapping, heartbeat timeout
Hello,
Does anybody have an update on the root cause? I had the same problem yesterday when upgrading from 8.6.0.22 to 8.10.0.10.
Same type of messages:
Apr 3 20:02:03 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP AP-GIVA-1ER-RECEPTION ip 10.94.184.104 outer_ip 0:0:0:0 down , reason: controller detect heart beat timeout
Apr 3 20:02:08 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP AP-BULC--EXPO-ETAGE-4 ip 10.94.171.130 outer_ip 0:0:0:0 down , reason: controller role changed
Apr 3 20:02:08 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP RBB-AP-BERN-LAUPEN-A-NEW ip 10.94.95.102 outer_ip 0:0:0:0 down , reason: controller role changed
Apr 3 20:02:08 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP AP-CONA-REZ-SHOP ip 10.94.179.141 outer_ip 0:0:0:0 down , reason: controller role changed
Apr 3 20:02:08 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP RBB-AP-WIIA-D ip 10.94.160.89 outer_ip 0:0:0:0 down , reason: controller role changed
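With hundreds of APs, it can help to tally these "down" messages by reason before opening a TAC case. The Python sketch below does that for log text in the format shown above. The regular expression is my own guess at the line layout based only on these samples, and the function name is hypothetical. It also copes with entries that were pasted together on one line.

```python
import re
from collections import Counter

# Matches the AP name, IP and reason in an stm "AP ... down" message.
# The reason ends at the next timestamp (entries run together) or at end of line.
DOWN_RE = re.compile(
    r"\|stm\| AP (?P<ap>\S+) ip (?P<ip>\S+) outer_ip \S+ down ?, "
    r"reason: (?P<reason>.+?)(?=[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}|$)",
    re.MULTILINE,
)

def count_down_reasons(log_text):
    """Count how often each 'down' reason appears in the log text."""
    return Counter(m.group("reason").strip() for m in DOWN_RE.finditer(log_text))

# Three entries taken from the log excerpt above; the first two are run together.
SAMPLE = (
    "Apr 3 20:02:03 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP AP-GIVA-1ER-RECEPTION "
    "ip 10.94.184.104 outer_ip 0:0:0:0 down , reason: controller detect heart beat timeout"
    "Apr 3 20:02:08 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP AP-BULC--EXPO-ETAGE-4 "
    "ip 10.94.171.130 outer_ip 0:0:0:0 down , reason: controller role changed\n"
    "Apr 3 20:02:08 2024 stm[17903]: <305061> <17903> <WARN> |stm| AP RBB-AP-WIIA-D "
    "ip 10.94.160.89 outer_ip 0:0:0:0 down , reason: controller role changed\n"
)

for reason, count in count_down_reasons(SAMPLE).most_common():
    print(count, reason)
```

A large share of "controller role changed" next to a few heartbeat timeouts would point more towards cluster or controller behaviour than towards individual APs, but that reading is an assumption to confirm with support.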
To add: this did not happen during the firmware upload to the APs, as I had preloaded the new firmware on all APs beforehand and at migration time just switched the version on the controllers.
2 x MM
3 x 7210
560 x APs, various models (AP-205 / AP-305 / AP-505 / AP-375), all with the same problem
Kind regards
Christian Chautems
Original Message:
Sent: Oct 16, 2023 09:06 AM
From: cjoseph
Subject: APs flapping, heartbeat timeout
I looked at both of those bugs and they are related to the MM not choosing all of the APs to upgrade during a cluster upgrade. If you choose a traditional upgrade, where you just upgrade the code on the individual MDs and reboot them, instead of doing a cluster upgrade, you should be able to sidestep those bugs.
I would show support your data so that they can figure out what could be going wrong.
------------------------------
Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.
HPE Design and Deploy Guides: https://community.arubanetworks.com/support/migrated-knowledge-base?attachments=&communitykey=dcc83c62-1a3a-4dd8-94dc-92968ea6fff1&pageindex=0&pagesize=12&search=&sort=most_recent&viewtype=card
Original Message:
Sent: Oct 16, 2023 08:49 AM
From: dannybosman
Subject: APs flapping, heartbeat timeout
This is a working setup on AOS 8.6.0.20, so routing is OK. Only when upgrading to 8.10 do we get a lot of failures.
I'm able to reproduce this case in our lab.
Could it be related to AOS-218844 or AOS-222351 (release notes 8.11)? FYI, we are also in contact with our external support partner.
------------------------------
Danny Bosman
KBC Group - Belgium
Original Message:
Sent: Oct 12, 2023 09:04 AM
From: cjoseph
Subject: APs flapping, heartbeat timeout
After the initial discovery of a cluster, the dhcp options do not matter: the access points will write the ip addresses of the controllers in the cluster into flash and they will no longer do discovery.
Each controller has a single "controller-ip" and regardless of what interface the access point discovers the controller on, the access points will connect to that controller on that ip address. You need to make sure the "controller-ip" is reachable from both subnets, because that is what the access points will use permanently to connect to your controllers. It doesn't matter how many ip interfaces a controller has: there is only one controller-ip...
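To make the reachability point concrete, here is a small Python sketch using the standard ipaddress module. It lists the AP subnets that do not contain the controller-ip, meaning APs there can only reach it through a router. All addresses and the function name are hypothetical placeholders, not values from this thread.

```python
import ipaddress

def subnets_needing_route(controller_ip, ap_subnets):
    """Return the AP subnets that do not contain controller_ip."""
    ip = ipaddress.ip_address(controller_ip)
    return [s for s in ap_subnets if ip not in ipaddress.ip_network(s)]

# Hypothetical example: the controller-ip lives in the first AP subnet only.
print(subnets_needing_route("10.0.1.10", ["10.0.1.0/24", "10.0.2.0/24"]))
```

For every subnet it returns, routing (and any firewall in between) must allow the APs to reach the controller-ip, regardless of which other IP interfaces the controller has.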
Original Message:
Sent: Oct 12, 2023 08:38 AM
From: dannybosman
Subject: APs flapping, heartbeat timeout
Yes, this is controller-based (a cluster of 2 controllers, L2-connected).
Due to growth in the number of APs, we have 2 subnets for the APs. After some more investigation, it turns out that all APs on subnet "1" were upgraded, but none on subnet "2". There is a VRRP definition for subnet "1" (used in the DHCP options). Local routing between these 2 subnets is provided by a Cisco switch.
The 2 controllers each have an IP address in each subnet ("1" and "2").
The APs are not dual-connected. Most of the APs we use at this location are AP305, but AP315 and 515 are also impacted.
------------------------------
Danny Bosman
KBC Group - Belgium
Original Message:
Sent: Oct 05, 2023 11:43 AM
From: cjoseph
Subject: APs flapping, heartbeat timeout
Is this controller-based? What is the switching infrastructure between the access points and the controller? Are the access points dual connected?
Original Message:
Sent: Oct 05, 2023 08:45 AM
From: dannybosman
Subject: APs flapping, heartbeat timeout
Hi, yesterday we tried to upgrade our first cluster to AOS 8.10.0.6, same issue...
Oct 4 23:32:04 2023 :305061: <17115> <WARN> |stm| AP nwxxxxxx ip 10.xxxxx outer_ip 0:0:0:0 down , reason: controller detect heart beat timeout
The weird thing is that around 30 of the 100 APs on this cluster could upgrade, but not the others.
Has anyone got a solution or more feedback from TAC?
------------------------------
Danny Bosman
KBC Group - Belgium
Original Message:
Sent: Jul 05, 2023 02:51 PM
From: atran004
Subject: APs flapping, heartbeat timeout
I'm also running into the same issue. I've sent a bunch of logs to TAC, but so far no solution has been found.
They're asking me to set up an SCP server on the network for an AP NSS dump log, but I haven't been able to set up an SCP server just for this case. Apparently the AP won't dump the file onto the controller and needs the SCP server.
Original Message:
Sent: Apr 29, 2022 07:32 AM
From: alexs-nd
Subject: APs flapping, heartbeat timeout
Hi,
I have a test setup at home running ArubaOS 8.10:
1 * MM VM
1 * 7205 hardware appliance
2 * AP335s
I have a basic setup with the APs doing EAP-TLS auth against CPPM, and that all works just fine.
The problem is that the APs are flapping with "AP is down since 2022-04-29 12:00:44 because of the following reason: Hbt Timeout." messages.
All devices are on the same VLAN (1) and the same /24 net.
All devices are pingable from the other devices.
From the MM
(nd7205) [MDC] *#show ap debug client-mgmt-counters

Counters
--------
Name                                                     Value
----                                                     -----
Tunnel DACL                                              10
STM Restart Notification to Auth                         1
Action frames: wmm rrm wnm public dls ba vendor unknown  0 0 0 0 0 0 0 0
Associations Dropped Due to Auth Throttling              0
PubSub Messages Rcvd                                     936
Auth .1x Queue: High, Pending                            550, 0
Reg timer calls                                          8957
BSS publish Failures                                     0
Tunnel Timeouts                                          516
Any ideas where I should start looking to resolve the hbt-timeout issue?
Rgds
Alex
------------------------------
Alex Sharaz
------------------------------