09-25-2014 12:26 PM
Has anybody deployed the AP-103?
We are facing performance issues on it. Also, are there any known HA issues on code version 126.96.36.199? Our APs randomly move over to the local controller.
09-25-2014 07:23 PM
You may be hitting an issue I currently have open with TAC/engineering, which they tried to fix in 188.8.131.52, but the fix didn't quite do the trick in production.
I'll let TAC know someone else may be seeing this issue. In the meantime, we are running with inter-controller heartbeats disabled. This means HA is not as fast, but the AP-based HA is still working (they still build preemptive tunnels) -- we have found that the APs themselves only fail over when there is a real problem, but the inter-controller heartbeat is tripping for some reason.
To turn off heartbeats, go to the redundancy menu and, in the groups containing the affected controllers, uncheck the "heartbeat" checkbox, then apply and save the configuration. Or from the CLI, go into the "ha group-profile" and execute "no heartbeat", then do a "wr mem".
I'd be interested to know what gear you have between your controllers.
09-25-2014 11:13 PM
Good to know. We have temporarily disabled HA and enabled LMS and BLMS redundancy, which is the traditional way of failing over the APs.
We have two controllers connected over the MPLS link and they are connected to the Core network on a Port channel.
We have checked the latency and found none whatsoever, and the ports show no error frames.
09-26-2014 05:02 AM
OK let's do a bit of verification that you have the same issue.
Please go to each controller and check "show ha heartbeat counters" and see if any were missed
while you had the feature turned on. The stats should still be there as long as you have not reloaded.
On one of the APs, do a 'show ap debug system-status ap-name XXX | begin "HA Failover Information"'
See if you have a line like this there matching the same time as the APs moved:
2014-09-24 15:10:03 Failover request from standby: fail-over to 10.5.5.81
Do you have entries in the syslog like this from about that time, and also at other times?
sbHeartbeat: PAPI RxPacketFromSibyte: ACK to invalid packet SN = 0x0000a36e opcode=0x6
We also have to rule out that you had actual packet loss. I know you checked the latency, but
did you also check for queueing drops?
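If you have the syslog exported to a text file, something like the following Python sketch could pull out the heartbeat ACK messages so you can compare them against the AP failover times. This is only a rough illustration: the function name and the sample list are made up for the example, and the pattern is based solely on the one log line quoted above, so other variants of the message would not match.

```python
import re

# Pattern built only from the single syslog line quoted in this thread.
# Other forms of the message (if any exist) are not covered.
INVALID_ACK_RE = re.compile(
    r"sbHeartbeat: PAPI RxPacketFromSibyte: ACK to invalid packet "
    r"SN = (0x[0-9a-fA-F]+) opcode=(0x[0-9a-fA-F]+)"
)


def find_invalid_acks(lines):
    """Return (line_number, sequence_number, opcode) for each matching line.

    Hypothetical helper for illustration; line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = INVALID_ACK_RE.search(line)
        if match:
            hits.append((number, int(match.group(1), 16), int(match.group(2), 16)))
    return hits


# The first entry is the line from this thread; the second is a placeholder
# standing in for any unrelated syslog line.
sample_lines = [
    "sbHeartbeat: PAPI RxPacketFromSibyte: ACK to invalid packet SN = 0x0000a36e opcode=0x6",
    "placeholder for an unrelated syslog line",
]

for number, sn, opcode in find_invalid_acks(sample_lines):
    print(f"line {number}: SN={sn:#010x} opcode={opcode:#x}")
```

To run it against a real export, pass an open file object instead of `sample_lines`; the syslog timestamp format is not assumed here, so the surrounding line has to be read by eye.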
09-26-2014 06:15 AM
Unfortunately, I am not able to see the heartbeat counters between the controllers.
HA Failover Information
Date Time Reason (Latest 10)
2014-09-25 16:45:13 Failover request from standby: fail-over to 10.224.32.30
2014-09-25 16:50:16 Pre-emptive failover back to LMS 10.223.32.30
2014-09-25 17:26:16 Failover request from standby: fail-over to 10.224.32.30
2014-09-25 17:31:20 Pre-emptive failover back to LMS 10.223.32.30
Please find the logs taken from one of the APs.
09-26-2014 06:33 AM
OK, if at some point in the future you turn HA back on to test, it might be best
if you disabled the "preemption" checkbox so you don't get two failovers for
every actual failover. This will result in the APs remaining on the standby
unless/until there is another event.
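To see the doubled-up moves in the output you posted, a small Python sketch like this could pair each failover with the pre-emptive failback that follows it and show how long the AP sat on the standby. Treat it as a rough sketch: the function names are made up, the sample text is just the four entries from your post, and it assumes every entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp as those do.

```python
import re
from datetime import datetime

# The four entries posted earlier in this thread.
SAMPLE = """\
2014-09-25 16:45:13 Failover request from standby: fail-over to 10.224.32.30
2014-09-25 16:50:16 Pre-emptive failover back to LMS 10.223.32.30
2014-09-25 17:26:16 Failover request from standby: fail-over to 10.224.32.30
2014-09-25 17:31:20 Pre-emptive failover back to LMS 10.223.32.30
"""

# Assumed layout: timestamp, one space, then the reason text.
ENTRY_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.+)$")


def parse_events(text):
    """Return a list of (datetime, reason) for lines that look like entries."""
    events = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((when, match.group(2)))
    return events


def time_on_standby(events):
    """Pair each failover with the next pre-emptive failback.

    Returns (failover_time, seconds_on_standby) tuples. A failover with no
    later failback is left unpaired and not reported.
    """
    pairs = []
    started = None
    for when, reason in events:
        if reason.startswith("Failover request from standby"):
            started = when
        elif reason.startswith("Pre-emptive failover back") and started is not None:
            pairs.append((started, (when - started).total_seconds()))
            started = None
    return pairs


for started, seconds in time_on_standby(parse_events(SAMPLE)):
    print(f"{started} failover, back on LMS after {seconds:.0f}s")
```

On those four entries, each failover is followed by a failback roughly five minutes later, which is the "two failovers for every actual failover" pattern.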
09-26-2014 07:00 AM
It is hard to tell for sure because it could also be caused by real packet loss, but I don't see anything that would rule it out so far based on what I've seen.
We have a lot of 103Hs. The HA bug is not model-specific; it affects all models.