We are seeing an issue very similar to this one since we moved from 6.1 to 6.3. Our APs are flapping between controllers on a regular basis. We did not see this issue on 6.1.
We enabled BCMC optimization, and the issue is still happening.
When it happens, we also see several of these error messages in the logs:
Sep 10 06:00:38 :311020: <ERRS> |AP xxxxxx sapd| An internal system error has occurred at file sapd_redun.c function set_route_af line 650 error set_route_af: ioctl (SIOCDELRT) failed: No such process.
We are also frequently seeing these crashes in the logs on each controller:
Sep 10 00:26:23 :303080: <ERRS> |nanny| Please tar and email the file crash.tar to support@arubanetworks.com
Sep 10 00:26:23 :303081: <ERRS> |nanny| To tar type the following commands at the Command Line Interface: (1) tar crash (2) copy flash: crash.tar tftp: [serverip] [destn filename]
Sep 10 00:26:34 :303073: <ERRS> |nanny| Process /mswitch/bin/stm [pid 26321] died: got signal SIGABRT
Sep 10 00:26:38 :303029: <ERRS> |nanny| Process /mswitch/bin/stm [pid 26321]: crash data saved in dir /flash/crash/process/9-10-2013@00-26-34/stm
Sep 10 00:26:44 :303079: <ERRS> |nanny| Restarted process /mswitch/bin/stm, new pid 26702
Sep 10 00:26:44 :303025: <ERRS> |nanny| Found core file /tmp/core.26321.stm.A6xxx_39170, 63582208 bytes, compressing...
Sep 10 00:28:14 :303080: <ERRS> |nanny| Please tar and email the file crash.tar to support@arubanetworks.com
Sep 10 00:28:14 :303081: <ERRS> |nanny| To tar type the following commands at the Command Line Interface: (1) tar crash (2) copy flash: crash.tar tftp: [serverip] [destn filename]
Sep 10 03:51:54 :303086: <ERRS> |AP xxxxx nanny| Process Manager (nanny) shutting down - AP will reboot!
Sep 10 05:51:12 :303073: <ERRS> |nanny| Process /mswitch/bin/stm [pid 10344] died: got signal SIGABRT
Sep 10 05:51:16 :303029: <ERRS> |nanny| Process /mswitch/bin/stm [pid 10344]: crash data saved in dir /flash/crash/process/9-10-2013@05-51-12/stm
Sep 10 05:51:22 :303079: <ERRS> |nanny| Restarted process /mswitch/bin/stm, new pid 22076
Sep 10 05:51:22 :303025: <ERRS> |nanny| Found core file /tmp/core.10344.stm.A6xxx_39170, 88985600 bytes, compressing...
Sep 10 05:51:49 :311020: <ERRS> |AP xxxx sapd| An internal system error has occurred at file sapd_redun.c function redun_tunnel_up line 4992 error redun_tunnel_up: client not found port:8423.
Sep 10 05:55:14 :303080: <ERRS> |nanny| Please tar and email the file crash.tar to support@arubanetworks.com
Sep 10 05:55:14 :303081: <ERRS> |nanny| To tar type the following commands at the Command Line Interface: (1) tar crash (2) copy flash: crash.tar tftp: [serverip] [destn filename]
We have a case open with TAC, and so far they do not have a solution. They suggested that our link was going down, but we run constant pings to the controllers and to the switches in between, and none of them show any ping loss when it happens. The APs don't show any loss either unless they reboot, which also seems to happen frequently since moving to 6.3.
I have attached graphs of our AP movements.