It is very difficult to analyze a problem after the fact if no logs are available. One can only guess at what might have happened.
It is normal that only the primary controller is displayed for an AP in status down. This shows you where the AP was connected when it was last up.
In cluster operation, the AP stores the IP addresses of both cluster members in its nodelist and tries to connect to its primary controller after a reboot. If the primary controller is not reachable, the AP tries the secondary controller. If the secondary controller is also not reachable, the AP tries to connect to the Aruba master that it finds via ADP (in your case, via DNS). If it cannot connect to any controller, the AP reboots.
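To make the order easier to follow, here is a small Python sketch of the fallback sequence described above. It is only an illustration, not how the AP firmware is implemented: the function name pick_controller, the is_reachable callback, and the 192.0.2.x addresses are all made up for the example.

```python
from typing import Callable, Optional


def pick_controller(
    primary: Optional[str],
    secondary: Optional[str],
    discovered_master: Optional[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable controller in the described order, or None.

    Order: primary from the nodelist, then secondary from the nodelist,
    then the master discovered via ADP/DNS. None stands for the case
    where nothing answers and the AP would reboot and start over.
    """
    for candidate in (primary, secondary, discovered_master):
        if candidate is not None and is_reachable(candidate):
            return candidate
    return None


# Hypothetical example: both cluster members are unreachable,
# only the master found via DNS answers.
reachable = {"192.0.2.30"}
choice = pick_controller(
    "192.0.2.10", "192.0.2.20", "192.0.2.30",
    lambda ip: ip in reachable,
)
print(choice or "no controller reachable, AP reboots")
```

If is_reachable returns False for all three candidates, the function returns None, which corresponds to the boot loop: the AP reboots and walks through the same list again.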
In your case, the AP could not connect to any controller and ended up in a boot loop. Can you rule out the wired network as the cause? Is it possible that the AP could not reach the controllers at the transport layer, possibly because of a firewall or routing problem?
So the controllers definitely do not need to be rebooted on a schedule.
For the configuration, contact a local Aruba partner; your configuration may need to be redesigned.
------------------------------
Regards,
Waldemar
ACCX # 1377, ACEP, ACA - Network Security
If you find my answer useful, consider giving kudos and/or mark as solution
------------------------------
Original Message:
Sent: Mar 03, 2023 10:52 AM
From: bkw1227
Subject: DHCP or Controllers
I have 17 clustered AOS controllers running 8.6 code.
We were experiencing a situation where some access points would not get back on the network after a power outage at some schools. The APs would boot up, grab an IP, resolve the master via DNS, reboot, and grab a different IP. We spent hours troubleshooting DHCP. I later took a hard look at the CLI output from show ap database long and noticed that the APs that were down didn't have a backup IP. At that point I took a look at the cluster of controllers and noticed that one of the 17 controllers had 0 APs and 0 users. I called support and the controller was rebooted. All of the down APs returned to an up state. There was no crash info, and I was asked to download logs from the rebooted controller.
Can someone help me understand why, with 17 controllers clustered, the APs could not get back on the network?
Is it the configuration?
A bug in the software?
Do the controllers need to be rebooted on a schedule?
Thanks