A few hours into the day and everything is stable on 10.4.1.2. The errors seen with 10.5+ have not returned.
Still having a lot of issues with Apple devices in areas with poorer Wi-Fi coverage, but that was the same on 10.4.1.1 and is probably a different issue.
Original Message:
Sent: Jun 13, 2024 06:23 AM
From: dankingdon
Subject: Aruba Central Controllerless Environment Is Not Working
Going to try 10.4.1.2 overnight and will report back findings.
Original Message:
Sent: Jun 06, 2024 09:00 AM
From: Mflowers@beta.team
Subject: Aruba Central Controllerless Environment Is Not Working
Just an FYI but 10.4.1.2 is out now. I upgraded to it over a week ago and have not seen any issue in our environment.
I upgraded because I have had a ticket open since January about role-to-role policy not working in an HA controller cluster. If I had a policy that stated:
src.user: GUEST dest.user: EMPLOYEE action: DENY
src.user: GUEST dest: any action: ALLOW
The above policy would only work if the users were on the same controller. If the users were on two different controllers, the deny rule would not work. The controllers would lose the user information whenever traffic had to pass between HA members. I have been running with one controller disconnected from the network to ensure role-to-role policy works.
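To illustrate why losing the user information defeats the deny rule, here is a toy first-match evaluator for the two rules above. It is only a sketch: the `POLICY` table, the `evaluate` function, and the implicit deny at the end are my own assumptions for illustration, not how AOS actually implements role-to-role policy.

```python
from typing import Optional

# Rules are checked top to bottom and the first match wins.
# Each rule is (source role, destination role or None for "any", action).
POLICY = [
    ("GUEST", "EMPLOYEE", "DENY"),
    ("GUEST", None, "ALLOW"),
]


def evaluate(src_role: str, dst_role: Optional[str]) -> str:
    """Return the action of the first rule that matches.

    dst_role is None when the enforcing device does not know the
    destination user's role (for example, the user lives on another
    HA member and the role information was lost).
    """
    for rule_src, rule_dst, action in POLICY:
        if rule_src != src_role:
            continue
        if rule_dst is None or rule_dst == dst_role:
            return action
    return "DENY"  # assumed implicit deny when no rule matches


# Both users known to the same controller: the deny rule matches.
print(evaluate("GUEST", "EMPLOYEE"))
# Destination role unknown: the deny rule is skipped and "any" allows it.
print(evaluate("GUEST", None))
```

With the destination role missing, the traffic falls through to the "any" rule, which matches the behavior described when the two users sit on different controllers.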
If you do not have controllers/gateways then this does not apply to you.
AOS-250024

Original Message:
Sent: Jun 03, 2024 03:48 PM
From: dankingdon
Subject: Aruba Central Controllerless Environment Is Not Working
Adding my experience to this discussion. Same issues as the OP, running 10.6.0.0. TAC has been absolutely useless after 3 weeks of "troubleshooting".
No ClearPass for us so that makes things simpler for troubleshooting.
Will try downgrading to 10.4.1.1 and report back.
Original Message:
Sent: Apr 26, 2024 08:45 PM
From: ariyap
Subject: Aruba Central Controllerless Environment Is Not Working
I am glad that it worked out for you and that you shared the outcome here. Since then, firmware 10.5.1.1 has been released.
However, the release notes do not mention "AOS-247757", so I don't know whether that bug goes by a different ID or whether it was the culprit in your scenario.
Any opinions expressed here are solely my own and not necessarily that of HPE or Aruba.
Original Message:
Sent: Apr 26, 2024 08:52 AM
From: mmurphy
Subject: Aruba Central Controllerless Environment Is Not Working
It looks like we fixed it!
I connected with user mflowers above and his suggestions which included:
- Downgrade to 10.4.1.1
- Change the VLAN assignment rules
- Reinstall RADIUS Cert on Clearpass
appear to have fixed the issue. I had looked at some of these items with TAC and they said everything looked good. I'm glad that we have a community like this where multiple heads can come together to track down an issue.
Thanks!
Original Message:
Sent: Apr 22, 2024 09:31 AM
From: mmurphy
Subject: Aruba Central Controllerless Environment Is Not Working
I'm starting to come to the realization that there is no fix for these and Aruba Central is a faulty product. Maybe an Aruba employee can help me with the following issue since TAC doesn't seem capable.
As stated before by another poster on this thread clients are getting DHCP timeouts where Aruba Central identifies the client as the DHCP server. They also get timeouts when the DHCP server is identified as the DHCP server. Is that something in the way Central is set up? Can that be fixed?


Users are 802.11 de-associated from the network all day long. This happens to me when I just sit in my office without moving. I then move to a different location or wait 5-10 minutes and it reconnects. Why? Isn't the whole point of being connected to stay connected? Why would the system boot me off? Before you say maybe there are too many clients, this happens in my office where I might have as many as 3 people connected to the same AP. TAC also asked me to add a second AP to a location to see if that would alleviate the issue. It did not.
TAC took some PCAPS from Clearpass to look at the Timeouts we are experiencing there. That was on Friday so I haven't heard anything back on that. I hope something can be found there.
I am open to looking at every aspect of this configuration to see what might be wrong. I didn't program the switches, or APs alone. I worked with HPE engineers and HPE certified technicians to design and build this. I am hoping I can find someone who can help with this but it is becoming harder and harder to do so. It's almost as if this entire system was designed to fail. As of right now it is too cost-prohibitive to look at another network company, but that might be the next step.
Original Message:
Sent: Apr 17, 2024 12:53 PM
From: mmurphy
Subject: Aruba Central Controllerless Environment Is Not Working
I had left my computer sitting connected in one room while I went to meet with someone, and when I came back I found it disconnected. I checked the log and found that the computer is listed as the DHCP server, as you said. Right after that there is a high retry rate. I also spotted more of the same DHCP error sprinkled throughout the event logs.
When it did reconnect it was after a bunch of Disassociation from Client warnings and took a few minutes. I also had to leave the room and it reconnected. When the client does acknowledge DHCP it does so with the right DHCP server. It is important to note that this computer didn't roam it was in the same room when all this happened. When I left the room it was to a small office and reconnected to the same AP that disconnected it.
The majority of devices we have connecting are iPads so they are seeing the most issues. Windows machines are connecting through EAP-TLS but were still having roaming issues, where the user may have to forget the network to reconnect.
As far as EAP timeouts go, when I see one in the log on Central I will normally see it in Clearpass too. Sometimes it isn't in Clearpass, which leads to the idea that the requests are getting dropped somewhere.
When TAC took pcaps from the APs last week, they found that the clients were sending multiple requests to Clearpass and those requests were either getting dropped or rejected. The client then floods Clearpass with requests.
This still leads me to believe that something else is up with Clearpass. I didn't have this issue before in 6.10 but I think the issue is happening more in 6.11. Knowing that I had to redo the config when I moved to 6.11 I assume something got programmed wrong, even though it didn't change.
Original Message:
Sent: Apr 17, 2024 10:41 AM
From: Mflowers@beta.team
Subject: Aruba Central Controllerless Environment Is Not Working
"This is positive progress more than I have seen so far. I still think there is something up with Clearpass, I just haven't figured that out yet."
If you want some help troubleshooting Clearpass, let me know. Clearpass can be configured in so many different ways that I can't really give too much help without looking at your CPPM server.
"The error I do see is the AP asking for PMK cache and it isn't there. That is normally rectified within a minute when the user connects. I assume as the day progresses that PMK issue will go away as the users connect to more APs around the building."
I agree with you, and there can still sometimes be cases where the PMK cache doesn't exist for the client (pretty normal in my opinion).
If something in the PMK cache is missing, deleted, or lost, you will get this. By default the PMK cache on Windows lasts longer than the PMK cache on the APs: Windows keeps it for 12 hours and Aruba defaults to 8 hours. Not sure if that is related to what you are seeing, but it is something that could cause this. It's not normally an issue: the client and AP work out what happened and the roam/connection just takes 1-2 seconds instead of <100 ms. In AOS 10.5.x this is a major issue because the AP and client do not figure out that the PMK cache is bad, and the client doesn't know to do a full 4-way handshake. It has been a while since I looked at the pcaps, but in 10.5 I believe the AP wasn't sending a deauth packet to the client correctly and the client kept trying to reconnect with its cached PMK. This seems resolved in 10.4.1, where the client understands the PMK was bad.
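A quick way to picture the 12-hour versus 8-hour mismatch is the small sketch below. It is a simplification under loud assumptions: both caches are taken to start at the same moment and never get refreshed, and the names `pmk_state`, `CLIENT_PMK_LIFETIME`, and `AP_PMK_LIFETIME` are made up for this example.

```python
from datetime import timedelta

CLIENT_PMK_LIFETIME = timedelta(hours=12)  # Windows default, per the post
AP_PMK_LIFETIME = timedelta(hours=8)       # Aruba default, per the post


def pmk_state(elapsed: timedelta) -> str:
    """Describe what a roam looks like `elapsed` after the PMK was cached.

    Assumes both sides cached the PMK at the same time and that
    neither side refreshed it in between.
    """
    client_has_pmk = elapsed < CLIENT_PMK_LIFETIME
    ap_has_pmk = elapsed < AP_PMK_LIFETIME
    if client_has_pmk and ap_has_pmk:
        return "fast roam: both sides hold the PMK"
    if client_has_pmk:
        return "mismatch: client offers a PMK the AP no longer has"
    return "full authentication: neither side holds the PMK"


for hours in (1, 9, 13):
    print(hours, "h ->", pmk_state(timedelta(hours=hours)))
```

Under these assumptions the window between hour 8 and hour 12 is where the client still presents a cached PMK that the AP has already aged out, which is the case that should cost 1-2 seconds but reportedly never recovers on 10.5.x.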
"I am still seeing high retry rates with devices already connected, EAP timeouts, and Client DHCP timeout."
Not sure on the EAP timeouts you are seeing. I would have to see more details about that.
The high retry rates could be a lot of things. Could be poor wifi coverage/clients being dumb/config issue/driver issue/etc.
Here is an issue I had after downgrading - Sticky clients/Un-steerable clients:
Since you had issues with roaming for so long, I am sure your Un-Steerable client list is very large, because the clients have failed to roam correctly so often. I don't know of a way to view the client list without using the REST API, so if you want to do that you will need to call /unsteerable/v1/{tenant_id}
I don't fully know how that list works (when clients are removed), but I cleared a ton of devices out of it when I enabled 802.11r and downgraded to 10.4.1.0. This helped fix issues I was seeing with clients sitting on APs far away with bad SNR/retry rates/etc. Don't know if that is what you might be seeing or not.
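For anyone who wants to script the lookup, here is a rough sketch that only builds the request for the path mentioned above and does not send it. The base URL is a placeholder, and the Bearer-token header is my assumption about how the API gateway is authenticated; check your own Central API gateway address and auth method before using anything like this.

```python
from urllib.request import Request


def build_unsteerable_request(base_url: str, tenant_id: str, token: str) -> Request:
    """Build (but do not send) a GET request for the un-steerable client list.

    The path comes from the post above. base_url and the Bearer token
    header are assumptions for illustration only.
    """
    url = f"{base_url.rstrip('/')}/unsteerable/v1/{tenant_id}"
    return Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values, not real endpoints or credentials.
req = build_unsteerable_request("https://central.example.invalid/", "TENANT_ID", "TOKEN")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it, but I have left that out since the response format is not something I can vouch for.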
The DHCP timeouts you are seeing might be related to the stupidity that is Aruba Central and crappy coding.
There could be a few different reasons for the DHCP timeouts:
1. Actual issues with network/DHCP server.
2. Client fails to fully connect to the AP and is rejected. Since the client is rejected, the DHCP DISCOVER (not request) from the client is dropped.
3. 802.11k failing and Aruba Central showing that as a DHCP timeout.
Take a look at your "Client DHCP Timeout" log - do you see the DHCP Server IP as correct? If so then 1/2 might be the issue. I see client DHCP timeout often due to clients getting rejected pretty often and every time I look into it, it is because someone is trying to connect to the wrong SSID (guest trying to connect to the employee network).
(Here is the one I see the most):
If you see the "DHCP Server" be the same as the hostname of the client, then this is 802.11k failing and Aruba Central being stupid.

Here is the packet capture - as you can see there is no DHCP request that is being sent:

Here is a DHCP timeout that actually points to our DHCP server. When I look at the ClearPass logs I can see that the client is getting rejected:

Here is the packet capture:

^BTW - any Aruba (HPE) engineers out there - this is what DHCP looks like. If you google for "DHCP wiki" you will find a good reference on how DHCP works.
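The triage rule described above (is the reported "DHCP Server" the client's own hostname, or a real DHCP server?) can be sketched roughly like this. The function name, its parameters, and the idea of feeding it values copied from the "Client DHCP Timeout" event are all hypothetical; Central does not hand you events in this shape as far as I know.

```python
def classify_dhcp_timeout(reported_server: str, client_hostname: str,
                          real_servers: set[str]) -> str:
    """Rough triage of a 'Client DHCP Timeout' event.

    reported_server: the 'DHCP Server' value shown in the event.
    client_hostname: the hostname of the affected client.
    real_servers:    addresses of the DHCP servers you actually run.
    """
    if reported_server.lower() == client_hostname.lower():
        # The client is listed as its own DHCP server.
        return "likely 802.11k failure shown as a DHCP timeout"
    if reported_server in real_servers:
        # Reasons 1 and 2 from the list above.
        return "real server: check the network/DHCP server and client rejections"
    return "unrecognized server: investigate further"


print(classify_dhcp_timeout("IPAD-1234", "ipad-1234", {"10.0.0.10"}))
print(classify_dhcp_timeout("10.0.0.10", "ipad-1234", {"10.0.0.10"}))
```

For the second outcome, the next step is the one already mentioned: look in the ClearPass logs for a reject for the same client around the same time.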
Original Message:
Sent: Apr 17, 2024 09:06 AM
From: Matt Murphy
Subject: Aruba Central Controllerless Environment Is Not Working
Upon your advice I moved some of the more troublemaking APs to 10.4.1.1 and did a few walk throughs to test. They did a good job of allowing me to roam, but once I roamed to an AP with 10.5.x on it I had the same issues. Last night I moved all the APs to 10.4.1.1 and am monitoring it this morning. So far I do not see many PMK errors, which is great. The error I do see is the AP asking for PMK cache and it isn't there. That is normally rectified within a minute when the user connects. I assume as the day progresses that PMK issue will go away as the users connect to more APs around the building.
I am still seeing high retry rates with devices already connected, EAP timeouts, and Client DHCP timeout. The last two seem to hold up connection for up to a couple of minutes.
This is positive progress more than I have seen so far. I still think there is something up with Clearpass, I just haven't figured that out yet.
Original Message:
Sent: Apr 15, 2024 01:55 PM
From: Mflowers@beta.team
Subject: Aruba Central Controllerless Environment Is Not Working
Quick note about downgrading to 10.4.1.1 from 10.5.x:
Make sure you test this on one AP first. 10.5.x added the ability to set the management VLAN of the AP. In 10.4.x the VLAN/Trunk port of the AP will send management traffic as untagged.
If you are using the setting under AP GROUP -> CONFIG -> SYSTEM -> VLAN -> Customize Management VLAN:
1. Update the switch ports for your APs so that the Native/Untagged VLAN is your MGMT VLAN
2. Set the auto commit for your AP group to OFF
3. Remove the management VLAN settings for the AP that you will test with
4. Make sure your AP wired uplink profile is updated correctly with your Trunk/VLAN settings
5. Apply the settings to the AP you are testing with
6. Downgrade your firmware from 10.5.x to 10.4.1.1
If you do not remove the settings for the management VLAN before downgrading it will cause your AP to lose connectivity to the network and you will have to manually reset the AP to factory defaults.
This happened to me with an AP on which I tested the downgrade from 10.5.x. Make sure you test this on one AP first before you apply it to all your APs.
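If it helps to keep track per AP, the steps above can be written down as a simple checklist that refuses the downgrade until everything before step 6 is done. This is only a bookkeeping sketch: the class and field names are invented here and nothing in it talks to Central or to the AP.

```python
from dataclasses import dataclass, fields


@dataclass
class ApDowngradeChecklist:
    """Hypothetical record of pre-downgrade steps 1-5 for one test AP."""
    switch_native_vlan_is_mgmt: bool = False   # step 1
    group_auto_commit_off: bool = False        # step 2
    mgmt_vlan_setting_removed: bool = False    # step 3
    uplink_profile_updated: bool = False       # step 4
    settings_applied_to_ap: bool = False       # step 5


def outstanding_steps(checklist: ApDowngradeChecklist) -> list[str]:
    """Return the names of steps not yet completed, in order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def safe_to_downgrade(checklist: ApDowngradeChecklist) -> bool:
    """Step 6 (the firmware downgrade) is allowed only when nothing is outstanding."""
    return not outstanding_steps(checklist)


test_ap = ApDowngradeChecklist(switch_native_vlan_is_mgmt=True, group_auto_commit_off=True)
print(safe_to_downgrade(test_ap), outstanding_steps(test_ap))
```

The point of the ordering is the same as in the warning above: if the management VLAN setting is still present when the firmware changes, the AP can drop off the network.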
Original Message:
Sent: Apr 15, 2024 12:34 PM
From: mmurphy
Subject: Aruba Central Controllerless Environment Is Not Working
The last packet capture TAC took showed that users' packets were getting lost on their way from the AP to Clearpass. This got me thinking that something sits between Clearpass and the AP where packets are not getting through when users try to authenticate through RADIUS.
What you're describing in your environment is also what I am seeing, so now I am interested in downgrading.
Original Message:
Sent: Apr 15, 2024 12:05 PM
From: Mflowers@beta.team
Subject: Aruba Central Controllerless Environment Is Not Working
"Would this in anyway affect the connection between the APs and Clearpass?"
-No, it would not
"Clearpass VM is not allowing enough traffic to reach Clearpass, which is resulting in packets getting dropped and connections getting lost when clients attach to a new AP."
-It's a PMK cache issue. If you look at your logs, I am sure you are going to see a ton of PMK-R1/PMK-R0 issues (something like that, I don't remember 100%).
The client tries to roam to a new AP. The PMK cache is incorrect on the new AP and the client fails to connect. Instead of being told that it needs to do a new full 4-way handshake, the client is stuck in a state where it is kind of associated but not really. This causes the AP to detect that there is no DHCP traffic from the client, and it disconnects the client. The client cannot send DHCP traffic because the association failed.
Original Message:
Sent: Apr 15, 2024 11:55 AM
From: Matt Murphy
Subject: Aruba Central Controllerless Environment Is Not Working
This is interesting. I didn't even think about downgrading. We started at 10.5 so maybe it's been an issue all along. Would this in any way affect the connection between the APs and Clearpass?
I'm starting to think the issue might be in how our Clearpass VM is not allowing enough traffic to reach Clearpass, which is resulting in packets getting dropped and connections getting lost when clients attach to a new AP.
Original Message:
Sent: Apr 15, 2024 11:17 AM
From: Mflowers@beta.team
Subject: Aruba Central Controllerless Environment Is Not Working
Downgrade from 10.5.x to 10.4.1.1. We had tons of roaming issues on 10.5 and downgrading to 10.4.1.0 fixed 99% of them. We had an issue where sometimes users would fail to roam with "Association Flood Detected" if they roamed too often in a short period of time - this was resolved when we upgraded to 10.4.1.1.
If you want/need help with your Aruba environment, let me know. I have been working with Aruba wireless since 2012 and currently manage/architected our Aruba setup with AOS-CX switches/655 APs/9240 controllers/Clearpass - All AOS10/Central.
Original Message:
Sent: Apr 12, 2024 09:01 AM
From: mmurphy
Subject: Aruba Central Controllerless Environment Is Not Working
After a month of technical support, TAC seems to have narrowed down the issue.
It appears that users are connecting but are not able to communicate with Clearpass. Multiple packets are being sent to Clearpass but are either getting lost or rejected before Clearpass allows a connection. The only real change to Clearpass in the past few months was moving to 6.11, which was a real pain, though the configuration didn't change. I am also not sure if Central is set up properly to communicate with Clearpass. Currently users are not getting the correct IP addresses when they connect. I am waiting for one of the four TAC tickets I have open to set up a time to meet with me.
I feel the best thing to do would be to redo all the configurations, but I don't have the level of expertise to do that right. It's also really hard to find a qualified Aruba tech out in the world to help me. So I will wait on TAC and see what happens.
Original Message:
Sent: Mar 12, 2024 11:30 AM
From: mmurphy
Subject: Aruba Central Controllerless Environment Is Not Working
I've had Aruba Central running my network since November and I have had nothing but trouble with it. My environment is a school with up to 1300 active daily users. There are 140 Access Points.
Currently I have had issues with:
Roaming.
Users getting rejected by the access point when connecting.
When some users get connected they get speeds as slow as 0.5 Mbps. Sometimes it speeds up, sometimes it doesn't.
Users get kicked off for no reason whatsoever, just while sitting at their desk.
If more than 20 users try to connect at one time most or all of them will not get connected. This of course doesn't work when students enter a new room every 45 mins.
We have Clearpass and there are rejections in Clearpass but those may be related to roaming according to TAC. I've tried different APs but it doesn't seem to make a difference.
The one thing I am unsure of is our switches. Last summer we installed 17 new Aruba/HPE 2930 switches on top of the eight 2930s we installed a couple of years ago. I don't think I am seeing issues in the areas served by the eight older switches; at least, no one is complaining about issues in those parts of the building. Could there be a configuration on the switches that is causing problems with the APs? I am not sure.
I have had our sales engineer, multiple outside groups, and numerous tickets with TAC open and no one can seem to find a problem. Basically the network has become unusable and I am unsure what to do next. I can go back to the controllers but don't I have to move to AOS10 eventually? Shouldn't there be a way to make this all work?
Any help would be greatly appreciated.