If you are increasing the number of database connections, you probably have a very high load on your system.
What you describe is expected in a publisher-subscriber setup where the client is authenticated on a subscriber. Updates are always run through the publisher and then synchronized back to the subscribers, which introduces a delay. A replication delay of 5-10 seconds would normally be considered acceptable. You should check whether the updates are really not being performed, or whether the replication delay is (sometimes) higher than 8 seconds. It may be worth increasing the CoA delay (if you are using CoA) or the Login Delay (if you are using controller-initiated login) to see whether the issues disappear or become less frequent. If you find the issue, please post the resolution here to help others.
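One way to check whether the delay sometimes exceeds the 8-second window is to time how long an updated value takes to become visible on the subscriber. The sketch below is only an illustration in Python: `read_value` is a hypothetical callable you would have to supply yourself (for example, something that looks up the endpoint's MAC-Address-Expiration on the subscriber), and nothing in it is a ClearPass API.

```python
import time
from typing import Callable, Optional


def measure_propagation_delay(
    read_value: Callable[[], object],  # hypothetical: reads the attribute on the subscriber
    expected: object,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> Optional[float]:
    """Poll read_value() until it returns `expected`.

    Returns the elapsed time in seconds, or None if the value did not
    appear within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        if read_value() == expected:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Started right after the WEBAUTH completes, a result above the configured CoA or Login Delay would point to replication lag, while a `None` result would suggest the update was not applied at all.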
------------------------------
Herman Robers
------------------------------
Original Message:
Sent: May 22, 2023 09:39 AM
From: stubbyroot
Subject: Clearpass failing to write updated MAC cache values to endpoint (intermittent)
Hello guys,
I have a bit of a doozy here that I'm trying to solve. This is for a guest network. Pertinent details below:
- Guest wireless system using captive portal + MAC caching
- Server-initiated setup
- Interfaces with Cisco WLCs
- Clients initiate with L2 MAC authentication. This service checks whether Authorization:[Time Source]:Now LESS_THAN %{Endpoint:MAC-Address-Expiration}. If this is true, it sends back an allow access. If not, it sends back a captive portal enforcement policy that returns url-redirect and url-redirect-acl
- Portal accept button triggers a WEBAUTH that processes the request and updates two Endpoint values: Allow-Guest-Internet (boolean) and MAC-Address-Expiration (epoch)
- Allow-Guest-Internet set to TRUE
- MAC-Address-Expiration set to %{Authorization:[Time Source]:Now Plus 1day} (not a custom Time Source filter)
- Client is sent a CoA (below)
- The client reauthenticates via L2 with the export anchor still active on the WLC anchor side
- At this point, the client should satisfy Authorization:[Time Source]:Now LESS_THAN %{Endpoint:MAC-Address-Expiration}, so it will be given an allow access profile.
- Login delay on page is set to 8 seconds
- CoA delay on ClearPass servers is set to 8 seconds
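To make the flow above concrete, here is a rough Python sketch of the decision the MAC auth service makes and of what happens when the re-authentication after the CoA still sees the old expiration. The function names, the returned labels, and the sample numbers are all made up for illustration; this is not ClearPass code.

```python
def mac_auth_decision(now_epoch: int, mac_address_expiration: int) -> str:
    """Mirror of the rule: Now LESS_THAN MAC-Address-Expiration."""
    if now_epoch < mac_address_expiration:
        return "allow-access"      # illustrative label for the allow profile
    return "captive-portal"        # illustrative label for the redirect profile


def simulate_reauth(now_epoch: int, old_expiration: int,
                    new_expiration: int, write_visible: bool) -> str:
    """Decision for the MAC auth that follows the CoA.

    write_visible says whether the WEBAUTH update has reached the node
    handling the re-authentication by the time it runs.
    """
    seen = new_expiration if write_visible else old_expiration
    return mac_auth_decision(now_epoch, seen)


# Made-up values: the old expiration is in the past, the new one is Now + 1 day.
now = 1_000_000
old_exp = 900_000
new_exp = now + 86_400

result_when_visible = simulate_reauth(now, old_exp, new_exp, write_visible=True)
result_when_stale = simulate_reauth(now, old_exp, new_exp, write_visible=False)
```

If the second MAC auth reads the stale value, it lands on the captive portal branch again, which is the looping behaviour described next.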
The problem:
Intermittently, ClearPass is FAILING to write the output values of the WEBAUTH request into the endpoint database, so the subsequent MAC auth triggers the portal again instead of returning an allow access. Clients on iPhones will get the hotspot login page error. Other devices will loop around the portal again. It depends on the device.
Visual evidence:
Here is a client exhibiting the problem.
MAC Auth #1: Client has a previous, now-expired value of 1684682572. Time Source Now = 1684758610
WEBAUTH Request Output: New expiration value = 1684848708
MAC Auth #2: Shows a MAC-Address-Expiration value of 1684682572, which was the ORIGINAL value, not the updated one.
Client then went through the portal again, and on the second go around, it updated.
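For anyone who wants to sanity-check the epoch values above, a small Python sketch of the arithmetic follows. The variable names are only for illustration; the three numbers are the ones quoted above.

```python
old_expiration = 1684682572   # value seen at MAC Auth #1 and again at MAC Auth #2
now_at_auth_1 = 1684758610    # Time Source Now at MAC Auth #1
new_expiration = 1684848708   # value in the WEBAUTH request output

# How long the old value had already been expired at MAC Auth #1.
expired_for = now_at_auth_1 - old_expiration          # 76038 seconds

# How far the new value lies beyond the Now seen at MAC Auth #1.
new_margin = new_expiration - now_at_auth_1           # 90098 seconds

# The rule "Now LESS_THAN MAC-Address-Expiration" against each value.
passes_with_old = now_at_auth_1 < old_expiration      # False -> portal
passes_with_new = now_at_auth_1 < new_expiration      # True  -> allow access

print(expired_for, new_margin, passes_with_old, passes_with_new)
```

So the old value fails the check by about 21 hours, while the new one would pass with roughly a day to spare, which is consistent with MAC Auth #2 having read the old value rather than the check itself misbehaving.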
I'm at a loss here and can't understand why the DB is failing to write these values intermittently. System stats are not showing anything taxed. However, I do see events in the event viewer showing long-running queries, which suggests the disk is not keeping up with these random I/O reads and writes. Another thing I'm looking at is our maximum number of DB connections, which is set to the default of 400. Would increasing this to 700 (or more), as suggested on the Airheads Community, be helpful?
Any thoughts? I would be extremely grateful for any help with this. I do have a TAC case open, but I wanted to get y'all's opinion on this as well.
Thanks,
Max Turpin