Security


Enterprise security using ClearPass Policy Management, ClearPass Security Exchange, IntroSpect, VIA, 360 Security Exchange, Extensions and Policy Enforcement Firewall (PEF).

CPPM down >24 hours: RADIUS state?

  • 1.  CPPM down >24 hours: RADIUS state?

    Posted Dec 05, 2022 09:39 AM
    If a node in a CPPM cluster is down for more than 24 hours, it has to be manually re-introduced to the cluster when it comes back. In the meantime it is marked as down and sync is disabled.

    What happens to RADIUS requests while the node is in this state?  The RADIUS clients won't know any different, so they will continue to send requests to it.  My assumption is that the RADIUS server is held down as well, because the node assumes its database is incorrect, but this isn't mentioned in the documentation. Can anyone confirm whether RADIUS is disabled until the node is re-joined to the cluster and synced?


  • 2.  RE: CPPM down >24 hours: RADIUS state?

    EMPLOYEE
    Posted Dec 05, 2022 11:42 AM
    For the publisher node, it's obvious that it will just serve requests.
    For a subscriber node, when the synchronization is lost, it should just continue to operate with the known constraints of a missing publisher: no updates to the endpoint database (profiling, MDM, etc.), no new guests, no changes to guest accounts, and no configuration updates. When you rejoin the appliance to the publisher, during the time that that procedure takes, the RADIUS server will be stopped and services are unavailable.

    You may double-check with Aruba Support to get this confirmed if you need a full definitive answer.

    ------------------------------
    Herman Robers
    ------------------------
    If you have urgent issues, always contact your Aruba partner, distributor, or Aruba TAC Support. Check https://www.arubanetworks.com/support-services/contact-support/ for how to contact Aruba TAC. Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.

    In case your problem is solved, please invest the time to post a follow-up with the information on how you solved it. Others can benefit from that.
    ------------------------------



  • 3.  RE: CPPM down >24 hours: RADIUS state?

    Posted Dec 06, 2022 04:42 AM
    Thank you Herman, that's what I hoped/expected - the RADIUS should stop as the database is in an unknown state.


  • 4.  RE: CPPM down >24 hours: RADIUS state?

    Posted Dec 06, 2022 08:50 AM
    But that's not what @Herman Robers said.  The Subscriber will continue to serve RADIUS requests as normal for the time it is disconnected from the Publisher.  The only time it will stop is while it is being re-joined to the deployment.


  • 5.  RE: CPPM down >24 hours: RADIUS state?

    Posted Dec 06, 2022 09:12 AM
    Thanks @ahollifield, in which case that statement is ambiguous.  I read "that procedure" to mean the whole restoration of the node to a synchronised state in the cluster, but I can see it could also mean just the short period while it is being re-synchronised, though that seems counter-intuitive.

    I then checked the ClearPass CPPM Tech Note "Clustering Design Guidelines v1.2" (HPE Support), which says on page 25: "If a subscriber node goes down, authentication requests, guest access, and Onboard access will fail to this node, probably with a timeout error displayed to the client."  That is also ambiguous, because what do they mean by "down"?  Shut down, or marked "down" in the cluster display?

    It seems wrong to me that it would serve RADIUS requests based on a database it knows is out of date, but all I am after is the right answer, whatever that may be.





  • 6.  RE: CPPM down >24 hours: RADIUS state?
    Best Answer

    EMPLOYEE
    Posted Dec 06, 2022 10:05 AM
    By "that procedure" I meant the manual drop and re-join. RADIUS should continue until you manually decide to drop and rejoin the subscriber. The subscriber stops RADIUS only while it is joining the publisher, and starts it again when the sync has completed.

    About the statement in the Tech Note: the way I read it, if a subscriber goes down (reboot, power outage, anything that makes it unavailable), authentications to that node will (obviously) time out. When the database is out of sync (Cluster status Failed), the node is not down. It may look that way from the publisher, but seen from the switches/APs it's still up.

    I would say that in most cases it's better to continue on a database that you are not 100% certain is up to date than to just stop processing. If you follow the other line of thought, you might even argue that RADIUS should stop when connectivity has been lost for just 1 hour, or 1 minute, whereas one of the purposes of having subscribers is to survive the loss of nodes in your cluster, or the loss of connectivity.
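
    To summarise the behaviour described in this reply, here is a minimal Python sketch. It is only a model of what is stated in this thread, not ClearPass code or a ClearPass API; the state names and function names are made up for illustration, and the behaviour should still be confirmed with Aruba TAC if you need a definitive answer.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical labels for the subscriber situations discussed in this thread."""
    IN_SYNC = "joined to the publisher and synchronised"
    SYNC_LOST = "running, but sync with the publisher is lost or disabled"
    REJOINING = "manual drop and re-join in progress"
    UNAVAILABLE = "rebooting, powered off, or otherwise unreachable"


def radius_is_answering(state: NodeState) -> bool:
    """RADIUS keeps running unless the node is unavailable or being re-joined."""
    return state in (NodeState.IN_SYNC, NodeState.SYNC_LOST)


def receives_publisher_updates(state: NodeState) -> bool:
    """Endpoint, guest and configuration updates only arrive while in sync."""
    return state is NodeState.IN_SYNC


if __name__ == "__main__":
    for state in NodeState:
        print(f"{state.name:12} radius={radius_is_answering(state)} "
              f"updates={receives_publisher_updates(state)}")
```

    The point of the model is the SYNC_LOST row: the node still answers RADIUS, but on data that may be stale.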




  • 7.  RE: CPPM down >24 hours: RADIUS state?

    Posted Dec 06, 2022 12:04 PM
    Thanks Herman,

    If that's the case then I will update my notes to reflect this.

    It seems that the problem is that the desired behaviour will be different for each installation.  In my case the Publisher and Subscriber are both RADIUS servers in a load-balanced group, so if the subscriber goes down for a while and changes are made to just the publisher, we would not want the failed machine to come back and start serving RADIUS, as we would then get different responses at random from the server group.

    My aim was to document the behaviour so we know what to do in such a case, so your input is very helpful. I will note that we must manage the process of bringing the failed machine back into service, and keep its data interface down until it is back in the cluster and synchronised.
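
    For my notes, the rule I have in mind for the load-balanced group can be sketched as below. This is just an illustration of the decision, under the assumption that we track each node's status ourselves; the `Node` fields and function names are hypothetical and nothing here talks to ClearPass or to the load balancer.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical record of what we know about one RADIUS server."""
    name: str
    reachable: bool    # the appliance is up and answering on its data interface
    in_cluster: bool   # it has been (re-)joined to the publisher
    synced: bool       # the sync after the join has completed


def may_serve_radius(node: Node) -> bool:
    """Our site policy: only serve from nodes that are back in the cluster and synced."""
    return node.reachable and node.in_cluster and node.synced


def pool_members(nodes: list[Node]) -> list[str]:
    """Names of the nodes we would leave in the load-balanced group."""
    return [n.name for n in nodes if may_serve_radius(n)]


if __name__ == "__main__":
    nodes = [
        Node("publisher", reachable=True, in_cluster=True, synced=True),
        # Back after a long outage: up and able to answer, but not yet re-joined.
        Node("subscriber", reachable=True, in_cluster=False, synced=False),
    ]
    print(pool_members(nodes))
```

    In this example only the publisher stays in the group until the subscriber has been re-joined and the sync has finished.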

    Many thanks to both of you for your help.