How does ClearPass handle a 'Split-Brain' failure?

Here is the current network configuration:


2 DCs connected via a WAN.


Each DC has 10 CPPM 25K nodes. DC1 has the publisher, DC2 has the standby publisher. Each DC has 9 subscribers which serve RADIUS traffic.


In the event that the WAN link drops between the DCs, the DC2 standby publisher will promote itself to active publisher and try to contact all nodes to bring them under its control. However, since the WAN is down, it can only talk to the subscribers in DC2, and the DC1 subscribers remain connected to the DC1 publisher.


When the WAN link comes back online, the DC1 publisher sees that DC2 has taken over, goes into a 'cleanup state', and stops all its services. This causes the subscribers in DC1 to lose their publisher, and from what I see in the lab, they do not call back to the DC2 publisher to be managed. So we end up with a bunch of orphaned nodes. Is there an issue in our config, or is this expected behaviour?
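The sequence above can be sketched as a toy Python model. This is not ClearPass code and does not reflect its real failover logic: the node names, roles, and functions are all hypothetical, and the "DC1 subscribers do not re-home" step simply encodes the lab observation rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    dc: str
    role: str                        # "publisher", "standby", "subscriber" or "cleanup"
    managed_by: Optional[str] = None


def build_cluster() -> List[Node]:
    """Two DCs: publisher in DC1, standby in DC2, 9 subscribers in each."""
    nodes = [
        Node("dc1-pub", "DC1", "publisher"),
        Node("dc2-standby", "DC2", "standby"),
    ]
    for dc in ("DC1", "DC2"):
        for i in range(1, 10):
            nodes.append(Node(f"{dc.lower()}-sub{i}", dc, "subscriber", "dc1-pub"))
    return nodes


def wan_drop(nodes: List[Node]) -> None:
    """Standby promotes itself but can only reach subscribers in its own DC."""
    for n in nodes:
        if n.name == "dc2-standby":
            n.role = "publisher"
        elif n.role == "subscriber" and n.dc == "DC2":
            n.managed_by = "dc2-standby"


def wan_restore(nodes: List[Node]) -> None:
    """Old publisher enters cleanup; its subscribers are left unmanaged (as seen in the lab)."""
    for n in nodes:
        if n.name == "dc1-pub":
            n.role = "cleanup"
        elif n.role == "subscriber" and n.managed_by == "dc1-pub":
            n.managed_by = None


def orphans(nodes: List[Node]) -> List[str]:
    """Subscribers that no longer have a publisher."""
    return [n.name for n in nodes if n.role == "subscriber" and n.managed_by is None]


nodes = build_cluster()
wan_drop(nodes)
wan_restore(nodes)
print(orphans(nodes))
```

In this model the orphan list ends up containing exactly the nine DC1 subscribers, which matches what the lab shows after the WAN link is restored.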


Just a small side question. During authentication we add attributes to endpoints in the endpoint DB. If the primary publisher is down and clients authenticate to a subscriber, will these endpoint updates be published when the publisher comes back online, or is this data lost?





Re: How does ClearPass handle a 'Split-Brain' failure?

I'd recommend having a read of the clustering technote. Split brain should be avoided by having pub and standby pub on the same L2 broadcast domain.


There are warnings regarding this.






Re: How does ClearPass handle a 'Split-Brain' failure?

Technically this is Layer 2, but with a Layer 2 extension between sites, so the Layer 2 subnet can still be split. I guess that's why we didn't see the warning while configuring the settings. I had looked in the tech note, but I guess I missed that; when I did a find for 'split' nothing came up, as it's in the screenshot.


I get the concern. The customer had a requirement for multi-site full redundancy. I guess maybe I should propose that they place the standby publisher in the same DC as the publisher, and if that DC goes down and they are in a bind, they can then manually promote one of the subscribers in DC2 to publisher.


How does the cluster handle a subscriber being promoted while both of the other 2 publishers are offline or not contactable? Will they see that a 3rd device was promoted, and both put themselves in a 'waiting' state?


Thanks for the info,



