Question : I have a CPPM cluster with one Publisher and one Subscriber. When the Publisher went down, the Subscriber was not promoted to Publisher and went out of sync.
Environment Information : This applies to CPPM versions 6.2 and above.
Symptoms : The cluster is configured as follows.
1: Publisher's IP : 10.30.156.119/24
2: Subscriber's IP : 10.30.156.129/24
3: VIP : 10.30.156.160
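As a quick sanity check on the addressing above, the following minimal sketch confirms that the VIP falls inside the subnet shared by both nodes. The takeover relies on gratuitous ARP, which only works within one broadcast domain. The helper name `shares_subnet` is hypothetical, and treating the VIP as part of the /24 is an assumption, since the VIP is listed without a mask.

```python
import ipaddress

# Addresses taken from the cluster details above.
# Assumption: the VIP lives in the same /24 as the nodes (no mask is given for it).
publisher = ipaddress.ip_interface("10.30.156.119/24")
subscriber = ipaddress.ip_interface("10.30.156.129/24")
vip = ipaddress.ip_address("10.30.156.160")


def shares_subnet(node, address):
    """Return True if address lies inside the node's own subnet (hypothetical helper)."""
    return address in node.network


same_subnet = (
    publisher.network == subscriber.network
    and shares_subnet(publisher, vip)
    and shares_subnet(subscriber, vip)
)
print("VIP and both nodes share one subnet:", same_subnet)
```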
The cluster was working perfectly until the Publisher went down. Below is the message we see when we log in to the VIP.
This is caused by a missing Cluster-Wide Parameter configuration.
We have to enable the failover feature and set the Subscriber as the Designated Standby Publisher.
Log in to CPPM and navigate to "Administration » Server Manager » Server Configuration". Click "Cluster-Wide Parameters" and open the "Standby Publisher" tab.
The Designated Standby Publisher must be selected from the drop-down.
Once these fields are set, the Subscriber is promoted to Publisher if the original Publisher goes down.
I logged in to the VIP, and we can see that the Subscriber was promoted to Publisher when the original Publisher went down.
Once the original Publisher comes back online, it does not automatically rejoin the cluster, because the Subscriber has taken over the Publisher role.
We have to join it to the cluster manually. We see the capture below if we log in to the original Publisher.
This is expected behavior in CPPM. We can avoid an early promotion by setting the failover time as high as 1 hour.
If we make this change and the Publisher goes down, the following happens.
1: The original Publisher goes down.
2: The Subscriber takes over the VIP in 10 seconds (configurable as shown below).
3: All authentication requests are handled through the VIP, which is now held by the Subscriber node.
4: If the Publisher does not come online within 60 minutes, the Designated Standby Publisher (the Subscriber node in this case) is promoted to Publisher.
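The sequence above can be sketched as a small timeline model. This is only an illustration of the two timers described in the steps, not CPPM code: the constant and function names are hypothetical, and the 10-second and 60-minute values are the ones used in this article (both are configurable).

```python
# Timers described in the steps above; names are hypothetical.
VIP_TAKEOVER_SECONDS = 10          # delay before the Subscriber takes the VIP
PROMOTION_WAIT_SECONDS = 60 * 60   # failover time set to 60 minutes


def state_after_outage(seconds_down):
    """Return (vip_holder, acting_publisher) after the Publisher has been down this long."""
    if seconds_down < 0:
        raise ValueError("seconds_down must be non-negative")

    # Step 2: the Subscriber takes over the VIP once the takeover delay has passed.
    if seconds_down >= VIP_TAKEOVER_SECONDS:
        vip_holder = "subscriber"
    else:
        vip_holder = "nobody yet"

    # Step 4: promotion happens only if the Publisher stays down for the full wait time.
    if seconds_down >= PROMOTION_WAIT_SECONDS:
        acting_publisher = "subscriber"
    else:
        acting_publisher = "none (original is down)"

    return vip_holder, acting_publisher


for seconds in (5, 30, 3600):
    print(seconds, state_after_outage(seconds))
```

Between the two thresholds the Subscriber serves authentication requests on the VIP without having been promoted, which is the window in which a returning Publisher can still resume its role.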
Below is an explanation that should make this clearer.
-The VIP stays with the configured Publisher node until it fails.
-Upon failure of the configured Publisher, the configured Subscriber takes over the VIP. It does a gratuitous ARP to update ARP caches and emits system events to indicate the takeover.
-When the Publisher is back online, it takes the VIP back. It does a gratuitous ARP to update ARP caches and emits system events to indicate the takeover.
The only exception to this is when we have Publisher redundancy configured. In this case, if the Publisher is down for long enough (60 minutes is the maximum value we can set), the Designated Standby Publisher promotes itself to Publisher. The original Publisher is dropped from the cluster during this promotion. Since the original Publisher is now out of the cluster, any VIPs for which it was configured as the Publisher are NOT released back to it, even if we bring the Publisher machine up again. The VIP service on the original Publisher is stopped, and it will refuse to start if you try to start it manually.
To get the original Publisher back into the cluster, we have to reset its database and join it back to the cluster. After it has joined, you can manually start the VIP service on this node, and it will take back ownership of any VIPs for which it was configured as Publisher.
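The recovery rules above can be summarised as a small decision function. This is a minimal sketch of the behaviour described in this article, not CPPM logic; the function name and its three boolean parameters are hypothetical.

```python
def vip_owner_after_publisher_returns(dropped_from_cluster,
                                      db_reset_and_rejoined,
                                      vip_service_started):
    """Return which node holds the VIP once the original Publisher is powered on again."""
    # Normal case: the Publisher came back before any promotion happened,
    # so the VIP is released back to it.
    if not dropped_from_cluster:
        return "original publisher"

    # Promotion case: the VIP service refuses to start until the node has been
    # reset and rejoined, so both conditions are required.
    if db_reset_and_rejoined and vip_service_started:
        return "original publisher"

    return "promoted subscriber"


# Publisher returned after promotion, nothing done yet.
print(vip_owner_after_publisher_returns(True, False, False))
# Database reset, node rejoined, VIP service started manually.
print(vip_owner_after_publisher_returns(True, True, True))
```

Starting the VIP service without first rejoining the node has no effect in this model, which mirrors the note that the service refuses to start while the node is out of the cluster.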