Problem:
Region servers fail to start on the cdh nodes.
The Ambari cluster management console can be accessed on port 7183 (https://<Analyzer IP>:7183).
In the UI, the region servers on the compute nodes (cdh-1/cdh-2/cdh-3/...) are shown as stopped.
Diagnostics:
One cause of this behavior is a time difference between the compute nodes and the Analyzer.
This can be confirmed by looking at the HBase logs under /var/log/hbase/.
In this example, attempts to start the region servers on compute nodes cdh-2 and cdh-3 produce an error in the HBase logs under /var/log/hbase on each of those nodes.
To view the HBase log of the cdh-2 node from the Analyzer CLI, run the following commands:
# ssh cdh-2
# tail -f /var/log/hbase/hbase-hbase-regionserver-cdh-2.cluster-internal.log
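Instead of scanning the log by eye, the clock-related rejections can be filtered out. The following is a minimal sketch, not part of the product: the function name count_clock_errors is hypothetical, and the pattern assumes the region server logs the problem as HBase's ClockOutOfSyncException (the exact message is not reproduced in this article, so adjust the pattern if your log is worded differently).

```shell
#!/bin/sh
# Hypothetical helper: count log lines that mention a clock-skew rejection.
# Assumption: the error appears in the log as ClockOutOfSyncException.
count_clock_errors() {
    # $1 = path to a region server log file
    grep -c 'ClockOutOfSyncException' "$1"
}
```

On cdh-2 it could be called as: count_clock_errors /var/log/hbase/hbase-hbase-regionserver-cdh-2.cluster-internal.log. A non-zero count suggests that the clock difference is what keeps the region server from starting.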
The log shows that the region servers fail to start because their clocks differ from the master (Analyzer) by more than 30 seconds.
To check the time on all nodes in the cluster, use the following ansible command:
# ansible all -m shell -a date
In this example, the clocks on cdh-2 and cdh-3 are out of sync with an-node (Analyzer/master) by more than 30 seconds.
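The comparison against the 30-second limit can also be made numerically by comparing epoch seconds. The sketch below is illustrative only: the function names are hypothetical, and it assumes the Analyzer can ssh to the cdh nodes without a password prompt, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing clocks in epoch seconds.

# Print the absolute difference between two epoch timestamps.
skew_seconds() {
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

# Succeed (exit 0) when the skew exceeds the limit.
# The default of 30 seconds is the threshold quoted in this article.
skew_exceeds_limit() {
    [ "$(skew_seconds "$1" "$2")" -gt "${3:-30}" ]
}

# Compare one node's clock with the local (Analyzer) clock over ssh.
check_node() {
    remote=$(ssh "$1" date +%s) || return 2
    local_now=$(date +%s)
    if skew_exceeds_limit "$local_now" "$remote"; then
        echo "$1: out of sync by $(skew_seconds "$local_now" "$remote")s"
    else
        echo "$1: ok"
    fi
}
```

From the Analyzer, a loop such as: for n in cdh-1 cdh-2 cdh-3; do check_node "$n"; done reports each node in turn. The ssh round trip adds a small error to the measurement, so results close to the limit should be treated with caution.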
Solution:
By default, all nodes in the cluster sync to the NTP servers configured in the Analyzer (Menu->Configuration->System->NTP Servers).
Run the following command to restart ntpd on all nodes so that they resync with the NTP servers:
# ansible all -m shell -a "service ntpd restart"
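After the restart, it can be useful to confirm that ntpd has actually selected a time source. In the output of ntpq -p, the currently selected peer is marked with * at the start of its line. The helper below is a hedged sketch: the function name has_sync_peer is hypothetical, and it assumes ntpq is installed alongside ntpd on the nodes.

```shell
#!/bin/sh
# Hypothetical helper: read `ntpq -p` output on stdin and succeed only if
# some peer line starts with '*', the marker ntpq uses for the selected source.
has_sync_peer() {
    grep -q '^\*'
}
```

On a node it could be used as: ntpq -p | has_sync_peer && echo synced. ntpd can take several minutes after a restart to select a peer, so a negative result immediately after the restart does not necessarily mean the NTP servers are unreachable.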
If these NTP servers are not reachable from the cdh nodes, set the date and time manually using the following format:
date -s "10 JUL 2012 17:57:00"
For example, to change the date and time on cdh-2, ssh to cdh-2 and set the date and time to match the Analyzer:
# ssh cdh-2
# date -s "10 JUL 2012 17:57:00"
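Typing the time by hand is error-prone, so the Analyzer's current time can instead be pushed to a node directly. This is only a sketch under stated assumptions: the function names are hypothetical, the node is assumed to run GNU date (which accepts @<epoch seconds> as a time specification), and the ssh user must be allowed to set the clock on the node.

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU date on the target node, which accepts
# "@<epoch seconds>" as a time specification.

# Print the command that sets a clock to the given epoch timestamp.
build_date_cmd() {
    echo "date -s @$1"
}

# Push the local (Analyzer) time to one node over ssh (needs root there).
push_time_to_node() {
    ssh "$1" "$(build_date_cmd "$(date +%s)")"
}
```

For example, push_time_to_node cdh-2 run on the Analyzer sets the clock on cdh-2. Epoch seconds are used rather than a formatted date string so that a time zone difference between the two machines does not shift the result.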