Replace a ZooKeeper role without ZooKeeper service downtime
If server-to-server SASL authentication is not enabled, you can replace a ZooKeeper role without ZooKeeper service downtime.
This procedure is valid only if SASL authentication is not enabled between the ZooKeeper servers. To check this, open the Configuration page of the ZooKeeper service in Cloudera Manager and verify that the Enable Server to Server SASL Authentication property is not selected.
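The same precondition can also be checked outside Cloudera Manager. The following is a minimal sketch, assuming the Cloudera Manager property corresponds to the standard ZooKeeper setting `quorum.auth.enableSasl` in the server's `zoo.cfg` (the mapping and the location of the file on your hosts are assumptions, so verify them for your deployment). The helper name `quorum_sasl_enabled` is hypothetical.

```python
def quorum_sasl_enabled(zoo_cfg_text: str) -> bool:
    """Return True if quorum.auth.enableSasl is set to true in zoo.cfg content.

    Assumption: the Cloudera Manager "Enable Server to Server SASL
    Authentication" property is rendered as quorum.auth.enableSasl.
    """
    enabled = False  # ZooKeeper treats the property as false when it is absent
    for raw_line in zoo_cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "quorum.auth.enableSasl":
            # Keep scanning so that a later duplicate entry wins.
            enabled = value.strip().lower() == "true"
    return enabled
```

Read the contents of the `zoo.cfg` used by a running server and pass the text to the function. If it returns `True`, this procedure does not apply.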
- Stop the ZooKeeper role on the old host.
- On the service Status page, confirm that the ZooKeeper service has elected one of the remaining hosts as leader.
- On the ZooKeeper Instances page, make sure the ZooKeeper role on the old host is stopped, then delete the role instance.
- Add the new ZooKeeper role instance on the new host. The new instance appears in the list in the Down state.
- Perform a rolling restart of all the other running ZooKeeper Server instances that have not been migrated.
- Start the newly added ZooKeeper Server instance.
- Confirm that a leader has been elected and that the whole ZooKeeper service is in a green (healthy) state.
- Using the cluster Action menu, perform Deploy Client Configuration and Refresh.
- Restart any dependent services, such as HBase, HDFS, YARN, and Hive, along with any other services marked as having stale configuration. If a service is deployed in a highly available or load-balanced configuration, it is advisable to perform a rolling restart of its role instances to avoid a service outage. The only exception is HBase: a rolling restart of the RegionServer roles generates region unassignment/assignment storms, which significantly lengthen the restart time and can trigger region assignment failures, such as regions in transition (RITs), that require manual intervention. For HBase, we therefore recommend a clean service stop and start. HBase is the only service for which the ZooKeeper Server instance migration results in an outage, and the length of that outage depends on the number of regions and RegionServers.
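The leader checks in the steps above are done on the Cloudera Manager Status page. As a complementary check, the sketch below queries each server with ZooKeeper's `srvr` four-letter-word command and reads the `Mode:` line from the reply. It assumes the default client port 2181, a plaintext (non-TLS) client port, and that `srvr` is allowed by the servers' four-letter-word whitelist. The function names and host names are hypothetical.

```python
import socket


def parse_mode(srvr_output: str):
    """Extract the value of the 'Mode:' line (e.g. 'leader', 'follower')."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # the server is not serving requests or the reply was unexpected


def query_mode(host: str, port: int = 2181, timeout: float = 5.0):
    """Send 'srvr' to one ZooKeeper server and return its reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def leaders_from_modes(modes: dict) -> list:
    """Return the hosts that report themselves as leader."""
    return [host for host, mode in modes.items() if mode == "leader"]


def collect_modes(hosts, port: int = 2181) -> dict:
    """Query every host; unreachable hosts are recorded with mode None."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = query_mode(host, port)
        except OSError:
            modes[host] = None
    return modes


# Example usage (hypothetical host names):
#   modes = collect_modes(["zk1.example.com", "zk2.example.com", "zk3.example.com"])
#   print(modes, leaders_from_modes(modes))
```

A healthy ensemble should report exactly one leader. Right after the old role is stopped, the old host is expected to be unreachable and is recorded with a mode of `None`.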