Using HashTable and SyncTable tools to copy data between Cloudera Operational Database clusters

You can use the HashTable and SyncTable tools to copy data from one Cloudera Operational Database cluster to another. You can use these tools to synchronize data prior to replication.

You can use the HashTable and SyncTable CLI tools as a one-way synchronization method for data in Cloudera Operational Database clusters. The CldrSyncTable job is an extension of the upstream SyncTable tool. For more information about the HashTable and SyncTable tools, see Use HashTable and SyncTable tool.

When you use these tools, ensure that the HashTable output directory and the source table are on the same cluster where you run the CLI. This means that you cannot set the sourcezkcluster and sourcehashdir properties to point to a remote cluster that the command-line executor cannot authenticate against.

Ensure that all RegionServers and DataNodes on the source cluster are accessible by NodeManagers on the target cluster where SyncTable job tasks are running.
On secured clusters, users on the target cluster who execute the SyncTable job must be able to do the following on the HDFS and HBase services of the source cluster:
- Authenticate, for example, using centralized authentication or a cross-realm setup.
- Be authorized with at least read permission.
Ensure that the target table is created and enabled on the target cluster.

Ensure that the following properties have the correct values in the hbase-site.xml configuration file of the target cluster:

<property>
  <name>hbase.security.replication.credential.provider.path</name>
  <value>cdprepjceks://hdfs@[***NAMENODE_HOST***]:[***NAMENODE_PORT***]/hbase-replication/credentials.jceks</value>
</property>
<property>
  <name>hbase.security.replication.user.name</name>
  <value>srv_[***WORKLOAD USER NAME***]</value>
</property>
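
To confirm these values before you run any job, you can read them back from the configuration file. The following is a minimal sketch, not a Cloudera-provided tool: get_hbase_property is a hypothetical helper, the default path /etc/hbase/conf/hbase-site.xml is an assumption about where the target cluster keeps its client configuration, and the helper assumes that each <name> and <value> element sits on a single line.

```shell
#!/usr/bin/env bash
# get_hbase_property NAME FILE
# Hypothetical helper: prints the <value> that follows <name>NAME</name>
# in a Hadoop-style XML file. Assumes each element is on one line.
get_hbase_property() {
  local name="$1" file="$2"
  awk -v want="$name" '
    # Remember that the wanted <name> element has been seen.
    index($0, "<name>" want "</name>") { found = 1 }
    # The first <value> after it belongs to the same <property>.
    found && match($0, /<value>.*<\/value>/) {
      # Strip the 7-character opening tag and the 8-character closing tag.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$file"
}

# Assumed location of the configuration file; override with HBASE_SITE.
site="${HBASE_SITE:-/etc/hbase/conf/hbase-site.xml}"
if [ -r "$site" ]; then
  for prop in \
    hbase.security.replication.credential.provider.path \
    hbase.security.replication.user.name; do
    printf '%s = %s\n' "$prop" "$(get_hbase_property "$prop" "$site")"
  done
fi
```

An empty value after the equals sign means that the property was not found in the file.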

Ensure that the source cluster can communicate with the target cluster.
1. Get the ZooKeeper quorum address of the target cluster.
2. Configure the subnet of the target cluster to allow connections from the source cluster.
  For example, open port 2181 for ZooKeeper client connections.
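
You can test the connection from a source cluster host before you run any job. The following is a rough sketch under stated assumptions: split_quorum_hosts and check_zk_port are hypothetical helper names, TARGET_ZK_QUORUM is an assumed environment variable that holds the comma-separated ZooKeeper hosts of the target cluster, and the check relies on bash and the coreutils timeout command being installed. A successful TCP connection only shows that the port is open, not that authentication works.

```shell
#!/usr/bin/env bash
# split_quorum_hosts QUORUM
# Hypothetical helper: prints one host per line from a comma-separated list.
split_quorum_hosts() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# check_zk_port HOST [PORT]
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT opens
# within 5 seconds. PORT defaults to 2181, the ZooKeeper client port.
check_zk_port() {
  local host="$1" port="${2:-2181}"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Runs only when TARGET_ZK_QUORUM is set, for example:
#   TARGET_ZK_QUORUM=zk1.example.com,zk2.example.com,zk3.example.com
if [ -n "${TARGET_ZK_QUORUM:-}" ]; then
  for host in $(split_quorum_hosts "$TARGET_ZK_QUORUM"); do
    if check_zk_port "$host" 2181; then
      echo "$host:2181 reachable"
    else
      echo "$host:2181 NOT reachable"
    fi
  done
fi
```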

Run the HashTable job on the source cluster to generate a hash of the source table, and then list the output directory to verify that the hash files were written:

hbase org.apache.hadoop.hbase.mapreduce.HashTable [***TABLE NAME***] [***HASH OUTPUT PATH***]
hdfs dfs -ls -R [***HASH OUTPUT PATH***]

Run the CldrSyncTable job from the source cluster to compare the generated hashes with the target table and copy the data that differs.

Use the --cldr.cross.domain option if the target cluster is secure, or the --cldr.unsecure.peer option if the target cluster is unsecure.

For a secure target cluster, run the CldrSyncTable job with the --cldr.cross.domain option:

hbase org.apache.hadoop.hbase.mapreduce.CldrSyncTable \
  --cldr.cross.domain \
  --targetzkcluster=[***TARGET ZOOKEEPER QUORUM***]:[***TARGET ZOOKEEPER PORT***]:[***TARGET ZOOKEEPER ROOT FOR HBASE***] \
  [***HASH OUTPUT PATH***] [***SOURCE TABLE NAME***] \
  [***TARGET TABLE NAME ON THE TARGET CLUSTER***]
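
The --targetzkcluster value joins three parts with colons: the comma-separated quorum hosts, the client port, and the ZooKeeper root for HBase. The following sketch shows how the pieces fit together by printing the command line without running it. The helper names zk_cluster_key and synctable_command are hypothetical, and the hosts, hash path, and table names are placeholder values, not defaults.

```shell
#!/usr/bin/env bash
# zk_cluster_key QUORUM PORT ZNODE
# Hypothetical helper: joins the three parts into QUORUM:PORT:ZNODE,
# the form used for --targetzkcluster.
zk_cluster_key() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# synctable_command ZK_KEY HASH_DIR SOURCE_TABLE TARGET_TABLE
# Hypothetical helper: prints (does not run) the CldrSyncTable command
# for a secure target cluster.
synctable_command() {
  local zk_key="$1" hash_dir="$2" source_table="$3" target_table="$4"
  printf 'hbase org.apache.hadoop.hbase.mapreduce.CldrSyncTable --cldr.cross.domain --targetzkcluster=%s %s %s %s\n' \
    "$zk_key" "$hash_dir" "$source_table" "$target_table"
}

# Placeholder values for illustration only.
synctable_command \
  "$(zk_cluster_key 'zk1.example.com,zk2.example.com,zk3.example.com' 2181 /hbase)" \
  /tmp/hashes/my_table my_table my_table
```

Printing the command first lets you review the assembled --targetzkcluster value before you submit the job.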

After the job completes, verify the copied data by scanning the target table in the HBase shell on the target cluster:
```
scan '[***TARGET TABLE NAME***]'
```
