Using the CldrCopyTable utility to copy data

You can use the CldrCopyTable utility to copy data from one Cloudera Operational Database (COD) cluster to another. You can use it to bring data in sync for replication.

CldrCopyTable is Cloudera’s version of the upstream CopyTable utility. For more information about the CopyTable utility, see Use CopyTable.

The --cldr.cross.domain option of CldrCopyTable is an extension of the COD replication plugin that enables you to copy data cross-realm. The replication plugin uses PAM authentication to validate the machine user credentials. COD clusters are always provisioned with PAM authentication against the CDP environment FreeIPA security domain. Cloudera Single Sign on (CSSO) is used as the machine user credentials (for example, csso_user).

  1. Ensure that the following properties have the correct values in the hbase-site.xml configuration file of the target cluster:
    <property>
    <name>hbase.security.replication.credential.provider.path</name>
    <value>cdprepjceks://hdfs@[***NAMENODE_HOST***]:[***NAMENODE_PORT***]/hbase-replication/credentials.jceks</value>
    </property>
    
    <property>
    <name>hbase.security.replication.user.name</name>
    <value>srv_[***WORKLOAD USER NAME***]</value>
    </property>
    
  2. Ensure that the sourse cluster can communicate with the target cluster:
    1. Get the ZooKeeper quorum address of the target cluster.
    2. Set the address as an environment parameter in the source cluster.
    3. Set a subnet that allows connection from the source cluster.
      For exmple by enabling the port 2181 for the ZooKeeper client.
  3. Issue the CldrCopyTable command from the source cluster to write to the target cluster.
    Based on your target cluster setup you have to use either the --cldr.cross.domain or the --cldr.unsecure.peer option.
    Use the --cldr.cross.domain option:
    hbase org.apache.hadoop.hbase.mapreduce.CldrCopyTable --cldr.cross.domain --peer.adr=[***ZOOKEEPER QUORUM***]:[***ZOOKEEPER PORT***]:[***ZOOKEEPER ROOT FOR HBASE***] --new.name="[***NEW TABLE NAME***]" "[***SOURCE TABLE NAME***]"
    Use the --cldr.unsecure.peer option:
    hbase.org.apache.hadoop.hbase.mapreduce.CldrCopyTable --cldr.unsecure.peer --peer.adr=[***ZOOKEEPER QUORUM***]:[***ZOOKEEPER PORT***]:[***ZOOKEEPER ROOT FOR HBASE***] --new.name="[***NEW TABLE NAME***]" "[***SOURCE TABLE NAME***]"
  4. Once the job is finished, check the target cluster and ensure that the copy was successful.