Enable bulk load replication using Cloudera Manager

You can enable bulk load replication using Cloudera Manager.

If you have enabled Kerberos cross-realm authentication:

  1. At the command line, use the list_principals command to list the kdc, admin_server, and default_domain for each realm.
  2. Add this information to each cluster using Cloudera Manager. For each cluster, go to HDFS > Configuration > Trusted Kerberos Realms. Add the target and source realms. This requires a restart of HDFS.
  3. Enter a Reason for change, and then click Save Changes to commit the changes. Restart the role and service when Cloudera Manager prompts you to restart.
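If your realms are defined in the [realms] section of a krb5.conf file (an assumption about your environment; /etc/krb5.conf is the usual default path), you can also read the kdc, admin_server, and default_domain values needed in step 2 directly from that file. The list_realm_settings function below is a hypothetical helper, not part of Cloudera Manager or MIT Kerberos. It is a minimal awk sketch that assumes the common one-setting-per-line layout and prints one line per setting in the form REALM key value.

```sh
#!/bin/sh
# list_realm_settings (hypothetical helper): print "REALM key value" for the
# kdc, admin_server and default_domain settings found in the [realms]
# section of a krb5.conf file. Assumes one setting per line.
list_realm_settings() {
  local conf="${1:-/etc/krb5.conf}"
  awk '
    # A line starting with "[" opens a top-level section such as [realms].
    /^[ \t]*\[/ {
      section = $0
      gsub(/[^A-Za-z_]/, "", section)
      realm = ""
      next
    }
    # Ignore everything outside the [realms] section.
    section != "realms" { next }
    # Skip whole-line comments.
    /^[ \t]*[#;]/ { next }
    # "REALM = {" opens a realm block; remember the realm name.
    /=[ \t]*[{][ \t]*$/ {
      realm = $0
      sub(/[ \t]*=.*$/, "", realm)
      gsub(/[ \t]/, "", realm)
      next
    }
    # A lone "}" closes the current realm block.
    /^[ \t]*[}][ \t]*$/ { realm = ""; next }
    # Inside a realm block, print only the settings this step needs.
    realm != "" {
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/[ \t]/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "kdc" || key == "admin_server" || key == "default_domain")
        print realm, key, val
    }
  ' "$conf"
}
```

For example, run list_realm_settings /etc/krb5.conf on a host in each cluster and compare the source and target realm entries before adding them in Cloudera Manager.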
To enable bulk load replication:

  1. Go to the source cluster from which you want to replicate the bulk loaded data.
  2. Go to HBase > Configuration.
  3. Select Scope > (Service-Wide).
  4. Locate the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml property or search for it by typing its name in the Search box.
  5. Add the following property values:
    • Name: hbase.replication.bulkload.enabled

      Value: true

      Description: Enable bulk load replication

    • Name: hbase.replication.cluster.id

      Value: source1

      Description: The ID of the source cluster, for example, source1.

  6. Manually copy the source cluster's HBase client configuration files (core-site.xml, hdfs-site.xml, and hbase-site.xml) to the target cluster where you want the data to be replicated. Copy them to every RegionServer host in the target cluster.
    By default, these files are in the HBase Master's process directory under /run/cloudera-scm-agent/process/. You can also search for the configuration:
    find /run/cloudera-scm-agent/process/ -name 'hbase-site.xml' -print0 | xargs -0 grep -l 'hbase.replication.bulkload.enabled'
  7. Go to the target cluster where you want the data to be replicated.
  8. Go to HBase > Configuration.
  9. Select Scope > (Service-Wide).
  10. Locate the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml property or search for it by typing its name in the Search box.
  11. Add the following property value:
    • Name: hbase.replication.conf.dir

      Value: /opt/cloudera/fs_conf

      Description: Path to the configuration directory where the source cluster's configuration files have been copied. The files must be placed in [hbase.replication.conf.dir]/[hbase.replication.cluster.id]; with the values above, that is /opt/cloudera/fs_conf/source1/.

  12. Restart the HBase service on both clusters to deploy the new configurations.
  13. Add the peer to the source cluster as you would with normal replication.
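    • In the HBase shell on the source cluster, add the target cluster as a peer. The cluster key is the target cluster's ZooKeeper quorum, ZooKeeper client port, and HBase root znode:
      add_peer '1', CLUSTER_KEY => '<zookeeper_quorum>:<zookeeper_client_port>:/hbase'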
    • Enable replication for the table into which you will bulk load data:
      enable_table_replication 'IntegrationTestBulkLoad'
    • Alternatively, you can enable replication for a single column family:
      disable 'IntegrationTestBulkLoad'
      alter 'IntegrationTestBulkLoad', {NAME => 'D', REPLICATION_SCOPE => '1'}
      enable 'IntegrationTestBulkLoad'

    You can verify that bulk load replication is working in your setup by following the example in this blog post:

    https://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
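Steps 6 and 11 together mean that, on each target RegionServer host, the source cluster's files must end up in [hbase.replication.conf.dir]/[hbase.replication.cluster.id]. The stage_source_conf function below is a hypothetical sketch of that copy. It assumes you have already transferred the three files from the source cluster into a local directory (for example with scp), and it does not contact either cluster or Cloudera Manager.

```sh
#!/bin/sh
# stage_source_conf (hypothetical helper): copy a source cluster's HBase client
# configuration into <conf_dir>/<cluster_id> on this host, the layout that
# hbase.replication.conf.dir expects on the target cluster.
#   $1  local directory holding the source core-site.xml, hdfs-site.xml, hbase-site.xml
#   $2  hbase.replication.conf.dir on the target cluster, e.g. /opt/cloudera/fs_conf
#   $3  hbase.replication.cluster.id on the source cluster, e.g. source1
stage_source_conf() {
  local src_dir="$1" conf_dir="$2" cluster_id="$3" dest f
  if [ -z "$src_dir" ] || [ -z "$conf_dir" ] || [ -z "$cluster_id" ]; then
    echo "usage: stage_source_conf SRC_DIR CONF_DIR CLUSTER_ID" >&2
    return 2
  fi
  # Check that all three files exist before copying anything,
  # so a failed run never leaves a partial directory behind.
  for f in core-site.xml hdfs-site.xml hbase-site.xml; do
    if [ ! -f "$src_dir/$f" ]; then
      echo "missing $src_dir/$f" >&2
      return 1
    fi
  done
  # Warn (without failing) if the bulk load setting is absent from the
  # source hbase-site.xml, mirroring the grep check in step 6.
  if ! grep -q 'hbase.replication.bulkload.enabled' "$src_dir/hbase-site.xml"; then
    echo "warning: hbase.replication.bulkload.enabled not found in $src_dir/hbase-site.xml" >&2
  fi
  dest="$conf_dir/$cluster_id"
  mkdir -p "$dest" || return 1
  for f in core-site.xml hdfs-site.xml hbase-site.xml; do
    cp "$src_dir/$f" "$dest/" || return 1
  done
  # Print the directory the files were staged in.
  echo "$dest"
}
```

For example, stage_source_conf /tmp/source1-conf /opt/cloudera/fs_conf source1 stages the files in /opt/cloudera/fs_conf/source1, matching the values used above; /tmp/source1-conf is a placeholder. Make sure the user that runs the HBase RegionServer can read the copied files.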