Optimize HBase replication policy performance when replicating HBase tables with several TB of data

Can HBase replication policy performance be optimized when replicating HBase tables with several TB of data if the "Perform Initial Snapshot" option is chosen during HBase replication policy creation?

Complete the following manual steps to optimize HBase replication policy performance when replicating several TB of HBase data if you choose the Perform Initial Snapshot option during the HBase replication policy creation process.

Remedy

  1. Before you create the HBase replication policy, perform the following steps:
    1. Navigate to the source Cloudera Manager > YARN service > Configuration tab.
    2. Search for the mapreduce.task.timeout parameter.
    3. Increase the value or set it to 0 to switch off the timeout.
    4. Restart the YARN service.
    5. Navigate to the source Cloudera Manager > HBase service > Configuration tab.
    6. Search for and configure the following key-value pairs:
      • hbase.snapshot.master.timeout.millis = 840000
      • hbase.client.sync.wait.timeout.msec = 180000
      • hbase.client.operation.timeout = 2400000
      • hbase.client.procedure.future.get.timeout.msec = 3000000
      • hbase.hfilearchiver.thread.pool.max = 100
      • hbase.snapshot.thread.pool.max = 24
    7. Restart the HBase service.
    8. Repeat steps 5 through 7 on the target Cloudera Manager.
  2. When you create the HBase replication policy for the first time using the source cluster configured above, increase the Maximum Map Slots value on the Advanced Settings page.
  3. If Store File Tracking (SFT) is enabled in the target COD, perform the steps in the COD migration topic after the replication policy is created.
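
The HBase key-value pairs listed in step 1 can be prepared once and applied identically on the source and target clusters. The following is a minimal Python sketch, not part of the product, that renders those pairs as Hadoop-style <property> elements and prints the timeout values in minutes as a sanity check. It assumes the values are entered through an hbase-site.xml-style configuration snippet and that the four timeout properties are expressed in milliseconds. The names HBASE_SETTINGS, to_property_xml, and timeouts_in_minutes are illustrative only.

```python
from xml.etree import ElementTree as ET

# Values copied from step 1 of the procedure above.
HBASE_SETTINGS = {
    "hbase.snapshot.master.timeout.millis": 840000,
    "hbase.client.sync.wait.timeout.msec": 180000,
    "hbase.client.operation.timeout": 2400000,
    "hbase.client.procedure.future.get.timeout.msec": 3000000,
    "hbase.hfilearchiver.thread.pool.max": 100,
    "hbase.snapshot.thread.pool.max": 24,
}


def to_property_xml(settings):
    """Render each key-value pair as a Hadoop-style <property> element."""
    blocks = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


def timeouts_in_minutes(settings):
    """Convert the timeout entries (assumed to be milliseconds) to minutes."""
    return {k: v / 60000 for k, v in settings.items() if "timeout" in k}


if __name__ == "__main__":
    # Paste-ready snippet, one <property> element per line.
    print(to_property_xml(HBASE_SETTINGS))
    # Human-readable check of how long each timeout actually is.
    for name, minutes in timeouts_in_minutes(HBASE_SETTINGS).items():
        print(f"{name}: {minutes:g} min")
```

Keeping the values in one place reduces the chance that the source and target clusters drift apart when step 8 is carried out by hand.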