Configuring replication with Apache HBase in EMR
You can configure Cloudera Operational Database (COD) for data replication with an Amazon EMR cluster that runs Apache HBase.
- Ensure that you have the replication plugin. Contact your Cloudera account team to get the replication plugin.
- Ensure that all the EC2 instances in the EMR cluster can communicate with COD. For example, you can place the EMR cluster on the same VPC network and subnets that the COD instance uses.
- Ensure that your COD cluster security group allows inbound TCP connections to ports 16020, 16010, and 2181 from all the EC2 instances in the EMR cluster. You can configure this using the AWS Management Console. The ports are configured automatically if the EMR EC2 instances use the same worker, leader, and controller (also known as master) security groups as COD.
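As an alternative to the console, the inbound rules for the three ports could be added with the AWS CLI. The following is a minimal sketch, not a documented COD procedure: the security group IDs are placeholders, the helper name build_ingress_cmd is hypothetical, and it assumes the EMR instances share one security group that can be used as the traffic source. The script only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# Placeholder IDs (assumptions): replace with your own security group IDs.
COD_SG_ID="sg-0123456789abcdef0"   # security group attached to the COD cluster
EMR_SG_ID="sg-0fedcba9876543210"   # security group attached to the EMR EC2 instances

# build_ingress_cmd is a hypothetical helper. It prints the AWS CLI command
# that would allow inbound TCP traffic on one port from the EMR security group.
build_ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --source-group %s\n' \
    "$COD_SG_ID" "$1" "$EMR_SG_ID"
}

# Ports listed in the prerequisite above.
for port in 16020 16010 2181; do
  build_ingress_cmd "$port"
done
```

After checking the printed commands, you can run them from a shell whose AWS credentials are allowed to modify the COD security group.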
sudo -u hbase hbase org.apache.hadoop.hbase.client.replication.ReplicationSetupTool \
  -clusterKey "zk-host-1,zk-host-2,zk-host-3:2181:/hbase" \
  -endpointImpl "org.apache.hadoop.hbase.replication.regionserver.CldrHBaseInterClusterReplicationEndpoint" \
  -peerId 1
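The -clusterKey value follows the standard HBase cluster key layout: the comma-separated ZooKeeper quorum hosts, the ZooKeeper client port, and the parent znode, joined by colons. As a minimal sketch, the helper below assembles that string from its three parts. The function name make_cluster_key is hypothetical and not part of HBase or the replication plugin, and the host names are the placeholders from the command above.

```shell
#!/bin/sh
# make_cluster_key is a hypothetical helper, not part of HBase or the plugin.
# Usage: make_cluster_key <comma-separated ZooKeeper hosts> <client port> <parent znode>
make_cluster_key() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Placeholder values taken from the example command.
CLUSTER_KEY="$(make_cluster_key "zk-host-1,zk-host-2,zk-host-3" 2181 /hbase)"
echo "$CLUSTER_KEY"
```

You could then pass "$CLUSTER_KEY" as the -clusterKey argument instead of typing the string by hand.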