Configuring replication with Apache HBase in EMR
You can configure data replication between your Cloudera Operational Database and an Amazon EMR cluster running Apache HBase.
- Ensure that you have the replication plugin. To get it, contact your Cloudera account team.
- Ensure that all EC2 instances in the EMR cluster can communicate with Cloudera Operational Database. For example, place the EMR cluster in the same VPC network and subnets that the Cloudera Operational Database instance uses.
- Ensure that the security group of your Cloudera Operational Database cluster allows inbound TCP connections on ports 16020, 16010, and 2181 from all EC2 instances in the EMR cluster. You can configure these rules in the AWS Management Console. No extra port configuration is needed if the EMR EC2 instances use the same worker, leader, and controller (also known as master) security groups as Cloudera Operational Database.
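If the EMR instances do not reuse the Cloudera Operational Database security groups, the inbound rules from the last prerequisite can also be added with the AWS CLI instead of the console. The following is a minimal sketch, not an official procedure. It assumes the AWS CLI is installed and configured, that you pass the Cloudera Operational Database security group ID first and an EMR security group ID second, and that the helper names `run` and `open_replication_ports` are hypothetical. It only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: allow inbound TCP from an EMR security group into the
# Cloudera Operational Database (COD) security group on the replication ports.
# Helper names are hypothetical; security group IDs are supplied by the caller.
set -euo pipefail

# Ports listed in the prerequisites.
REPLICATION_PORTS=(16020 16010 2181)

# Prints the command (default, DRY_RUN=1) or executes it (DRY_RUN=0).
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# open_replication_ports COD_SG_ID EMR_SG_ID
# Adds one ingress rule per port to the COD group, with the EMR group as source.
open_replication_ports() {
  local cod_sg="$1" emr_sg="$2" port
  for port in "${REPLICATION_PORTS[@]}"; do
    run aws ec2 authorize-security-group-ingress \
      --group-id "$cod_sg" \
      --protocol tcp \
      --port "$port" \
      --source-group "$emr_sg"
  done
}
```

For example, `open_replication_ports sg-0123456789abcdef0 sg-0fedcba9876543210` (hypothetical IDs) prints the three `aws` commands for review, and prefixing the call with `DRY_RUN=0` runs them. Because EMR typically attaches separate security groups to primary and core/task nodes, call the function once for each EMR security group so that every EC2 instance in the cluster is covered.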
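After completing the prerequisites, add the Cloudera Operational Database cluster as a replication peer by running the ReplicationSetupTool provided with the replication plugin. In -clusterKey, replace zk-host-1, zk-host-2, and zk-host-3 with the ZooKeeper hosts of your Cloudera Operational Database cluster, for example:
sudo -u hbase hbase org.apache.hadoop.hbase.client.replication.ReplicationSetupTool -clusterKey "zk-host-1,zk-host-2,zk-host-3:2181:/hbase" -endpointImpl "org.apache.hadoop.hbase.replication.regionserver.CldrHBaseInterClusterReplicationEndpoint" -peerId 1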
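The cluster key has the form `host1,host2,...:port:znode`, so it can be assembled from a host list instead of edited by hand. The following is a minimal sketch that wraps the command above; it assumes it runs on an EMR node where the replication plugin is installed, and the function names `build_cluster_key`, `add_cod_peer`, and `show_peers` are hypothetical. The ZooKeeper port and parent znode default to the `2181` and `/hbase` values from the command above.

```shell
#!/usr/bin/env bash
# Sketch: build the -clusterKey value from ZooKeeper hosts and add the peer.
# Function names are hypothetical; host names are placeholders.
set -euo pipefail

# build_cluster_key HOST... -> "host1,host2,...:2181:/hbase"
# ZK_PORT and ZNODE_PARENT can override the defaults.
build_cluster_key() {
  local IFS=,   # "$*" joins the arguments with the first IFS character
  echo "$*:${ZK_PORT:-2181}:${ZNODE_PARENT:-/hbase}"
}

# add_cod_peer PEER_ID HOST...
# Runs the same ReplicationSetupTool command shown above with a generated key.
add_cod_peer() {
  local peer_id="$1"; shift
  local key
  key="$(build_cluster_key "$@")"
  sudo -u hbase hbase org.apache.hadoop.hbase.client.replication.ReplicationSetupTool \
    -clusterKey "$key" \
    -endpointImpl "org.apache.hadoop.hbase.replication.regionserver.CldrHBaseInterClusterReplicationEndpoint" \
    -peerId "$peer_id"
}

# Lists the replication peers known to this cluster; assumes an HBase shell
# that supports non-interactive mode (-n).
show_peers() {
  echo "list_peers" | sudo -u hbase hbase shell -n
}
```

For example, `add_cod_peer 1 zk-host-1 zk-host-2 zk-host-3` runs exactly the command shown above, and `show_peers` should then list peer `1`.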