Replicating DataPDF version

Configuring replication with Apache HBase in EMR

You can configure your Cloudera Operational Database experience for data replication with an Amazon EMR cluster with Apache HBase.

  • Ensure that you have the replication plugin. Contact your Cloudera account team to get the replication plugin.
  • Ensure that all the EC2 instances in the EMR cluster can communicate with Cloudera Operational Database. For example, you can configure this by placing the EMR cluster on the same VPC netwoek and subnets used by the Cloudera Operational Database instance.
  • Ensure that your Cloudera Operational Database cluster security group allows inbound TCP connections to ports 16020, 16010, and 2181 from all the EC2 instances in the EMR cluster. You can configure this using the AWS Cloudera Management Console. The port configuration is automatically done if the EMR EC2 instances are configured with the same worker, leader, and controller (also known as master) security groups from Cloudera Operational Database.
  1. Copy the replication plugin to /usr/lib/hbase/lib/ on all EMR cluster RegionServer. (requires root access);
  2. Add the keystore generated on the destination Cloudera Operational Database cluster to HDFS using the following command:
    hdfs dfs -mkdir /hbase-replication
    
    hdfs dfs -put /[***PATH***]/[***credentials.jceks***] /hbase-replication
    
    hdfs dfs -chown -R hbase:hbase /hbase-replication
  3. Edit the EMR /etc/hbase/conf.dist/hbase-site.xml file on each RegionServer host to add the following values:
    <property>
    	  <name>[***hbase.security.replication.credential.provider.path***]</name>
    	  <value>cdprepjceks://hdfs@[***NAMENODE_HOST***]:[***NAMENODE_PORT***]/hbase-replication/credentials.jceks</value>
    	</property>
    	<property>
    	  <name>[***hbase.security.replication.user.name***]</name>
    	  <value>srv_[***WORKLOAD USER NAME***]</value>
    	</property>
  4. Restart the EMR RegionServer by stopping the running processes. RegionServers autostarts and now uses the additional JAR files in the classpath.
  5. Use _ReplicationSetupTool_ to add the peer. _ReplicationSetupTool_ is a command line tool that enables you to create the replication peer and make necessary configuration to be peer specific.
sudo -u hbase hbase org.apache.hadoop.hbase.client.replication.ReplicationSetupTool 
-clusterKey "zk-host-1,zk-host-2,zk-host-3:2181:/hbase" 
-endpointImpl "org.apache.hadoop.hbase.replication.regionserver.CldrHBaseInterClusterReplicationEndpoint" -peerId 1 

We want your opinion

How can we improve this page?

What kind of feedback do you have?