Replicating data between cloud clusters with Streams Replication Manager in the cloud
You can set up and configure an instance of Streams Replication Manager in a CDP Data Hub cluster to replicate data between CDP Data Hub clusters. In addition, you can use Streams Messaging Manager to monitor the replication process.
Consider the following replication scenario.
In this scenario, data is replicated between two CDP Data Hub clusters that are provisioned in different CDP Public Cloud environments. More specifically, data in CDP Data Hub East is replicated to CDP Data Hub West by an instance of Streams Replication Manager running in CDP Data Hub West.
Both CDP Data Hub clusters are provisioned with the default Streams Messaging cluster definitions.
Streams Replication Manager and Streams Messaging Manager are available in both clusters, but the instances in CDP Data Hub East are not utilized in this scenario.
This example scenario does not go into detail on how to set up the clusters and assumes the following.
-
Two Data Hub clusters provisioned with the Streams Messaging Light Duty or Heavy Duty cluster definition are available.
For more information, see Setting up your Streams Messaging cluster in the CDF for Data Hub library. Alternatively, you can also review the cloud provider specific cluster creation instructions available in the Cloudera Data Hub library.
- Network connectivity and DNS resolution are established between the clusters.
-
Create a machine user for Streams Replication Manager in Cloudera Management Console.
A machine user is required so that Streams Replication Manager has credentials that it can use to connect to the Kafka service in the CDP Data Hub cluster. This step is only required in the environment where Streams Replication Manager is not running. In the case of this example, this is the CDP Public Cloud East environment.
- Navigate to Management Console > User Management.
- Click Actions > Create Machine User.
-
Enter a unique name for the user and click Create.
For example:
srm
After the user is created, you are presented with a page that displays the user details.
- Click Set Workload Password.
- Type a password in the Password and Confirm Password fields. Leave the Environment field blank.
-
Click Set Workload Password.
A message appears on successful password creation.
-
Grant the machine user access to your environment.
You must grant the machine user access to your environments. Otherwise, Streams Replication Manager cannot connect to the Kafka service with this user. This step is only required in the environments where Streams Replication Manager is not running. In the case of this example, this is the CDP Public Cloud East environment.
- Navigate to Management Console > Environments, and select the environment where your Kafka cluster is located.
-
Click Actions > Manage Access.
Use the search box to find and select the machine user you want to use. A list of Resource Roles appears.
- Select the EnvironmentUser role and click Update Roles.
- Go back to the Environment Details page and click Actions > Synchronize Users to FreeIPA.
-
On the Synchronize Users page, click Synchronize Users.
Synchronizing users ensures that the role assignment is in effect for the environment.
-
Add Ranger permissions for the user you created for Streams Replication Manager.
This step is only required in the environment where Streams Replication Manager is not running. In the case of this example, this is the CDP Public Cloud East environment.
- Navigate to Management Console > Environments, and select the environment where your Kafka cluster is located.
- Click the Ranger link on the Environment Details page.
- Select the resource-based service corresponding to the Kafka resource in the CDP Data Hub cluster.
-
Add the Workload User Name of the user you created for Streams Replication Manager to the following Ranger policies.
- All - consumergroup
- All - topic
- All - transactionalid
- All - cluster
- All - delegationtoken
-
Establish trust between the clusters.
A truststore is needed so that the Streams Replication Manager instance running in CDP Data Hub West can trust CDP Data Hub East. To do this, you extract the FreeIPA certificate from the CDP Public Cloud East environment, create a truststore that includes the certificate, and copy the truststore to all hosts on CDP Data Hub West.
- Navigate to Management Console > Environments, and select CDP Public Cloud East.
- Go to the FreeIPA tab.
-
Click Get FreeIPA Certificate.
The FreeIPA certificate file, [***ENVIRONMENT NAME***].crt, is downloaded to your computer.
-
Run the following command to create the truststore.
keytool \
  -importcert \
  -storetype JKS \
  -noprompt \
  -keystore truststore-east.jks \
  -storepass [***PASSWORD***] \
  -alias freeipa-east-ca \
  -file [***PATH TO FREEIPA CERTIFICATE***]
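After the truststore is created, it can be useful to confirm that the certificate was actually imported under the expected alias. The following is a minimal sketch that uses keytool -list for this. The function name verify_truststore is hypothetical, and the default alias is the one used in the command above.

```shell
#!/bin/sh
# Sketch: confirm that the FreeIPA certificate is present in the truststore.
# verify_truststore is a hypothetical helper name, not part of any Cloudera tooling.
verify_truststore() {
    store="$1"                          # path to the truststore, for example truststore-east.jks
    store_pass="$2"                     # the password given to -storepass at creation time
    cert_alias="${3:-freeipa-east-ca}"  # alias used when the certificate was imported

    if [ ! -f "$store" ]; then
        echo "truststore not found: $store" >&2
        return 1
    fi
    # keytool reports an error and exits non-zero if the alias does not exist.
    keytool -list -storetype JKS -keystore "$store" \
        -storepass "$store_pass" -alias "$cert_alias"
}
```

For example, verify_truststore truststore-east.jks [***PASSWORD***] prints the certificate entry if the import succeeded.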
-
Copy the truststore-east.jks file to a common location on all the hosts in your CDP Data Hub West cluster. Cloudera recommends that you use the following location: /opt/cloudera/security/truststore-east.jks
-
Set the correct file permissions.
Use 751 for the directory and 444 for the truststore file.
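The permission step can be sketched as a small helper that is run on each host of the CDP Data Hub West cluster. This is only an illustration: the function name secure_truststore is hypothetical, and in practice the commands typically require elevated privileges.

```shell
#!/bin/sh
# Sketch: apply the recommended modes to the truststore directory and file.
# secure_truststore is a hypothetical helper name.
secure_truststore() {
    dir="$1"    # for example /opt/cloudera/security
    file="$2"   # for example /opt/cloudera/security/truststore-east.jks

    chmod 751 "$dir" || return 1    # owner rwx, group r-x, others --x
    chmod 444 "$file" || return 1   # read-only for owner, group, and others
}
```

For example: secure_truststore /opt/cloudera/security /opt/cloudera/security/truststore-east.jks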
- Access the Cloudera Manager instance of the CDP Data Hub West cluster.
-
Define the external Kafka cluster (the CDP Data Hub East cluster).
- Go to Administration > External Accounts.
-
Go to the Kafka Credentials tab.
On this tab you will create a credential for each external cluster taking part in the replication process.
- Click Add Kafka credentials.
-
Configure the Kafka credentials.
In the case of this example, you must create a single credential representing the CDP Data Hub East cluster. For example:
Name=dheast
Bootstrap servers=[***MY-CLOUDERA-DATA-HUB-EAST-CLUSTER-HOST-1.COM:9093***],[***MY-CLOUDERA-DATA-HUB-EAST-CLUSTER-HOST-2:9093***]
Security Protocol=SASL_SSL
JAAS Secret 1=[***WORKLOAD USER NAME***]
JAAS Secret 2=[***MACHINE USER PASSWORD***]
JAAS Template=org.apache.kafka.common.security.plain.PlainLoginModule required username="##JAAS_SECRET_1##" password="##JAAS_SECRET_2##";
SASL Mechanism=PLAIN
Truststore Password=[***PASSWORD***]
Truststore Path=/opt/cloudera/security/truststore-east.jks
Truststore type=JKS
-
Click Add.
If credential creation is successful, a new entry corresponding to the Kafka credential you specified appears on the page.
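Before relying on the credential, you may want to check from a CDP Data Hub West host that the machine user and the truststore can actually reach the Kafka service in CDP Data Hub East. One way to do this is to write a Kafka client properties file that mirrors the credential. The sketch below assumes the standard Kafka client property names. The function name write_east_client_properties and the output file name are hypothetical.

```shell
#!/bin/sh
# Sketch: write a Kafka client properties file that mirrors the Kafka credential.
# write_east_client_properties is a hypothetical helper name.
write_east_client_properties() {
    out="$1"              # destination file, for example ./dheast-client.properties
    user="$2"             # workload user name of the machine user
    password="$3"         # workload password of the machine user
    truststore_pass="$4"  # password of truststore-east.jks

    # The file holds secrets, so create it readable by the current user only.
    # The subshell keeps the umask change from leaking into the calling shell.
    (
        umask 077
        cat > "$out" <<EOF
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$user" password="$password";
ssl.truststore.location=/opt/cloudera/security/truststore-east.jks
ssl.truststore.password=$truststore_pass
ssl.truststore.type=JKS
EOF
    )
}
```

The resulting file can then be passed to a Kafka command line client, for example with the --command-config option of kafka-topics together with --bootstrap-server and --list. Note that this sketch does not escape double quotes in the password.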
-
Define the co-located Kafka cluster (the CDP Data Hub West cluster).
- In Cloudera Manager, go to Clusters and select the Streams Replication Manager service.
- Go to Configuration.
- Find and enable the Kafka Service property.
-
Find and configure the Streams Replication Manager Co-located Kafka Cluster Alias property.
The alias you configure represents the co-located cluster. Enter an alias that is unique and easily identifiable. For example:
dhwest
-
Enable relevant security feature toggles.
Because the CDP Data Hub cluster is both TLS/SSL and Kerberos enabled, you must enable all feature toggles for both the Streams Replication Manager Driver and Service roles. The feature toggles are the following.
- Enable TLS/SSL for SRM Driver
- Enable TLS/SSL for SRM Service
- Enable Kerberos Authentication
-
Add both clusters to the configuration of Streams Replication Manager.
-
Find and configure the External Kafka Accounts property.
Add the name of all Kafka credentials you created to this property. This can be done by clicking the add button to add a new line to the property and then entering the name of the Kafka credential. For example:
dheast
-
Find and configure the Streams Replication Manager Cluster alias property.
Add all cluster aliases to this property. This includes the aliases present in both the External Kafka Accounts and Streams Replication Manager Co-located Kafka Cluster Alias properties. Delimit the aliases with commas. For example:
dheast,dhwest
-
Configure replications.
In this example, data is replicated unidirectionally. As a result, only a single replication must be configured.
- Find the Streams Replication Manager's Replication Configs property.
- Click the add button and add new lines for each unique replication you want to add and enable.
-
Add and enable your replications. For example:
dheast->dhwest.enabled=true
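A replication entry only works if both of its aliases are also listed in the cluster alias property. The following sketch shows one way to check this before entering the values in Cloudera Manager. The helper check_replication is hypothetical and is not an SRM command. It only validates the strings and prints the matching configuration line.

```shell
#!/bin/sh
# Sketch: check that a replication only references known cluster aliases.
# check_replication is a hypothetical helper, not an SRM command.
check_replication() {
    aliases="$1"       # value of the cluster alias property, for example dheast,dhwest
    replication="$2"   # replication in source->target form, for example dheast->dhwest

    source_alias="${replication%%"->"*}"   # text before the arrow
    target_alias="${replication##*"->"}"   # text after the arrow

    for a in "$source_alias" "$target_alias"; do
        case ",$aliases," in
            *",$a,"*) ;;   # alias is present in the comma-delimited list
            *) echo "unknown cluster alias: $a" >&2; return 1 ;;
        esac
    done
    echo "${replication}.enabled=true"
}
```

For example, check_replication dheast,dhwest 'dheast->dhwest' prints the line shown above, while a replication that names an alias missing from the list is rejected.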
-
Configure Streams Replication Manager Driver and Service role targets.
-
Find and configure the Streams Replication Manager Service Target Cluster property.
Add the co-located cluster's alias to the property. For example:
dhwest
-
Find and configure the Streams Replication Manager Driver Target Cluster property.
For example:
dheast,dhwest
-
Configure the srm-control tool.
- Click Gateway in the Filters pane.
-
Find and configure the following properties.
- SRM Client's Secure Storage Password: [***PASSWORD***]
- Environment Variable Holding SRM Client's Secure Storage Password: SECURESTOREPASS
- Gateway TLS/SSL Trust Store File: /opt/cloudera/security/truststore-west.jks
- Gateway TLS/SSL Truststore Password: [***PASSWORD***]
- SRM Client's Kerberos Principal Name: [***MY KERBEROS PRINCIPAL***]
- SRM Client's Kerberos Keytab Location: [***PATH TO KEYTAB FILE***]
- Click Save Changes.
- Restart the Streams Replication Manager service.
- Deploy client configuration for Streams Replication Manager.
-
Start the replication process using the srm-control tool.
-
SSH as an administrator to any of the Streams Replication Manager hosts in the CDP Data Hub West cluster.
ssh [***USER***]@[***MY-CLOUDERA-DATA-HUB-WEST-CLUSTER-HOST-1.COM***]
-
Set the secure storage password as an environment variable.
export [***SECURE STORAGE ENV VAR***]="[***SECURE STORAGE PASSWORD***]"
Replace [***SECURE STORAGE ENV VAR***] with the name of the environment variable you specified in Environment Variable Holding SRM Client's Secure Storage Password. Replace [***SECURE STORAGE PASSWORD***] with the password you specified in SRM Client's Secure Storage Password. For example:
export SECURESTOREPASS="mypassword"
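Typing the password directly on the command line can leave it in the shell history. As an alternative, the variable can be set from a prompt. The following is a minimal sketch that assumes the variable name SECURESTOREPASS from this example. The function name read_secure_store_pass is hypothetical.

```shell
#!/bin/sh
# Sketch: read the secure storage password without echoing it to the terminal
# or recording it in shell history, then export it for srm-control.
# read_secure_store_pass is a hypothetical helper name.
read_secure_store_pass() {
    if [ -t 0 ]; then
        printf 'Secure storage password: ' >&2
        stty -echo                       # hide typed characters on a terminal
        IFS= read -r SECURESTOREPASS
        stty echo
        printf '\n' >&2
    else
        IFS= read -r SECURESTOREPASS     # non-interactive input, for example a pipe or file
    fi
    export SECURESTOREPASS
}
```

The function must be defined and called in the same shell session that later runs srm-control, otherwise the exported variable is not visible to the tool.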
-
Use the srm-control tool with the topics subcommand to add topics to the allow list.
srm-control topics --source dheast --target dhwest --add [***TOPIC NAME***]
-
Use the srm-control tool with the groups subcommand to add groups to the allow list.
srm-control groups --source dheast --target dhwest --add ".*"
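When several topics need to be replicated, the topics subcommand can be run once per topic. The following sketch wraps the command shown in this example in a loop. The helper add_topics and the SRM_CONTROL override are hypothetical conveniences, and the topic names in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch: add several topics to the allow list of the dheast->dhwest replication.
# add_topics is a hypothetical helper. SRM_CONTROL can be overridden, for example
# with "echo", to preview the arguments without running srm-control.
add_topics() {
    for topic in "$@"; do
        "${SRM_CONTROL:-srm-control}" topics \
            --source dheast --target dhwest --add "$topic" || return 1
    done
}
```

For example, add_topics orders payments runs the command twice, once for each topic, and stops at the first failure.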
-
Monitor the replication process.
Access the Streams Messaging Manager UI in the CDP Data Hub West cluster and go to the Cluster Replications page. The replications you set up will be visible on this page.