Replicating data between Data Hub clusters with SRM deployed in a Data Hub
cluster.
You can set up and configure an instance of SRM in a Data Hub cluster to replicate
data between Data Hub clusters. In addition, you can use SMM to monitor the replication process.
Review the following example to learn how this can be set up.
Consider the following replication scenario:
In this scenario, data is replicated between two Data Hub clusters that are provisioned in
different CDP environments. More specifically, data in Data Hub East is replicated to Data
Hub West by an instance of SRM running in Data Hub West.
Both Data Hub clusters are provisioned with the default Streams Messaging cluster
definitions.
SRM and SMM are available in both clusters, but the instances in Data Hub East are not
utilized in this scenario.
This example scenario does not go into detail on how to set up the clusters and assumes the following:
Two Data Hub clusters provisioned with the Streams Messaging Light Duty or Heavy Duty cluster definition are available.
Network connectivity and DNS resolution are established between the clusters.
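Before continuing, it can help to confirm the second assumption from a host in Data Hub West. The following is a minimal sketch, not part of the product tooling: the function name `can_reach` is hypothetical, and port 9093 is taken from the bootstrap server examples used later in this scenario. It only checks name resolution and a plain TCP connection; it does not validate TLS or authentication.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if `host` resolves and accepts a TCP connection on `port`."""
    try:
        # create_connection resolves the name first and then connects, so a
        # DNS failure, a refused connection, and a timeout all raise OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `can_reach("[***MY-DATAHUB-EAST-CLUSTER-HOST-1.COM***]", 9093)` on an SRM host in Data Hub West should return `True` for every broker of Data Hub East.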
Create a machine user for SRM in Management Console:
A machine user is required so that SRM has credentials that it can use to connect to the Kafka service in the Data Hub cluster. This step is only required in the environment where SRM is not running. In the case of this example, this is the CDP Public Cloud East environment.
Navigate to Management Console > User Management.
Click Actions > Create Machine User.
Enter a unique name for the user and click Create.
For example: srm
After the user is created, you are presented with a page that displays the
user details.
Click Set Workload Password.
Type a password in the Password and Confirm Password fields. Leave the Environment field blank.
Click Set Workload Password.
A message appears on successful password creation.
Grant the machine user access to your environment:
You must grant the machine user access to your environment; otherwise, SRM cannot connect to the Kafka service with this user. This step is only required in the environment where SRM is not running. In the case of this example, this is the CDP Public Cloud East environment.
Navigate to Management Console > Environments, and select the environment where your Kafka cluster is located.
Click Actions > Manage Access.
Use the search box to find and select the machine user you want to use.
A list of Resource Roles appears.
Select the EnvironmentUser role and click Update Roles.
Go back to the Environment Details page and click Actions > Synchronize Users to FreeIPA.
On the Synchronize Users page, click Synchronize
Users.
Synchronizing users ensures that the role assignment is in effect for the environment.
Add Ranger permissions for the user you created for SRM:
This step is only required in the environment where SRM is not running. In the case of this example, this is the CDP Public Cloud East environment.
Navigate to Management Console > Environments, and select the environment where your Kafka cluster is located.
Click the Ranger link on the Environment Details page.
Select the resource-based service corresponding to the Kafka resource in
the Data Hub cluster.
Add the Workload User Name of the user you created
for SRM to the following Ranger policies:
All - consumergroup
All - topic
All - transactionalid
All - cluster
All - delegationtoken
Establish trust between the clusters:
A truststore is needed so that the SRM instance running in Data Hub West can trust Data Hub East. To do this, you extract the FreeIPA certificate from Environment East, create a truststore that includes the certificate, and copy the truststore to all hosts on Data Hub West.
Navigate to Management Console > Environments, and select Environment East.
Go to the Summary tab.
Scroll down to the FreeIPA section.
Click Actions > Get FreeIPA Certificate.
The FreeIPA certificate file, [***ENVIRONMENT NAME***]-env.crt, is downloaded to your computer.
Run the following command to create the truststore:
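As a minimal sketch of this step, assuming the JDK `keytool` utility is on the PATH, the certificate import could be scripted as follows. The function names and the alias value are hypothetical; the truststore file name matches the one referenced in the SRM properties of this scenario. Note that passing the password as an argument makes it visible in the process list, so treat this as an illustration rather than a hardened procedure.

```python
import subprocess
from typing import List


def keytool_import_args(cert_path: str, keystore_path: str,
                        alias: str, storepass: str) -> List[str]:
    """Build a keytool command that imports one CA certificate into a JKS truststore."""
    return [
        "keytool", "-importcert",
        "-noprompt",                 # trust the certificate without an interactive prompt
        "-alias", alias,             # hypothetical alias; any unique name works
        "-file", cert_path,          # the downloaded FreeIPA certificate
        "-keystore", keystore_path,  # keytool creates the file if it does not exist
        "-storetype", "JKS",
        "-storepass", storepass,
    ]


def create_truststore(cert_path: str, keystore_path: str,
                      alias: str, storepass: str) -> None:
    """Run keytool; check=True raises CalledProcessError if keytool fails."""
    subprocess.run(
        keytool_import_args(cert_path, keystore_path, alias, storepass),
        check=True,
    )
```

Calling `create_truststore("[***ENVIRONMENT NAME***]-env.crt", "truststore-east.jks", "freeipa-east", "[***PASSWORD***]")` would produce a truststore that can then be copied to the hosts of Data Hub West.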
#Bootstrap servers:
dheast.bootstrap.servers=[***MY-DATAHUB-EAST-CLUSTER-HOST-1.COM:9093***],[***MY-DATAHUB-EAST-CLUSTER-HOST-2.COM:9093***]
dhwest.bootstrap.servers=[***MY-DATAHUB-WEST-CLUSTER-HOST-1.COM:9093***],[***MY-DATAHUB-WEST-CLUSTER-HOST-2.COM:9093***]
#Replications:
dheast->dhwest.enabled=true
#Datahub East cluster’s security properties:
dheast.security.protocol=SASL_SSL
dheast.sasl.mechanism=PLAIN
dheast.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="[***WORKLOAD USER NAME***]" password="[***MACHINE USER PASSWORD***]";
dheast.ssl.truststore.location=/opt/cloudera/security/truststore-east.jks
dheast.ssl.truststore.password=[***PASSWORD***]
#Use the FQDN when specifying cluster hosts.
#The terminating semicolon in the [***ALIAS***].sasl.jaas.config property must be included in the configuration.
#The value of the [***ALIAS***].ssl.truststore.location is the location where you copied the truststore in a previous step.
#The [***ALIAS***].ssl.truststore.password property must be specified. Otherwise, the configuration might get overridden by the service ssl.truststore.password property.
Click Save.
Restart SRM.
Deploy client configuration for SRM.
Restart SMM.
Start data replication with the srm-control tool:
SSH as an administrator to any of the SRM hosts in the Data Hub West cluster.
Create a configuration file for the srm-control tool.
The srm-control tool behaves as a Kafka client and requires configuration that is similar to any Kafka client. The configuration file is specified with the --config option when you run the tool. The configuration file must include cluster alias definitions, as well as properties related to connection information and security. Cluster aliases are defined a single time; connection and security properties are defined separately for each alias (cluster). In this example, the file is named srm.properties.
Define aliases:
clusters=dheast, dhwest
#Bootstrap servers
dheast.bootstrap.servers=[***MY-DATAHUB-EAST-CLUSTER-HOST-1.COM:9093***],[***MY-DATAHUB-EAST-CLUSTER-HOST-2.COM:9093***]
dhwest.bootstrap.servers=[***MY-DATAHUB-WEST-CLUSTER-HOST-1.COM:9093***],[***MY-DATAHUB-WEST-CLUSTER-HOST-2.COM:9093***]
#Datahub East cluster’s security properties:
dheast.security.protocol=SASL_SSL
dheast.sasl.mechanism=PLAIN
dheast.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="[***WORKLOAD USER NAME***]" password="[***MACHINE USER PASSWORD***]";
dheast.ssl.truststore.location=/opt/cloudera/security/truststore-east.jks
dheast.ssl.truststore.password=[***PASSWORD***]
#Datahub West cluster’s security properties:
dhwest.security.protocol=SASL_SSL
dhwest.sasl.mechanism=GSSAPI
dhwest.sasl.kerberos.service.name=kafka
dhwest.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab="[***PATH TO KEYTAB FILE***]" storeKey=true useTicketCache=false principal="[***MY KERBEROS PRINCIPAL***]";
dhwest.ssl.truststore.location=/opt/cloudera/security/truststore-west.jks
dhwest.ssl.truststore.password=[***PASSWORD***]
#Use the FQDN when specifying the cluster hosts.
#The terminating semicolon in the [***ALIAS***].sasl.jaas.config properties must be included in the configuration.
#The value of the dheast.ssl.truststore.location property is the location where you copied the truststore in a previous step.
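The notes above lend themselves to a quick sanity check before the file is passed to the tool. The following is a minimal sketch and not part of srm-control: the function names `parse_properties` and `check_srm_config` are hypothetical, the parser only handles simple `key=value` lines (no line continuations), and the FQDN check is a rough heuristic that only looks for a dot in the host name.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, because JAAS values contain "=" too.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_srm_config(props: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means no issue was detected."""
    problems: List[str] = []
    aliases = [a.strip() for a in props.get("clusters", "").split(",") if a.strip()]
    if not aliases:
        problems.append("clusters: no aliases defined")
    for alias in aliases:
        servers = props.get(f"{alias}.bootstrap.servers", "")
        if not servers:
            problems.append(f"{alias}.bootstrap.servers: missing")
        for server in (s.strip() for s in servers.split(",")):
            if not server:
                continue
            host, _, port = server.rpartition(":")
            if "." not in host or not port.isdigit():
                problems.append(f"{alias}.bootstrap.servers: '{server}' is not FQDN:port")
        jaas = props.get(f"{alias}.sasl.jaas.config")
        if jaas is not None and not jaas.endswith(";"):
            problems.append(f"{alias}.sasl.jaas.config: missing terminating semicolon")
        has_location = f"{alias}.ssl.truststore.location" in props
        has_password = f"{alias}.ssl.truststore.password" in props
        if has_location and not has_password:
            problems.append(f"{alias}.ssl.truststore.password: missing")
    return problems
```

Running `check_srm_config(parse_properties(open("srm.properties").read()))` on the file created in this step should return an empty list once the placeholders are replaced with real values.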
Use the srm-control tool with the topics
subcommand to add topics to the allow list: