Replicating data from CDP PvC Base cluster to Data Hub cluster with SRM deployed in
Data Hub cluster
You can set up and configure an instance of SRM running in a Data Hub cluster to
replicate data between the Data Hub cluster and a CDP PvC Base cluster. In addition, you can use
SMM to monitor the replication process. Review the following example to learn how this can be
set up.
Consider the following replication scenario:
In this scenario, data is replicated from a CDP PvC Base cluster to a Data Hub cluster by
an SRM instance that is deployed in the Data Hub cluster.
The CDP PvC Base cluster has Kafka deployed on it. It is a secure cluster that has TLS/SSL
encryption enabled and uses PLAIN authentication. In addition, it uses Ranger for
authorization.
The Data Hub cluster is provisioned with one of the default Streams Messaging cluster
definitions.
This example scenario does not go into detail on how to
set up the clusters and assumes the following:
A Data Hub cluster provisioned with the
Streams Messaging Light Duty or Heavy Duty cluster definition is available.
A CDP PvC Base cluster with Kafka is available. This cluster has TLS/SSL encryption
enabled, uses PLAIN authentication, and has Ranger for authorization. For more
information, see the CDP Private Cloud Base Installation Guide.
Network connectivity and DNS resolution are
established between the clusters.
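The connectivity and DNS assumption can be checked from an SRM host before you continue. The following is a minimal sketch, not part of the product tooling: the function names parse_bootstrap and check_brokers are hypothetical, and the check assumes a Linux host with bash, getent, and the coreutils timeout command available.

```shell
# Split a comma-separated bootstrap list ("host1:9093,host2:9093")
# into "host port" pairs, one pair per line.
parse_bootstrap() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r host port; do
    printf '%s %s\n' "$host" "$port"
  done
}

# For every broker in the list, check DNS resolution first and then
# try to open a TCP connection. This only proves reachability; it does
# not validate TLS or authentication.
check_brokers() {
  parse_bootstrap "$1" | while read -r host port; do
    if ! getent hosts "$host" >/dev/null; then
      echo "DNS FAIL  $host"
    elif ! timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "TCP FAIL  $host:$port"
    else
      echo "OK        $host:$port"
    fi
  done
}
```

For example, run check_brokers with the same comma-separated host list that you plan to use for the CDP PvC Base cluster's bootstrap servers.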
Obtain PLAIN credentials for SRM.
The credentials of a PLAIN user that can access the CDP PvC Base cluster are required.
These credentials are supplied to SRM in a later step. In this example, [***PLAIN USER***] and
[***PLAIN USER PASSWORD***] are used to refer to these credentials.
Add Ranger permissions for the PLAIN user in the CDP PvC Base cluster:
You must ensure that the PLAIN user you obtained has correct permissions assigned to it
in Ranger. Otherwise, SRM will not be able to access Kafka resources on the CDP PvC Base
cluster.
Access the Cloudera Manager instance of your CDP PvC Base cluster.
Go to Ranger > Ranger Admin Web UI.
Log in to the Ranger Console (Ranger Admin Web UI).
Add the [***PLAIN USER***] to the following policies:
All - consumergroup
All - topic
All - transactionalid
All - cluster
All - delegationtoken
Acquire the CDP PvC Base cluster truststore and add it to the Data Hub cluster:
The actions you need to take differ depending on how TLS is set up in the CDP PvC Base cluster:
If the cluster uses Auto-TLS, obtain the certificate of the Cloudera Manager root Certificate
Authority and its password.
The Certificate Authority certificate and its password can be obtained using the Cloudera
Manager API. The following steps describe how to retrieve them using the Cloudera Manager API
Explorer. Alternatively, you can retrieve the certificate and password by calling the
appropriate endpoints in your browser or with curl.
Access the Cloudera Manager instance of your CDP PvC Base cluster.
Go to Support > API Explorer.
Find CertManagerResource.
Select the /certs/truststore GET operation and click Try it out.
Enter the truststore type.
Click Execute.
Click Download file under Responses.
The downloaded file is your certificate.
Select the /certs/truststorePassword GET operation and click Try it out.
Click Execute.
The password is displayed under Responses.
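If you prefer curl over the API Explorer, the two GET operations above can be called directly. The following is a sketch under several assumptions that you should verify in your own API Explorer: the port (7183), the API version (v41 in the usage example), the query parameter name type with the value PEM, and the output file name cm-ca.pem are illustrative. The function names are hypothetical.

```shell
# Build a Cloudera Manager API URL.
# $1 = Cloudera Manager host (FQDN), $2 = port,
# $3 = API version (for example v41), $4 = resource path
cm_api_url() {
  printf 'https://%s:%s/api/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Download the CA certificate and print the truststore password.
# $1 = Cloudera Manager host, $2 = port, $3 = API version, $4 = admin user
# curl prompts for the user's password because -u receives a user name only.
fetch_cm_truststore() {
  # Assumption: the truststore type is passed as the "type" query parameter.
  curl -sS --fail -u "$4" -o cm-ca.pem \
    "$(cm_api_url "$1" "$2" "$3" 'certs/truststore?type=PEM')"
  curl -sS --fail -u "$4" \
    "$(cm_api_url "$1" "$2" "$3" 'certs/truststorePassword')"
}
```

A call might look like fetch_cm_truststore cm.example.com 7183 v41 admin. Note that curl must already trust the certificate presented by Cloudera Manager. If it does not, supply a CA bundle with curl's --cacert option rather than disabling verification.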
Run the following command to create the truststore:
keytool \
    -importcert \
    -storetype JKS \
    -noprompt \
    -keystore cdppvc-truststore.jks \
    -storepass [***PASSWORD***] \
    -alias cdppvc-cm-ca \
    -file [***PATH TO CM CA CERTIFICATE***]
Note down the password. It is needed in a later step.
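After creating the truststore, it can be worth confirming that the CA certificate was imported under the expected alias. The following sketch wraps keytool -list in a hypothetical helper function; it assumes keytool is on the PATH and that the store type is JKS, as in the command above.

```shell
# List a single entry of a JKS truststore.
# $1 = truststore path, $2 = store password, $3 = alias
# keytool exits with a non-zero status if the alias does not exist.
verify_truststore_alias() {
  keytool -list -storetype JKS -keystore "$1" -storepass "$2" -alias "$3"
}
```

For the truststore created in this example, the call would be verify_truststore_alias cdppvc-truststore.jks [***PASSWORD***] cdppvc-cm-ca, which should print the entry together with its certificate fingerprint.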
Copy the cdppvc-truststore.jks file to a common location on all the hosts in your CDP Data
Hub cluster.
Cloudera recommends that you use the following location:
/opt/cloudera/security/cdppvc-truststore.jks.
Set the correct file permissions.
Use 751 for the directory and 444 for the truststore file.
If TLS was set up manually, note down the CDP PvC Base cluster's truststore location and
password. These should be known to you.
Copy the truststore file to a common location on all the hosts in your CDP Data Hub cluster.
Cloudera recommends that you use the following location:
/opt/cloudera/security/truststore.jks.
Set the correct file permissions.
Use 751 for the directory and 444 for the truststore file.
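The copy and permission steps can be scripted so that every Data Hub host ends up in the same state. The following is a minimal sketch to run on each host after the truststore file has been transferred there (for example with scp). The function name install_truststore is hypothetical, and the sketch assumes you run it as a user that is allowed to write to the target directory.

```shell
# Place a truststore file in a target directory and apply the
# recommended permissions: 751 on the directory, 444 on the file.
# $1 = source truststore file, $2 = target directory
install_truststore() {
  src=$1
  dir=$2
  mkdir -p "$dir" &&
    cp "$src" "$dir/" &&
    chmod 751 "$dir" &&
    chmod 444 "$dir/$(basename "$src")"
}
```

With the recommended location, the call would be install_truststore cdppvc-truststore.jks /opt/cloudera/security.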
Configure the SRM properties in the Data Hub cluster:
Access the Cloudera Manager instance of your Data Hub cluster.
Go to Streams Replication Manager > Configuration and configure the following properties:
#Bootstrap servers:
cdppvc.bootstrap.servers=[***MY-CDP-PVC-CLUSTER-HOST-1.COM:9093***],[***MY-CDP-PVC-CLUSTER-HOST-2.COM:9093***]
datahub.bootstrap.servers=[***MY-DATAHUB-CLUSTER-HOST-1.COM:9093***],[***MY-DATAHUB-CLUSTER-HOST-2.COM:9093***]
#Replications:
cdppvc->datahub.enabled=true
#Security properties for the CDP PvC Base cluster:
cdppvc.security.protocol=SASL_SSL
cdppvc.sasl.mechanism=PLAIN
cdppvc.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="[***PLAIN USER***]" password="[***PLAIN USER PASSWORD***]";
cdppvc.ssl.truststore.location=/opt/cloudera/security/cdppvc-truststore.jks
cdppvc.ssl.truststore.password=[***PASSWORD***]
#Use the FQDN when specifying cluster hosts.
#The terminating semicolon in the [***ALIAS***].sasl.jaas.config property must be included in the configuration.
#The value of the [***ALIAS***].ssl.truststore.location is the location where you copied the truststore in a previous step.
#The [***ALIAS***].ssl.truststore.password property must be specified. Otherwise, the configuration might get overridden by the service ssl.truststore.password property.
Click Save.
Restart SRM.
Deploy client configuration for SRM.
Start replicating topics using the srm-control tool:
SSH as an administrator to any of the SRM hosts in the Data Hub cluster.
ssh [***USER***]@[***MY-DATAHUB-CLUSTER.COM***]
Create a configuration file for the srm-control tool.
The srm-control tool behaves as a Kafka
client and requires configuration that is similar to any Kafka client. The configuration
file is specified with the --config option when you run the tool. The
configuration file must include cluster alias definitions, as well as properties related
to connection information and security. Cluster aliases are defined a single time, whereas
connection and security properties are defined separately for each alias (cluster). In
this example, the file is named srm.properties.
#Bootstrap servers:
cdppvc.bootstrap.servers=[***MY-CDP-PVC-CLUSTER-HOST-1.COM:9093***],[***MY-CDP-PVC-CLUSTER-HOST-2.COM:9093***]
datahub.bootstrap.servers=[***MY-DATAHUB-CLUSTER-HOST-1.COM:9093***],[***MY-DATAHUB-CLUSTER-HOST-2.COM:9093***]
#CDP PvC Base cluster's security properties:
cdppvc.security.protocol=SASL_SSL
cdppvc.sasl.mechanism=PLAIN
cdppvc.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="[***PLAIN USER***]" password="[***PLAIN USER PASSWORD***]";
cdppvc.ssl.truststore.location=/opt/cloudera/security/cdppvc-truststore.jks
cdppvc.ssl.truststore.password=[***PASSWORD***]
#Data Hub cluster's security properties:
datahub.security.protocol=SASL_SSL
datahub.sasl.mechanism=GSSAPI
datahub.sasl.kerberos.service.name=kafka
datahub.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab="[***PATH TO KEYTAB FILE***]" storeKey=true useTicketCache=false principal="[***MY KERBEROS PRINCIPAL***]";
datahub.ssl.truststore.location=/opt/cloudera/security/datahub-truststore.jks
datahub.ssl.truststore.password=[***PASSWORD***]
#Use the FQDN when specifying the cluster hosts.
#The terminating semicolon in the [***ALIAS***].sasl.jaas.config properties must be included in the configuration.
#The value of the cdppvc.ssl.truststore.location property is the location where you copied the truststore in a previous step.
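Because a missing terminating semicolon in a sasl.jaas.config value is easy to overlook, a quick check of the configuration file before running srm-control may save a failed attempt. The following is a small sketch using grep; the function name check_jaas_semicolons is hypothetical, and the check covers only this one rule, not the rest of the file.

```shell
# Print every uncommented *.sasl.jaas.config line that does not end
# with a semicolon. Returns 1 if at least one such line is found,
# 0 otherwise.
# $1 = path to the properties file
check_jaas_semicolons() {
  if grep -E '^[^#]*\.sasl\.jaas\.config=' "$1" | grep -v ';[[:space:]]*$'; then
    return 1
  fi
  return 0
}
```

Running check_jaas_semicolons srm.properties prints nothing when all JAAS entries are terminated correctly.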
Use the srm-control tool with the topics subcommand to add topics to the allow list: