Replicating data from on premises to cloud with Streams Replication Manager in
the cloud
You can set up and configure an instance of Streams Replication Manager
running in a CDP Data Hub cluster to replicate data between the CDP Data Hub cluster and a Cloudera Private Cloud Base
cluster. In addition, you can use Streams Messaging Manager to monitor the replication
process.
Consider the following replication scenario.
In this scenario, data is replicated from a Cloudera Private Cloud Base
cluster to a CDP Data Hub cluster by a Streams Replication Manager instance that is deployed in the CDP Data Hub cluster.
The Cloudera Private Cloud Base cluster has Kafka deployed on it. It is a
secure cluster that has TLS/SSL encryption enabled and uses PLAIN authentication. In
addition, it uses Ranger for authorization.
The CDP Data Hub cluster is provisioned with one of the default
Streams Messaging cluster definitions.
This example scenario does not go into detail on how to
set up the clusters and assumes the following.
A CDP Data Hub cluster
provisioned with the Streams Messaging Light Duty or Heavy Duty cluster definition is
available.
A Cloudera Private Cloud Base cluster with Kafka is available. This
cluster has TLS/SSL encryption enabled, uses PLAIN authentication, and has Ranger for
authorization. For more information, see the CDP Private Cloud Base Installation Guide.
Network connectivity and DNS resolution are
established between the clusters.
Obtain PLAIN credentials for Streams Replication Manager.
The credentials of a PLAIN user that can access the Cloudera Private Cloud Base cluster are required. These credentials are
supplied to Streams Replication Manager in a later step. In this example,
[***PLAIN USER***] and [***PLAIN USER PASSWORD***]
are used to refer to these credentials.
Add Ranger permissions for the PLAIN user in the Cloudera Private Cloud Base cluster.
You must ensure that the PLAIN user you obtained has correct permissions assigned to it
in Ranger. Otherwise, Streams Replication Manager will not be able to access
Kafka resources on the Cloudera Private Cloud Base cluster.
Access the Cloudera Manager
instance of your Cloudera Private Cloud Base cluster.
Go to Ranger > Ranger Admin Web UI.
Log in to the Ranger Console (Ranger Admin Web UI).
Add the [***PLAIN USER***] to the following policies.
All - consumergroup
All - topic
All - transactionalid
All - cluster
All - delegationtoken
Acquire the Cloudera Private Cloud Base cluster truststore and add it
to the CDP Data Hub cluster.
The actions you need to take differ depending on how TLS is set up in the Cloudera Private Cloud Base cluster.
Obtain the certificate of the Cloudera Manager root Certificate
Authority and its password.
The Certificate Authority certificate and its password
can be obtained using the Cloudera Manager API. The following
steps describe how you can retrieve the certificate and password using the Cloudera Manager API Explorer. Alternatively, you can also retrieve the
certificate and password by calling the appropriate endpoints in your browser
window or using curl.
Access the Cloudera Manager instance of your Cloudera Private Cloud Base cluster.
Go to Support > API Explorer.
Find CertManagerResource.
Select the /certs/truststore GET operation and click
Try it out.
Enter the truststore type.
Click Execute.
Click Download file under
Responses.
The downloaded file is your
certificate.
Select the /certs/truststorePassword GET operation and
click Try it out.
Click Execute.
The password is displayed under
Responses.
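As an alternative to the API Explorer, the same two endpoints can be called with curl. The following is a minimal sketch rather than an official procedure. The helper names, the CM_BASE, CM_API_VERSION, and CM_USER variables, and the cm-ca.pem output file are assumptions made for illustration, and PEM is assumed as the truststore type. The sketch does not handle TLS trust for the Cloudera Manager endpoint itself, and the API version must match what your Cloudera Manager instance supports.

```shell
# Hypothetical helper: build a Cloudera Manager API URL.
# $1 = base URL, $2 = API version, $3 = resource path
cm_api_url() {
  printf '%s/api/%s/%s\n' "${1%/}" "$2" "$3"
}

# Download the CA certificate to cm-ca.pem, then print the
# truststore password response.
# Assumes CM_BASE, CM_API_VERSION, and CM_USER are set.
# curl prompts for the password of CM_USER.
fetch_cm_truststore() {
  curl -sS -u "$CM_USER" -o cm-ca.pem \
    "$(cm_api_url "$CM_BASE" "$CM_API_VERSION" 'certs/truststore?type=PEM')" &&
  curl -sS -u "$CM_USER" \
    "$(cm_api_url "$CM_BASE" "$CM_API_VERSION" 'certs/truststorePassword')"
}
```

To use the sketch, set CM_BASE to the base URL of your Cloudera Manager instance, CM_API_VERSION to a supported API version, and CM_USER to a Cloudera Manager user, then run fetch_cm_truststore.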
Run the following command to create the truststore:
keytool \
-importcert \
-storetype JKS \
-noprompt \
-keystore cdppvc-truststore.jks \
-storepass ***PASSWORD*** \
-alias cdppvc-cm-ca \
-file ***PATH TO CM CA CERTIFICATE***
Note down the password. It is needed in a later
step.
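Before copying the truststore to other hosts, it can be worth confirming that the import succeeded. The sketch below wraps keytool -list in a small check. The function name and the KEYTOOL override variable are hypothetical and exist only for illustration. The check relies on keytool exiting with a non-zero status when the alias is missing.

```shell
# KEYTOOL can be overridden if keytool is not on the PATH.
KEYTOOL="${KEYTOOL:-keytool}"

# Succeeds if the truststore contains an entry with the given alias.
# $1 = truststore path, $2 = truststore password, $3 = alias
truststore_has_alias() {
  "$KEYTOOL" -list -keystore "$1" -storepass "$2" -alias "$3" >/dev/null 2>&1
}
```

For the truststore created above, the call would be truststore_has_alias cdppvc-truststore.jks followed by the truststore password and the alias cdppvc-cm-ca.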
Copy the cdppvc-truststore.jks file to a common location on all
the hosts in your CDP Data Hub cluster.
Cloudera recommends
that you use the following location:
/opt/cloudera/security/cdppvc-truststore.jks.
Set the correct file permissions.
Use 751 for the directory and 444 for the
truststore file.
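If TLS was set up manually rather than with the Cloudera Manager root Certificate Authority, use the existing truststore instead. Note down the truststore location and password of the Cloudera Private Cloud Base cluster. These should be known to
you.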
Copy the truststore file to a common location on all the hosts in your CDP Data Hub cluster.
Cloudera recommends that you use the
following location:
/opt/cloudera/security/truststore.jks.
Set the correct file permissions.
Use 751 for the directory and 444 for the
truststore file.
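The copy and permission steps can be sketched as a small shell function to run on each CDP Data Hub host. The function name is hypothetical, and the sketch assumes the truststore file has already been transferred to the host.

```shell
# Copy a truststore into a directory and apply the permissions
# recommended above: 751 for the directory, 444 for the file.
# $1 = truststore file, $2 = destination directory
install_truststore() {
  mkdir -p "$2" &&
  cp "$1" "$2/" &&
  chmod 751 "$2" &&
  chmod 444 "$2/$(basename "$1")"
}
```

For example, install_truststore cdppvc-truststore.jks /opt/cloudera/security places the file at the recommended location.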
Access the Cloudera Manager instance of your CDP Data Hub cluster.
Define the external Kafka cluster (the Cloudera Private Cloud Base
cluster).
Go to Administration > External Accounts.
Go to the Kafka Credentials tab.
On this tab you will create a credential for each external cluster taking part in
the replication process.
Click Add Kafka credentials.
Configure the Kafka credentials.
In the case of this example, you must create a single credential representing the
Cloudera Private Cloud Base cluster.
If credential creation is successful, a new entry corresponding to the Kafka
credential you specified appears on the page.
Define the co-located Kafka cluster (the CDP Data Hub
cluster).
In Cloudera Manager, go to Clusters and
select the Streams Replication Manager service.
Go to Configuration.
Find and enable the Kafka Service property.
Find and configure the Streams Replication Manager Co-located Kafka
Cluster Alias property.
The alias you configure represents the co-located cluster. Enter an alias that is
unique and easily identifiable. For example:
datahub
Enable relevant security feature toggles.
Because the CDP Data Hub cluster is both TLS/SSL and
Kerberos enabled, you must enable all feature toggles for both the Streams Replication Manager Driver and Service roles. The feature toggles are
the following.
Enable TLS/SSL for SRM Driver
Enable TLS/SSL for SRM Service
Enable Kerberos Authentication
Add both clusters to the configuration of Streams Replication Manager.
Find and configure the External Kafka Accounts
property.
Add the names of all Kafka credentials you created to this property. This can be
done by clicking the add button to add a new line to the property and then entering
the name of the Kafka credential. For example:
cdppvc
Find and configure the Streams Replication Manager Cluster
alias property.
Add all cluster aliases to this property. This includes the aliases present in
both the External Kafka Accounts and Streams
Replication Manager Co-located Kafka Cluster Alias properties. Delimit
the aliases with commas. For example:
datahub,cdppvc
Configure replications.
In this example, data is replicated unidirectionally. As a result, only a single
replication must be configured.
Find the Streams Replication
Manager's Replication Configs property.
Click the add button and add new lines
for each unique replication you want to add and enable.
Add and enable your replications. For example:
cdppvc->datahub.enabled=true
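Each replication entry has the form source->target.enabled=true, where both names are aliases listed in the cluster alias property. The sketch below builds such a line and rejects aliases that are not in the alias list. Both function names are hypothetical helpers for illustration and are not part of Streams Replication Manager.

```shell
# Succeeds if alias $1 appears in the comma-delimited alias list $2.
alias_known() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a Replication Configs line for source $1 and target $2,
# or fail if either alias is missing from the alias list $3.
replication_line() {
  alias_known "$1" "$3" && alias_known "$2" "$3" || {
    echo "unknown cluster alias" >&2
    return 1
  }
  printf '%s->%s.enabled=true\n' "$1" "$2"
}
```

With the aliases from this example, replication_line cdppvc datahub datahub,cdppvc prints the entry shown above. Replication in the opposite direction would be a separate entry with the aliases swapped.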
Configure Streams Replication Manager Driver and Service role targets.
Find and configure the Streams Replication Manager Service Target
Cluster property.
Add the co-located cluster's alias to the property. For
example:
datahub
Find and configure the Streams Replication Manager Driver Target
Cluster property.
SRM Client's Kerberos Principal Name: [***MY KERBEROS PRINCIPAL***]
SRM Client's Kerberos Keytab Location: [***PATH TO KEYTAB FILE***]
Take note of the password you configure in SRM Client's Secure Storage
Password and the name you configure in Environment Variable
Holding SRM Client's Secure Storage Password. You will need to provide
both of these in your CLI session before running the tool.
Click Save Changes.
Restart the Streams Replication Manager service.
Deploy client configuration for Streams Replication Manager.
Start the replication process using the srm-control tool.
SSH as an administrator to any of the Streams Replication Manager hosts in
the CDP Data Hub cluster.
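Set the secure storage password as an environment variable in your CLI session.
export [***SECURE STORAGE ENV VAR***]="[***SECURE STORAGE PASSWORD***]"
Replace [***SECURE STORAGE ENV VAR***] with the name of the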
environment variable you specified in Environment Variable Holding SRM
Client's Secure Storage Password. Replace [***SECURE STORAGE
PASSWORD***] with the password you specified in SRM Client's
Secure Storage Password. For example:
export SECURESTOREPASS="mypassword"
Use the srm-control tool with the topics
subcommand to add topics to the allow list.
Use the srm-control tool with the groups
subcommand to add groups to the allow list.
srm-control groups --source cdppvc --target datahub --add ".*"
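The topics and groups subcommands can be wrapped in one helper so that both allow lists are updated the same way. This is a sketch under stated assumptions: the function name and the SRM_CONTROL override variable are hypothetical, and the topics subcommand is assumed to accept the same --source, --target, and --add options as the groups command shown above.

```shell
# SRM_CONTROL can be set to "echo" for a dry run that only prints
# the arguments instead of calling srm-control.
SRM_CONTROL="${SRM_CONTROL:-srm-control}"

# Add a pattern to an allow list of one replication.
# $1 = subcommand (topics or groups), $2 = source alias,
# $3 = target alias, $4 = regular expression to add
srm_allow() {
  "$SRM_CONTROL" "$1" --source "$2" --target "$3" --add "$4"
}
```

For the replication in this example, srm_allow topics cdppvc datahub ".*" followed by srm_allow groups cdppvc datahub ".*" adds all topics and all groups. A narrower regular expression limits replication to matching names.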
Monitor the replication process.
Access the Streams Messaging Manager UI in the CDP Data Hub cluster, and go to the Cluster
Replications page. The replications you set up will be visible on this
page.