Replicating data from on premises to cloud with Streams Replication Manager on premises
You can set up and configure an instance of Streams Replication Manager
running in a Cloudera Private Cloud Base cluster to replicate data between the
Cloudera Private Cloud Base cluster and a CDP Data Hub
cluster. In addition, you can use Streams Messaging Manager to monitor the replication
process.
Consider the following replication scenario.
In this scenario, data is replicated from a Cloudera Private Cloud Base
cluster that has Kafka, Streams Replication Manager, and Streams Messaging Manager deployed on it. This is a secure cluster that has TLS/SSL
encryption and Kerberos authentication enabled. In addition, it uses Ranger for
authorization.
Streams Replication Manager, deployed in this cluster, replicates data from it to a
CDP Data Hub cluster.
The CDP Data Hub cluster is provisioned with one of the default
Streams Messaging cluster definitions.
This example scenario does not go into detail on how to
set up the clusters and assumes the following.
A CDP Data Hub cluster provisioned with the Streams Messaging Light Duty or Heavy Duty
cluster definition is available.
A Cloudera Private Cloud Base cluster with Kafka, Streams Replication Manager, and Streams Messaging Manager is available.
This cluster is TLS/SSL and Kerberos enabled. In addition, it uses Ranger for
authorization.
Network connectivity and DNS resolution are
established between the clusters.
This example scenario demonstrates the configuration required to enable replication
monitoring of the CDP Data Hub cluster with Streams Messaging Manager. You do this by
configuring the Streams Replication Manager Service role to target (monitor) the CDP Data
Hub cluster. This is the last step in the following list of steps, and it is marked
optional because enabling replication monitoring of the CDP Data Hub cluster comes with
the following caveats.
The Streams Replication Manager Service role will generate additional cloud
traffic.
Any extra traffic you might have in your cloud deployment can lead to
additional cloud costs.
The Replications tab in Streams Messaging Manager will display all
replications targeting the CDP Data Hub cluster.
Although this is expected, all other pages in Streams Messaging Manager will display
information about the Cloudera Private Cloud Base cluster. A setup like this might
confuse users or mislead them about what this specific instance of Streams Messaging
Manager is monitoring.
You will lose the ability to monitor the replications targeting the Cloudera Private Cloud Base cluster.
This is only critical if you
have any existing replications that are targeting the Cloudera Private Cloud Base cluster and you are monitoring these
replications with the Streams Messaging Manager instance running in the
Cloudera Private Cloud Base cluster.
Create a machine user for Streams Replication Manager in Cloudera Management Console.
A machine user is required so that Streams Replication Manager has credentials
that it can use to connect to the Kafka service in the CDP Data Hub
cluster.
Navigate to Management Console > User Management.
Click Actions > Create Machine User.
Enter a unique name for the user and click Create.
For example: srm
After the user is created, you are presented with a page that displays the
user details.
Click Set Workload Password.
Type a password in the Password and Confirm
Password fields. Leave the Environment field
blank.
Click Set Workload Password.
A message appears on successful password creation.
Grant the machine user access to your environment.
You must grant the machine user access to your environment for Streams Replication Manager to connect to the Kafka service with this user.
Navigate to Management Console > Environments, and select the environment where your Kafka cluster is located.
Click Actions > Manage Access.
Use the search box to find and select the machine user you want to use.
A list of Resource Roles appears.
Select the EnvironmentUser role and click
Update Roles.
Go back to the Environment Details page and click Actions > Synchronize Users to FreeIPA.
On the Synchronize Users page, click Synchronize
Users.
Synchronizing users ensures that the role assignment is in effect for the
environment.
Add Ranger permissions for the user you created for Streams Replication Manager
in the CDP Data Hub cluster.
You must grant the necessary privileges to the user so that the user can access
Kafka resources. This is configured through Ranger policies.
Navigate to Management Console > Environments, and select the environment where your Kafka cluster is located.
Click the Ranger link on the Environment Details page.
Select the resource-based service corresponding to the Kafka resource in the CDP Data Hub cluster.
Add the Workload User Name of the user you created for Streams Replication Manager to the following Ranger policies.
All - consumergroup
All - topic
All - transactionalid
All - cluster
All - delegationtoken
Ensure that Ranger permissions exist for the streamsrepmgr user in the
Cloudera Private Cloud Base cluster.
Access the Cloudera Manager
instance of your Cloudera Private Cloud Base cluster.
Go to Ranger > Ranger Admin Web UI.
Log in to the Ranger Console (Ranger Admin Web UI).
Ensure that the streamsrepmgr user is added to all required
policies.
If the user is missing, add it. The required policies are as follows.
All - consumergroup
All - topic
All - transactionalid
All - cluster
All - delegationtoken
Create a truststore on the Cloudera Private Cloud Base cluster.
A truststore is required so that the Streams Replication Manager instance
running in the Cloudera Private Cloud Base cluster can trust the secure
CDP Data Hub cluster. To do this, you extract the FreeIPA
certificate from the Cloudera Data Platform environment, create a
truststore that includes the certificate, and copy the truststore to all hosts on the Cloudera Private Cloud Base cluster.
Navigate to Management Console > Environments, and select the environment where your Kafka cluster is located.
Go to the FreeIPA tab.
Click Get FreeIPA Certificate.
The FreeIPA certificate file, [***ENVIRONMENT NAME***].crt, is downloaded to your computer.
Run the following command to create the truststore.
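As a rough sketch of this step, assuming the JDK keytool utility is installed and that you can copy files to every cluster host over SSH, the bash functions below import the downloaded FreeIPA certificate into a new JKS truststore and copy that truststore to each host. The function names, the freeipa entry alias, the KEYTOOL and SCP override variables, and any paths or host names you pass in are hypothetical; adapt them to your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build a truststore from the FreeIPA certificate and
# copy it to every host of the Cloudera Private Cloud Base cluster.
# KEYTOOL and SCP are hypothetical overrides; by default the real tools run.
KEYTOOL="${KEYTOOL:-keytool}"
SCP="${SCP:-scp}"

# create_truststore CERT_FILE TRUSTSTORE PASSWORD
# Imports the certificate as a trusted entry named "freeipa" (an assumed
# alias). -noprompt skips keytool's "Trust this certificate?" question.
create_truststore() {
  local cert="$1" truststore="$2" password="$3"
  "$KEYTOOL" -importcert -noprompt \
    -alias freeipa \
    -file "$cert" \
    -keystore "$truststore" \
    -storetype JKS \
    -storepass "$password"
}

# distribute_truststore TRUSTSTORE DEST_PATH HOST...
# Copies the truststore to the same destination path on each listed host.
distribute_truststore() {
  local truststore="$1" dest="$2" host
  shift 2
  for host in "$@"; do
    "$SCP" "$truststore" "${host}:${dest}"
  done
}
```

For example, with hypothetical paths and hosts: create_truststore myenv.crt /tmp/freeipa-truststore.jks "$TS_PASS", then distribute_truststore /tmp/freeipa-truststore.jks /opt/security/freeipa-truststore.jks host1.example.com host2.example.com. Passing -storepass on the command line exposes the password in process listings, so treat this as an illustration rather than a hardened procedure.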
If credential creation is successful, a new entry corresponding to the Kafka
credential you specified appears on the page.
Define the co-located Kafka cluster (the Cloudera Private Cloud Base
cluster).
In Cloudera Manager, go to Clusters and
select the Streams Replication Manager service.
Go to Configuration.
Find and enable the Kafka Service property.
Find and configure the Streams Replication Manager Co-located Kafka
Cluster Alias property.
The alias you configure represents the co-located cluster. Enter an alias that is
unique and easily identifiable. For example:
cdppvc
Enable relevant security feature toggles.
Because the Cloudera Private Cloud Base cluster is both TLS/SSL
and Kerberos enabled, you must enable all feature toggles for both the Streams Replication Manager Driver and Service roles. The feature toggles are
the following.
Enable TLS/SSL for SRM Driver
Enable TLS/SSL for SRM Service
Enable Kerberos Authentication
Add both clusters to the configuration of Streams Replication Manager.
Find and configure the External Kafka Accounts
property.
Add the names of all Kafka credentials you created to this property. To do this,
click the add button to add a new line to the property and then enter
the name of the Kafka credential. For example:
datahub
Find and configure the Streams Replication Manager Cluster
alias property.
Add all cluster aliases to this property. This includes the aliases present in
both the External Kafka Accounts and Streams
Replication Manager Co-located Kafka Cluster Alias properties. Delimit
the aliases with commas. For example:
datahub,cdppvc
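The Cluster alias value is simply every alias joined with commas, and each alias must be unique. The bash function below is a hypothetical helper (not part of Streams Replication Manager) that builds that value and rejects empty, comma-containing, or duplicate aliases, using the datahub and cdppvc aliases from this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the "Streams Replication Manager Cluster alias"
# value from the External Kafka Accounts entries and the co-located alias.
# srm_cluster_aliases ALIAS...  -> prints the aliases joined with commas.
srm_cluster_aliases() {
  local seen="," alias
  for alias in "$@"; do
    # An alias must be non-empty and must not contain the delimiter.
    case "$alias" in
      ""|*,*) echo "invalid alias: '$alias'" >&2; return 1 ;;
    esac
    # Reject an alias that was already listed.
    case "$seen" in
      *",$alias,"*) echo "duplicate alias: $alias" >&2; return 1 ;;
    esac
    seen="${seen}${alias},"
  done
  # "$*" joins the arguments with the first character of IFS.
  local IFS=,
  echo "$*"
}
```

For example, srm_cluster_aliases datahub cdppvc prints datahub,cdppvc, the value shown above.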
Configure replications.
In this example, data is replicated unidirectionally. As a result, you only need to
configure a single replication.
Find the Streams Replication
Manager's Replication Configs property.
Click the add button and add a new line
for each unique replication you want to add and enable.
Add and enable your replications. For example:
cdppvc->datahub.enabled=true
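Each replication entry has the form [SOURCE ALIAS]->[TARGET ALIAS].enabled=true, and both aliases should already appear in the Cluster alias property. The bash function below is a hypothetical helper, offered only as a sketch, that checks this before printing the entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one entry for the "Streams Replication Manager's
# Replication Configs" property after validating the aliases.
# srm_replication_entry CLUSTER_ALIASES SOURCE TARGET
srm_replication_entry() {
  local aliases="$1" src="$2" tgt="$3" alias
  if [ "$src" = "$tgt" ]; then
    echo "source and target aliases must differ" >&2
    return 1
  fi
  # Both aliases must be members of the comma-delimited Cluster alias value.
  for alias in "$src" "$tgt"; do
    case ",$aliases," in
      *",$alias,"*) ;;
      *) echo "alias not in Cluster alias property: $alias" >&2; return 1 ;;
    esac
  done
  echo "${src}->${tgt}.enabled=true"
}
```

For example, srm_replication_entry datahub,cdppvc cdppvc datahub prints cdppvc->datahub.enabled=true. A bidirectional setup would need a second entry with the aliases swapped.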
Configure Streams Replication Manager Driver and Service role targets.
Find and configure the Streams Replication Manager Service Target
Cluster property.
Add the co-located cluster's alias to the property. For
example:
cdppvc
Setting this property to
cdppvc means that you cannot monitor the replications targeting
the CDP Data Hub cluster. You can add the CDP Data Hub cluster alias to this property to
monitor the CDP Data Hub cluster. However, this can lead to
unwanted behavior. See the Before you begin section for more
information.
Find and configure the Streams Replication Manager Driver Target
Cluster property.
Gateway TLS/SSL Trust Store File: [***CLOUDERA PRIVATE CLOUD BASE GLOBAL TRUSTSTORE LOCATION***]
Gateway TLS/SSL Truststore Password: [***CLOUDERA PRIVATE CLOUD BASE GLOBAL TRUSTSTORE PASSWORD***]
SRM Client's Kerberos Principal Name: [***MY KERBEROS PRINCIPAL***]
SRM Client's Kerberos Keytab Location: [***PATH TO KEYTAB FILE***]
Take note of the password you configure in SRM Client's Secure Storage
Password and the name you configure in Environment Variable
Holding SRM Client's Secure Storage Password. You will need to provide
both of these in your CLI session before running the tool.
Click Save Changes.
Restart the Streams Replication Manager service.
Deploy client configuration for Streams Replication Manager.
Start the replication process using the srm-control tool.
SSH as an administrator to any of the Streams Replication Manager hosts in
the Cloudera Private Cloud Base cluster.
Set the secure storage password as an environment variable.
export [***SECURE STORAGE ENV VAR***]="[***SECURE STORAGE PASSWORD***]"
Replace [***SECURE STORAGE ENV VAR***] with the name of the
environment variable you specified in Environment Variable Holding SRM
Client's Secure Storage Password. Replace [***SECURE STORAGE
PASSWORD***] with the password you specified in SRM Client's
Secure Storage Password. For example:
export SECURESTOREPASS="mypassword"
Use the srm-control tool with the topics
subcommand to add topics to the allow list.
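The topics subcommand takes the source and target aliases of a replication. As a sketch, assuming the cdppvc->datahub replication from this example and that the secure storage password variable is already exported, the bash wrappers below add comma-separated topics to the allow list and then list it. The wrapper names, the SRM_CONTROL override variable, and any topic names you pass are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around srm-control for the cdppvc->datahub example.
# SRM_CONTROL is a hypothetical override; by default the real tool runs.
SRM_CONTROL="${SRM_CONTROL:-srm-control}"

# srm_allow_topics SOURCE TARGET TOPICS   (TOPICS is comma-separated)
# Adds the topics to the allow list of the SOURCE->TARGET replication.
srm_allow_topics() {
  "$SRM_CONTROL" topics --source "$1" --target "$2" --add "$3"
}

# srm_list_topics SOURCE TARGET
# Prints the topics currently on the allow list of the replication.
srm_list_topics() {
  "$SRM_CONTROL" topics --source "$1" --target "$2" --list
}
```

For example, after export SECURESTOREPASS="mypassword", running srm_allow_topics cdppvc datahub topic1,topic2 followed by srm_list_topics cdppvc datahub should show the added topics.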