Using Streams Replication Manager in CDP Public Cloud overview
Learn about the deployment options, prerequisites, and use cases for Streams Replication Manager in a cloud-based context.
Starting with the December 2020 release of CDP Public Cloud, Streams Replication Manager is included in the default Streams Messaging cluster definitions. As a result, you can deploy Streams Replication Manager in a CDP Data Hub cluster and use it to replicate Kafka data between all types of Cloudera Data Platform clusters. This includes replicating Kafka data to or from clusters deployed on either CDP Public Cloud or CDP Private Cloud.
The following sections describe how to deploy Streams Replication Manager in a CDP Data Hub cluster, the prerequisites you must meet before using it, and the common use cases for it in a cloud-based context.
Differences between light and heavy deployments
- Light duty definition
- In the light duty definition, Streams Replication Manager is deployed by default on the broker and master hosts of the cluster. This means that Streams Replication Manager is available for use by default in a CDP Data Hub cluster provisioned with the light duty definition.
- Heavy duty definition
- In the heavy duty definition, Streams Replication Manager has its own host group. However, by default, the Streams Replication Manager host group is not provisioned. When creating a cluster with the heavy duty definition, you must set the instance count of the Srm nodes host group to at least one. Otherwise, Streams Replication Manager is not deployed on the cluster.
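The deployment rule described above can be summarized as a small check. This is only an illustrative sketch: the function name, the `"light"` and `"heavy"` labels, and the `"srm"` host group key are hypothetical and do not correspond to any CDP API or template field.

```python
from typing import Dict


def srm_is_deployed(definition: str, instance_counts: Dict[str, int],
                    srm_group: str = "srm") -> bool:
    """Return True if Streams Replication Manager ends up on the cluster.

    `definition` is "light" or "heavy" (hypothetical labels for the two
    cluster definitions). `instance_counts` maps host group names to their
    instance counts; the "srm" key is an assumed name for the SRM host group.
    """
    if definition == "light":
        # Light duty: SRM is deployed on the broker and master hosts by default.
        return True
    if definition == "heavy":
        # Heavy duty: SRM has its own host group, which must have >= 1 instance.
        return instance_counts.get(srm_group, 0) >= 1
    raise ValueError(f"unknown definition: {definition!r}")
```

For example, a heavy duty cluster whose Streams Replication Manager host group is left at an instance count of zero yields `False`, which mirrors the case where the service is not deployed.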
Prerequisites for using Streams Replication Manager
- Streams Replication Manager must be able to access the Kafka hosts of the source and target clusters through the network.
- Streams Replication Manager must trust the TLS certificates of the brokers in the source and target clusters. This is required so that Streams Replication Manager can establish a trusted connection.
- Streams Replication Manager must have access to credentials that it can use to authenticate itself in both the source and target clusters.
- Streams Replication Manager must use a principal that is authorized to access Kafka resources (topics) on both source and target clusters.
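The first two prerequisites (network access and TLS trust) can be probed from the host that will run Streams Replication Manager before you configure replication. The following is a minimal sketch that uses only the Python standard library. The function name `check_broker` and its parameters are illustrative assumptions, not part of Streams Replication Manager or any Cloudera tooling. It does not cover authentication or authorization, which require a real Kafka client with the intended credentials.

```python
import socket
import ssl
from typing import Optional


def check_broker(host: str, port: int, cafile: Optional[str] = None,
                 timeout: float = 5.0) -> dict:
    """Probe one broker endpoint: TCP reachability first, then TLS trust.

    `cafile` is an optional path to a PEM bundle with the CA certificates
    to trust; when it is None, the system default trust store is used.
    """
    result = {"host": host, "port": port,
              "reachable": False, "tls_trusted": False, "error": None}
    try:
        # Step 1: plain TCP connection (network prerequisite).
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["reachable"] = True
            # Step 2: TLS handshake with certificate and hostname
            # verification (trust prerequisite).
            context = ssl.create_default_context(cafile=cafile)
            with context.wrap_socket(sock, server_hostname=host):
                result["tls_trusted"] = True
    except ssl.SSLError as exc:
        # Raised, for example, when the broker certificate is not trusted.
        result["error"] = f"TLS: {exc}"
    except OSError as exc:
        # Covers refused connections, DNS failures, and timeouts.
        result["error"] = f"network: {exc}"
    return result
```

You could call `check_broker` once for each broker in the source and target clusters, passing the broker hostname, the TLS listener port configured for Kafka in your environment, and the CA bundle that Streams Replication Manager is expected to trust. A result with `reachable` set to `True` but `tls_trusted` set to `False` suggests that the network path works but the certificate chain is not trusted yet.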
Cloud-based use cases for Streams Replication Manager
There are three common use cases for Streams Replication Manager in a cloud-based context. These are as follows.
- Replicating data from on premises to cloud with Streams Replication Manager on premises
- In this use case, you replicate data from a Cloudera Private Cloud Base cluster to a CDP Data Hub cluster, with Streams Replication Manager running in the Cloudera Private Cloud Base cluster.
- Replicating data from on premises to cloud with Streams Replication Manager in the cloud
- In this use case, you replicate data from a Cloudera Private Cloud Base cluster to a CDP Data Hub cluster, with Streams Replication Manager running in the CDP Data Hub cluster.
- Replicating data between cloud clusters with Streams Replication Manager in the cloud
- In this use case, you replicate data between CDP Data Hub clusters, with Streams Replication Manager running in a CDP Data Hub cluster.