Using Streams Replication Manager in CDP Public Cloud overview

Learn about the deployment options, prerequisites, and use cases for Streams Replication Manager in a cloud-based context.

Starting with the December 2020 release of CDP Public Cloud, Streams Replication Manager is included in the default Streams Messaging cluster definitions. As a result, you can deploy Streams Replication Manager in a CDP Data Hub cluster and use it to replicate Kafka data between all types of Cloudera Data Platform clusters. This includes replicating Kafka data to or from clusters deployed on either CDP Public Cloud or CDP Private Cloud.

The following sections provide information on how you can deploy Streams Replication Manager in a CDP Data Hub cluster, what prerequisites you must meet before using Streams Replication Manager, and the common use cases where you can use Streams Replication Manager in a cloud-based context.

Differences between light and heavy deployments

In CDP Public Cloud, you can deploy Streams Replication Manager in CDP Data Hub clusters with both the light and heavy duty variants of the Streams Messaging cluster definition. However, there are significant differences in how Streams Replication Manager is deployed with each definition.
Light duty definition
In the light duty definition, Streams Replication Manager is deployed by default on the broker and master hosts of the cluster. This means that Streams Replication Manager is available for use by default in a CDP Data Hub cluster provisioned with the light duty definition.
Heavy duty definition
In the heavy duty definition, Streams Replication Manager has its own host group. However, by default, the Streams Replication Manager host group is not provisioned. When creating a cluster with the heavy duty definition, you must set the instance count of the Srm nodes host group to at least one. Otherwise, Streams Replication Manager is not deployed on the cluster.
For more information on cluster provisioning, see Creating your first Streams Messaging cluster. For more information on the default cluster definitions and cluster layouts, see Streams Messaging cluster layout.
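
The deployment rule described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name, the "light" and "heavy" labels, and the srm_instance_count parameter are hypothetical and are not part of any CDP API or CLI.

```python
def srm_is_deployed(definition: str, srm_instance_count: int = 0) -> bool:
    """Return True if Streams Replication Manager ends up on the cluster.

    Hypothetical helper that only encodes the rule from the text:
    - light duty: SRM is deployed by default on the broker and master hosts
    - heavy duty: SRM is deployed only if its host group has >= 1 instance
    """
    if definition == "light":
        return True
    if definition == "heavy":
        return srm_instance_count >= 1
    raise ValueError(f"unknown cluster definition: {definition!r}")
```

For example, srm_is_deployed("heavy") returns False because the host group instance count defaults to zero, which mirrors the default provisioning behavior of the heavy duty definition.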

Prerequisites for using Streams Replication Manager

Streams Replication Manager can be used to replicate Kafka data between all types of Cloudera Data Platform clusters. However, the following conditions must be met for all deployments and use cases:
  • Streams Replication Manager must be able to access the Kafka hosts of the source and target cluster through the network.

  • Streams Replication Manager must trust the TLS certificates of the brokers in the source and target clusters.

    This is required so that Streams Replication Manager can establish a trusted connection.

  • Streams Replication Manager must have access to credentials that it can use to authenticate itself in both the source and target clusters.

  • Streams Replication Manager must use a principal that is authorized to access Kafka resources (topics) on both source and target clusters.
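
The first two prerequisites can be checked from the host where Streams Replication Manager will run before you configure replication. The following is a minimal sketch that uses only the Python standard library. It is not a Cloudera tool: all function names are hypothetical, the broker list is assumed to be a comma-separated host:port string, and the TLS check assumes that the broker port is a TLS-enabled listener. It does not verify credentials or authorization.

```python
import socket
import ssl
from typing import List, Optional, Tuple


def parse_bootstrap_servers(servers: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port' list into (host, port) pairs."""
    result = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        result.append((host, int(port)))
    return result


def broker_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the broker can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def broker_cert_is_trusted(host: str, port: int,
                           ca_file: Optional[str] = None,
                           timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake succeeds with certificate verification.

    ca_file is an optional PEM file with the CA certificates to trust;
    when it is None, the system default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
```

A typical use would be to call parse_bootstrap_servers() on the broker list of the source cluster and again on that of the target cluster, then run both checks for every returned pair. A broker that is reachable but fails the certificate check usually points to a missing CA certificate in the trust store rather than a network problem.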

Cloud-based use cases for Streams Replication Manager

There are three common use cases for Streams Replication Manager in a cloud-based context. These are as follows:

Replicating data from on premises to cloud with Streams Replication Manager on premises
In this use case, you replicate data from a Cloudera Private Cloud Base cluster to a CDP Data Hub cluster with Streams Replication Manager running in the Cloudera Private Cloud Base cluster.
Replicating data from on premises to cloud with Streams Replication Manager in the cloud
In this use case, you replicate data from a Cloudera Private Cloud Base cluster to a CDP Data Hub cluster with Streams Replication Manager running in the CDP Data Hub cluster.
Replicating data between cloud clusters with Streams Replication Manager in the cloud
In this use case, you replicate data between CDP Data Hub clusters with Streams Replication Manager running in a CDP Data Hub cluster.