Using Streams Replication Manager in CDP Public Cloud overview

Streams Replication Manager (SRM) can be deployed in both CDP PvC Base (on-prem) and Data Hub (cloud) clusters. You can use your SRM deployment to replicate Kafka data between CDP PvC Base and Data Hub clusters, or to replicate data between multiple Data Hub clusters. Review the following information to learn more about your deployment options, as well as the prerequisites and use cases for using SRM in a cloud based context.

Starting with the December 2020 release of CDP Public Cloud, SRM is included in the default Streams Messaging cluster definitions. As a result, you can deploy SRM in a Data Hub cluster and use it to replicate data between all types of CDP clusters. This includes deployments with either Data Hub, CDP PvC Base, or both.

The following sections collect information on how you can deploy SRM in a Data Hub cluster, give an overview of the prerequisites that must be met before using SRM, as well as give a list of common uses cases recommended by Cloudera.

Differences between light and heavy deployments

In CDP Public Cloud, SRM can be deployed in Data Hub clusters with both the light and heavy duty variants of the Streams Messaging cluster definition. However, there are significant differences in how SRM is deployed with each definition:
Light duty definition:
In the light duty definition, SRM is deployed by default on the broker and master hosts of the cluster. This means that SRM is available for use by default in a Data Hub cluster provisioned with the light duty definition.
Heavy duty definition
In the heavy duty definition, SRM has its own host group. However, by default, the SRM host group is not provisioned. When creating a cluster with the heavy duty definition, you must set the instance count of the Srm nodes host group to at least 1. Otherwise, SRM is not be deployed on the cluster.
For more information on cluster provisioning, see Creating your first Streams Messaging cluster. For more information on the default cluster definitions and cluster layouts, see Streams Messaging cluster layout

Prerequisites for using SRM

SRM can be used to replicate Kafka data between all types of CDP clusters. However, the following conditions must be met for all deployments and use cases:
  • SRM must be able to access the Kafka hosts of the source and target cluster through the network.

  • SRM must trust the TLS certificates of the brokers in the source and target clusters.

    This is required so that SRM can establish a trusted connection.

  • SRM must have access to credentials that it can use to authenticate itself in both the source and target clusters.

  • SRM must use a principal that is authorized to access Kafka resources (topics) on both source and target clusters.

Cloud based use cases for SRM

There are three common uses cases when using SRM in a cloud based context. These are as follows:

  • Replicating data from a CDP PvC Base cluster to a Data Hub cluster with SRM running in the CDP PvC Base cluster.

  • Replicating data from a CDP PvC Base cluster to a Data Hub cluster with SRM running in the Data Hub cluster.

  • Replicating data between two Data Hub clusters.