Add Streams Replication Manager to an existing cluster

Streams Replication Manager can be installed on an existing cluster managed by Cloudera Manager. To do this, you need to add Streams Replication Manager to the cluster and configure a number of mandatory properties related to clusters, replications, and role targets.

Streams Replication Manager is comprised of two roles:
  • Streams Replication Manager Driver role: This role is responsible for connecting to the specified clusters and performing replication between them. The driver can be installed on one or more hosts.
  • Streams Replication Manager Service role: This role consists of a REST API and a Kafka Streams application to aggregate and expose cluster, topic, and consumer group metrics. The service can be installed on one or more hosts.
You can install Streams Replication Manager independently of the clusters that data is replicated between.

The following steps walk you through the process of adding Streams Replication Manager to your cluster. The configuration examples on this page are simple and are only meant to demonstrate the type of information that you have to enter. For comprehensive configuration examples, see Configuration examples in the Related information section below.

  • If you are planning on replicating data to or from a Kafka service running in either a CDH 5.x or 6.x cluster and you are using Sentry for authorization, make sure that the streamsrepmgr user is added to the Kafka Super users property. You can find the Super users property by going to Kafka service > Configuration. Do this on all CDH 5.x or 6.x clusters where data replication will happen.

  • If you are planning on replicating data to or from a Kafka service running in Runtime 7.x and you are using Ranger for authorization, make sure that the streamsrepmgr user has all required permissions assigned to it in Ranger. Do this on all Runtime 7.x clusters where data replication will happen.
  1. From the Cloudera Manager Home page, select the drop-down to the right of your cluster, and select Add a Service.
  2. Select Streams Replication Manager from the list of services and click Continue.
  3. Assign role instances to hosts:
    Select at least 2 hosts for both the driver and the service role if you want to enable high availability mode for SRM.
    1. Click the field below Streams Replication Manager Driver to display a dialog box containing a list of hosts.
    2. Select 1 or more hosts that the Streams Replication Manager Driver should be assigned to and click OK.
    3. Click the field below Streams Replication Manager Service to display a dialog box containing a list of hosts.
    4. Select 1 or more hosts that the Streams Replication Manager Service should be assigned to and click OK.
  4. Click Continue.
  5. Specify cluster aliases:
    1. Find the Streams Replication Manager Cluster alias property.
    2. Add a comma-delimited list of cluster aliases. For example:
      primary, secondary
      Cluster aliases are arbitrary names defined by the user. Aliases specified here are used in other configuration properties and with the srm-control tool to refer to the clusters added for replication.
  6. Specify cluster connection information:
    1. Find the Streams Replication Manager's Replication Configs property.
    2. Click the add button and add a new line for each cluster alias that you specified in the Streams Replication Manager Cluster alias property.
    3. Add connection information for your clusters. For example:
      primary.bootstrap.servers=primary_host1:9092,primary_host2:9092,primary_host3:9092
      secondary.bootstrap.servers=secondary_host1:9092,secondary_host2:9092,secondary_host3:9092

      Each cluster has to be added on a new line. If a cluster has multiple hosts, add them on the same line and delimit them with commas.

  7. Add and enable replications:
    1. Find the Streams Replication Manager's Replication Configs property.
    2. Click the add button and add new lines for each unique replication you want to add and enable.
    3. Add and enable your replications. For example:
      primary->secondary.enabled=true
      secondary->primary.enabled=true
      
  8. Specify the Streams Replication Manager Service role target cluster:
    1. Find the Streams Replication Manager Service Target Cluster property.
    2. Add the target cluster alias. For example:
      secondary

    The target cluster is where the service gathers replication information from. Cloudera recommends that you deploy the service on every cluster and configure each instance of the service to target the cluster that it is running on.

  9. Optional: Specify the Streams Replication Manager Driver role target clusters:
    1. Find the Streams Replication Manager Driver Target Cluster property.
    2. Add the cluster aliases that you want the driver role to target. For example:
      primary, secondary

    You can use the Streams Replication Manager Driver Target Cluster property to specify a subset of clusters that the driver should target, that is, write data to. When this property is left empty (the default), the driver reads from and writes to all clusters added to SRM's configuration. When this property is set, the driver collects data from all clusters, but writes only to the clusters specified in this property. However, for monitoring to function correctly, this property has to contain the source clusters as well as the target clusters. As a result, custom configuration of this property is considered an advanced configuration practice that is only viable in complex replication scenarios. Cloudera recommends that you either leave this property empty or add all clusters taking part in the replication.

  10. Configure properties not exposed in Cloudera Manager:
    SRM accepts a number of additional configuration properties that are not available in Cloudera Manager. Depending on your requirements you may need to configure these properties as well. You can find a comprehensive list of these properties in Configuration Properties Reference for Properties not Available in Cloudera Manager.
    1. Find the Streams Replication Manager's Replication Configs property.
    2. Click the add button and add new lines for each additional property you want to configure.
    3. Add configuration properties. For example:
      replication.factor=3
  11. Depending on your requirements, review and configure other properties available on this page.
  12. Click Continue and wait until Streams Replication Manager is started.
  13. Click Continue then click Finish.
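
The entries from steps 5 to 7 have to agree with each other: every cluster alias needs a bootstrap.servers line, and every replication has to refer to aliases that were actually defined. The following is a minimal sketch of how such a check could be scripted before you paste the values into Cloudera Manager. It is not part of SRM or Cloudera Manager; the function names parse_properties and check_srm_config are hypothetical, and the sample values are the ones used in the examples above.

```python
def parse_properties(lines):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    props = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check_srm_config(aliases_value, config_lines):
    """Return a list of problems found; an empty list means none were found.

    aliases_value: the comma-delimited cluster alias list (step 5).
    config_lines:  the lines entered in the Replication Configs property.
    """
    aliases = [a.strip() for a in aliases_value.split(",") if a.strip()]
    props = parse_properties(config_lines)
    problems = []

    # Every alias needs connection information (step 6).
    for alias in aliases:
        if not props.get(f"{alias}.bootstrap.servers"):
            problems.append(f"missing {alias}.bootstrap.servers")

    # Every replication must connect two known aliases (step 7).
    for key in props:
        if key.endswith(".enabled") and "->" in key:
            source, target = key[: -len(".enabled")].split("->", 1)
            for name in (source, target):
                if name not in aliases:
                    problems.append(f"{key}: unknown cluster alias {name!r}")
    return problems


aliases = "primary, secondary"
config = [
    "primary.bootstrap.servers=primary_host1:9092,primary_host2:9092,primary_host3:9092",
    "secondary.bootstrap.servers=secondary_host1:9092,secondary_host2:9092,secondary_host3:9092",
    "primary->secondary.enabled=true",
    "secondary->primary.enabled=true",
    "replication.factor=3",
]
print(check_srm_config(aliases, config))  # prints []
```

The check only covers the relationships described on this page. It does not verify that the hosts are reachable or that other properties are valid.
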
  • Replicating data to or from the specified clusters is now possible.
  • The SRM service REST API Swagger UI is available at one of the following addresses:
    • http://<srm-service-host>:<srm-service-port>/swagger
    • https://<srm-service-host>:<srm-service-port>/swagger
  • Enable Kerberos and TLS/SSL for SRM.
  • Use the srm-control tool to kick off replication by adding topics or groups to the allowlist.
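
As an illustration of the last item, the sketch below assembles an srm-control command line that adds topics or groups to the allowlist of a replication. The subcommands (topics, groups) and the --source, --target, and --add options are assumptions based on the usual srm-control syntax; confirm them against the srm-control documentation for your version. The helper srm_control_command and the topic names topic1 and topic2 are hypothetical. The script only prints the command and does not run it.

```python
import shlex


def srm_control_command(kind, source, target, names):
    """Build an srm-control argument list that adds names to the allowlist.

    kind:   "topics" or "groups" (assumed srm-control subcommands).
    source: alias of the source cluster, for example "primary".
    target: alias of the target cluster, for example "secondary".
    names:  topic or consumer group names to add.
    """
    if kind not in ("topics", "groups"):
        raise ValueError("kind must be 'topics' or 'groups'")
    if not names:
        raise ValueError("at least one topic or group name is required")
    return [
        "srm-control", kind,
        "--source", source,
        "--target", target,
        "--add", ",".join(names),
    ]


cmd = srm_control_command("topics", "primary", "secondary", ["topic1", "topic2"])
print(shlex.join(cmd))
# prints: srm-control topics --source primary --target secondary --add topic1,topic2
```

Building the command as a list keeps each argument separate, so the same list can be passed to a process runner without shell quoting issues.
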

If you plan to use Streams Messaging Manager (SMM) to monitor Kafka cluster replications, configure SMM to communicate with Streams Replication Manager (SRM). For information, see Configuring SMM for Monitoring SRM Replications.