Understanding co-located and external clusters

The Kafka clusters that Streams Replication Manager connects to can be categorized into two groups. They can be co-located with or external to Streams Replication Manager. The category dictates how you configure the Streams Replication Manager service and the srm-control tool.

Streams Replication Manager connects to and replicates data between Kafka clusters. Each Kafka cluster consists of one or more Kafka brokers and is deployed on a cluster. A Kafka cluster that Streams Replication Manager connects to is either co-located with or external to Streams Replication Manager. The category is determined by the relation between that Kafka cluster and the Streams Replication Manager service.

A co-located Kafka cluster is the Kafka cluster that is running in the same cluster as the Streams Replication Manager service. Any other Kafka cluster that is remote to Streams Replication Manager, either logically or geographically, is considered external.
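The rule above can be sketched as a small Python function. This is an illustrative model only and not part of Streams Replication Manager. The classify function and its parameters are hypothetical, and "running in the same cluster" is reduced to a simple name comparison.

```python
def classify(kafka_host_cluster: str, srm_host_cluster: str) -> str:
    """Classify a Kafka cluster from the perspective of one SRM service.

    Both arguments are hypothetical names of the clusters that host the
    Kafka cluster and the Streams Replication Manager service.
    """
    if kafka_host_cluster == srm_host_cluster:
        return "co-located"
    return "external"


# Kafka and SRM run in the same cluster.
print(classify("East", "East"))
# Kafka runs in a different cluster than SRM.
print(classify("West", "East"))
```

Note that the result depends on both arguments. The category is a property of the relation between a Kafka cluster and a specific Streams Replication Manager service, not of the Kafka cluster alone.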

For example, consider the following deployment:

This deployment has two clusters, and both clusters have a Kafka cluster. However, only Cluster East has Streams Replication Manager deployed on it. From the perspective of Streams Replication Manager East, Kafka Cluster East is co-located, while Kafka Cluster West is external.

In a more advanced deployment with multiple Streams Replication Manager services, a single Kafka cluster can fall into both categories. The Kafka cluster is co-located from the perspective of the Streams Replication Manager service that runs in the same cluster, and external from the perspective of every other Streams Replication Manager service.

For example, consider the following deployment:

In this example, both clusters have a Kafka cluster as well as Streams Replication Manager. From the perspective of Streams Replication Manager East, Kafka Cluster East is co-located and Kafka Cluster West is external. From the perspective of Streams Replication Manager West, Kafka Cluster West is co-located and Kafka Cluster East is external.

It is also possible to have no co-located Kafka cluster at all. For example, consider the following deployment:

In this example, the clusters that have Kafka deployed on them do not have Streams Replication Manager. Instead, data is replicated by a Streams Replication Manager instance deployed on a separate cluster. From the perspective of Streams Replication Manager South, both Kafka Cluster East and Kafka Cluster West are external, and there is no co-located cluster. In a scenario like this, you do not need to complete the configuration tasks related to the co-located cluster.
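The three example deployments can be modeled in a short Python sketch to show how the same Kafka cluster is categorized differently by each Streams Replication Manager service. The data layout, the deployment labels, and the perspectives function are assumptions made for illustration. They do not correspond to any Streams Replication Manager or Cloudera Manager API.

```python
# Hypothetical model: each deployment maps a cluster name to the services it hosts.
deployments = {
    "single SRM": {"East": {"kafka", "srm"}, "West": {"kafka"}},
    "SRM on both": {"East": {"kafka", "srm"}, "West": {"kafka", "srm"}},
    "separate SRM": {"East": {"kafka"}, "West": {"kafka"}, "South": {"srm"}},
}


def perspectives(deployment):
    """For every SRM service, categorize every Kafka cluster in the deployment."""
    views = {}
    for srm_cluster, services in deployment.items():
        if "srm" not in services:
            continue  # no SRM service here, so this cluster has no perspective
        views[srm_cluster] = {
            kafka_cluster: "co-located" if kafka_cluster == srm_cluster else "external"
            for kafka_cluster, kafka_services in deployment.items()
            if "kafka" in kafka_services
        }
    return views


for name, deployment in deployments.items():
    print(name, perspectives(deployment))
```

In the "separate SRM" case, the view of Streams Replication Manager South contains only external entries, which matches the deployment without a co-located Kafka cluster described above.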

Correctly identifying the category a Kafka cluster falls into is important because the category dictates how you configure each Streams Replication Manager service and the srm-control tool. In general, a co-located Kafka cluster requires less configuration than an external Kafka cluster. This is because Cloudera Manager is able to automatically pass certain configuration properties about the co-located Kafka cluster to Streams Replication Manager. External Kafka clusters, on the other hand, must be fully configured and specified manually.

For more information on how to configure and set up Streams Replication Manager, see Add Streams Replication Manager to an existing cluster. In addition, you can review the configuration examples available in Configuration examples or Streams Replication Manager security overview.