Understanding co-located and external clusters
The Kafka clusters that Streams Replication Manager (SRM) connects to can be categorized into two groups. They can be co-located with or external to SRM. The category dictates how you configure the SRM service and the srm-control tool.
SRM connects to and replicates data between Kafka clusters, which consist of one or more Kafka brokers, deployed on clusters. These Kafka clusters that SRM connects to can be categorized into two groups. They can either be co-located with or external to SRM. Which category a specific cluster falls into is decided based on the relation between that cluster and the SRM service.
A co-located Kafka cluster is the Kafka cluster that is running in the same cluster as the SRM service. Any other Kafka cluster that is remote to SRM either logically or geographically is considered external.
For example, consider the following deployment:
This deployment has two clusters, and both clusters have a Kafka cluster. However, only Cluster East has SRM deployed on it. From the perspective of SRM East, Kafka Cluster East is co-located, while Kafka Cluster West is external.
In a more advanced deployment with multiple SRM services, a single Kafka cluster will fall into both categories. From the perspective of a specific SRM service a cluster will be co-located, while for others it will be external.
For example, consider the following deployment:
In this example, both clusters have a Kafka cluster as well as SRM. From the perspective of SRM East, Kafka Cluster East is co-located, Kafka Cluster West is external. From the perspective of SRM West, Kafka Cluster West is co-located, Kafka Cluster East is external.
It is also possible to not have co-located Kafka clusters. For example, consider the following deployment:
In this example, the clusters that have Kafka deployed on them do not have SRM. Instead, data is replicated by an SRM instance deployed on a separate cluster. From the perspective of SRM South, both Kafka Cluster East and West are external, there is no co-located cluster. In a scenario like this, configuration tasks related to the co-located cluster do not need to be completed.
Being able to correctly identify what category a Kafka cluster falls into is important as the
category dictates how you configure each SRM service and the srm-control
tool. In general, a co-located Kafka cluster requires less configuration than external Kafka
clusters. This is because Cloudera Manager is able to automatically pass certain configuration
properties about the co-located Kafka cluster to SRM. External Kafka clusters on the other
hand must be fully configured and specified manually.
For more information on how to configure and set up SRM, review any of the configuration examples available in Using Streams Replication Manager in CDP Public Cloud overview or Configuration examples.