Replication Flows Overview
Get familiar with the concept of replications and replication flows.
Replication involves sending records from a source cluster to a target cluster. In SRM a
replication refers to a source and target cluster pair, the direction in which data is flowing
and the topics that are being replicated. Source target cluster pairs can be specified in
Cloudera Manager; they are notated source->target
. Initially, when
source->target
pairs are set up they are considered inactive, as no data is
being replicated between them. To start replication users need to specify which topics to
replicate with the srm-control
command line tool.
It is important to understand
that replication in SRM is configured independently for each source->target
cluster pair. Moreover, configuration is done on a per topic basis. This means that each topic
in a source cluster can have a different direction or target that it is being replicated to. A
set of topics in the source cluster can be replicated to multiple target clusters while others
are being replicated to only one target cluster. This allows users to set up powerful, topic
specific replication flows.
The term replication flow is used to specify all replications set up in a system. This document uses the term when referring to the visual representation of SRM replications.
A basic example of a replication flow is when topics are being sent from one cluster to another
cluster in a different geographical location. Note that in this example there is only one
replication or source->target
pair. Moreover, only one of the two topics on
the source cluster are being replicated to the target cluster.