Replication Manager terminology
Replication Manager is a service that you access through the Cloudera Public Cloud web interface in Cloudera Data Platform (CDP). You can create replication policies either in Replication Manager or with CDP CLI commands.
Term | Description |
---|---|
Replication Manager Service | The web UI that runs on the Cloudera Data Platform host. |
Data center | The facility that contains computer, server, and storage systems and associated infrastructure, such as routers and switches. Corporate data is stored, managed, and distributed from the data center. In an on-premises environment, a data center often contains a CDH cluster or a Cloudera Private Cloud Base cluster, although a single data center can contain multiple on-premises clusters. |
Cloud data lake or data lake | A Cloudera Public Cloud cluster that runs on cloud virtual machines and retains its data on cloud storage. A cloud data lake requires only a minimal set of metadata and governance services, such as Hive metastore, Ranger, and Atlas. |
Cloud storage | Storage retained in a cloud account, such as Amazon S3 or Microsoft Azure storage. |
On-premises cluster | A CDH cluster in a data center or a Cloudera Private Cloud Base cluster, running Apache services such as HDFS, YARN, HMS, HiveServer2, Ranger, and Atlas. Replication behavior is similar to IaaS cluster replication. The data is stored on local HDFS. |
Replication policy | A set of rules applied to a replication relationship. The rules include which clusters serve as source and destination, the type of data to replicate, the schedule for replicating data, and so on. |
Job | An instance of a replication policy that is running or has completed. |
Source cluster | The cluster that contains the source data that is replicated to a destination cluster. Source data can be an HDFS dataset, a Hive database, or HBase tables. |
Target cluster | The cluster to which the data is replicated. |
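The relationships among a replication policy, its jobs, and the source and target clusters can be sketched as a small data model. This is a hypothetical Python illustration only, not the Replication Manager API or the CDP CLI schema: the class names, field names, cluster names, and the free-form schedule string are all assumptions chosen to mirror the definitions in the table.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List


class DataType(Enum):
    """Kinds of source data named in the glossary (hypothetical enum)."""
    HDFS = "hdfs"
    HIVE = "hive"
    HBASE = "hbase"


class JobState(Enum):
    """A job is either running or has completed."""
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class Job:
    """One instance (run) of a replication policy."""
    policy_name: str
    started_at: datetime
    state: JobState = JobState.RUNNING

    def complete(self) -> None:
        self.state = JobState.COMPLETED


@dataclass
class ReplicationPolicy:
    """Hypothetical model of the rules for one replication relationship."""
    name: str
    source_cluster: str   # cluster that holds the data to replicate
    target_cluster: str   # cluster the data is replicated to
    data_type: DataType   # type of data to replicate
    schedule: str         # hypothetical free-form schedule, e.g. "daily 02:00"
    jobs: List[Job] = field(default_factory=list)

    def start_job(self, now: datetime) -> Job:
        """Create and record a new job for this policy."""
        job = Job(policy_name=self.name, started_at=now)
        self.jobs.append(job)
        return job


# Usage: one policy replicating a Hive database from an on-premises
# cluster to a cloud data lake (all names are illustrative).
policy = ReplicationPolicy(
    name="sales-hive-daily",
    source_cluster="onprem-cdh",
    target_cluster="cloud-datalake",
    data_type=DataType.HIVE,
    schedule="daily 02:00",
)
job = policy.start_job(datetime(2024, 1, 1, 2, 0))
job.complete()
print(len(policy.jobs), job.state.value)  # 1 completed
```

Modeling a job as a separate object that references its policy reflects the glossary's distinction: the policy holds the rules, and each scheduled run produces a new job.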