Creating Atlas replication policies
You can create Atlas replication policies to replicate the metadata and data lineage of all Hive external tables, Iceberg tables, and any other Atlas-supported entities between Cloudera Private Cloud Base 7.1.9 SP1 clusters using Cloudera Manager 7.11.3 CHF7 or higher.
- Add the source cluster as a peer to the target cluster. The replication policy requires a replication peer to locate the source data. You can use an existing peer or add a new peer.
For information about adding a source cluster as a peer, see Adding cluster as a peer.
- Go to the Cloudera Manager > Replication > Replication Policies page in the target cluster where the peer is set up.
- Click Create Replication Policy > Atlas Replication Policy.
The Create Atlas Replication Policy wizard appears.
- Configure the following options on the General tab:
Option: Description
Policy Name: Enter a unique name for the replication policy.
Source: Choose the source cluster that has the required peer, the required source data to replicate, and the source Atlas service.
Destination: Choose the target cluster. The drop-down list shows the clusters that are managed by the current Cloudera Manager.
Schedule: Choose one of the following options:
- Immediate to run the replication policy immediately after policy creation is complete.
- Once to run the schedule one time in the future. Set the date and time.
- Recurring to run the schedule periodically in the future. Set the date, time, and interval between runs.
Consider the following factors before you configure the replication frequency or recurring schedule:
- The anticipated rate of change and the frequency of the schedule together determine the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) that you can expect during a disaster recovery process. Therefore, choose a schedule that provides an optimal RTO and RPO.
- Recurring frequency affects the compute load on the entire system. Frequent replication consumes compute capacity on the nodes that participate in the replication process, which in turn can affect other workloads running on these nodes.
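The trade-off between the two factors above can be estimated with simple arithmetic. The following is a minimal planning sketch, not something Cloudera Manager computes: it assumes that the worst-case RPO is roughly the schedule interval plus the duration of one replication run (a change made just after a run begins reaches the target only when the next run finishes), and that the fraction of time spent replicating is a rough proxy for the added compute load. The function names `worst_case_rpo` and `replication_duty_cycle` are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, run_duration: timedelta) -> timedelta:
    """Approximate worst-case RPO for a recurring schedule (assumption:
    a change that just misses one run lands only when the next run ends)."""
    return interval + run_duration


def replication_duty_cycle(interval: timedelta, run_duration: timedelta) -> float:
    """Fraction of wall-clock time the participating nodes spend replicating."""
    return run_duration / interval


# Hypothetical schedule: run every 4 hours, each run takes about 1 hour.
interval = timedelta(hours=4)
run_duration = timedelta(hours=1)

print(worst_case_rpo(interval, run_duration))
print(replication_duty_cycle(interval, run_duration))
```

Shortening the interval lowers the estimated RPO but raises the duty cycle, which is the compute-load concern described above.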
Fetch type: Choose one of the following fetch types to use during the Atlas export operation:
- FULL fetches all the directly and indirectly connected entities along with the entity in scope.
- CONNECTED fetches only the directly connected entities, both parent and child entities, along with the entity in scope.
- INCREMENTAL is an optimized version of the CONNECTED fetch type in which only the delta is copied in subsequent runs.
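To make the difference between the fetch types concrete, the following sketch models entities as a small relationship graph. It is an illustration only, not Atlas code: the graph, the entity names, the `modified_at` timestamps, and the three functions are all hypothetical, and treating "the delta" as entities modified after the previous run is an assumption.

```python
from collections import deque

# Hypothetical relationship graph: each entity maps to the entities it is
# directly related to (parents and children alike).
GRAPH = {
    "db": {"table_a", "table_b"},
    "table_a": {"db", "process_1"},
    "table_b": {"db"},
    "process_1": {"table_a", "table_c"},
    "table_c": {"process_1"},
}


def fetch_connected(graph, entity):
    """CONNECTED: the entity in scope plus its directly connected entities."""
    return {entity} | graph.get(entity, set())


def fetch_full(graph, entity):
    """FULL: everything reachable from the entity, directly or indirectly."""
    seen = {entity}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def fetch_incremental(graph, entity, modified_at, last_run):
    """INCREMENTAL (assumed model): the CONNECTED set, limited to entities
    changed since the previous run."""
    return {e for e in fetch_connected(graph, entity) if modified_at[e] > last_run}


# Hypothetical modification times (arbitrary integer ticks).
modified_at = {"db": 1, "table_a": 5, "table_b": 2, "process_1": 7, "table_c": 3}

print(sorted(fetch_connected(GRAPH, "table_c")))
print(sorted(fetch_full(GRAPH, "table_c")))
print(sorted(fetch_incremental(GRAPH, "table_a", modified_at, last_run=4)))
```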
Match type: Choose one of the following match types to use during the Atlas export operation:
- STARTS_WITH searches for the entity names that start with the specified criteria.
- ENDS_WITH searches for the entity names that end with the specified criteria.
- CONTAINS searches for the entity names that contain the specified criteria as a substring.
- MATCHES searches for the entity names that match the regular expression specified as the criteria.
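The four match types behave like ordinary string tests. The sketch below mimics them in Python to show how the same entity name fares under each one. It is not the Atlas implementation: the function `entity_matches` and the sample entity names are hypothetical, and matching the regular expression against the whole name (rather than a part of it) is an assumption.

```python
import re


def entity_matches(entity_name: str, match_type: str, criteria: str) -> bool:
    """Illustrative stand-in for the match types described above."""
    if match_type == "STARTS_WITH":
        return entity_name.startswith(criteria)
    if match_type == "ENDS_WITH":
        return entity_name.endswith(criteria)
    if match_type == "CONTAINS":
        return criteria in entity_name
    if match_type == "MATCHES":
        # Assumption: the regular expression must match the whole name.
        return re.fullmatch(criteria, entity_name) is not None
    raise ValueError(f"Unknown match type: {match_type}")


# Hypothetical entity names.
names = ["sales_db.orders@cluster1", "sales_db.returns@cluster1", "hr_db.staff@cluster1"]

for match_type, criteria in [
    ("STARTS_WITH", "sales_db"),
    ("ENDS_WITH", "staff@cluster1"),
    ("CONTAINS", "returns"),
    ("MATCHES", r"sales_db\.ord.*"),
]:
    selected = [n for n in names if entity_matches(n, match_type, criteria)]
    print(match_type, criteria, selected)
```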
Entity regex filter: Enter a regex filter to filter the entities.
Atlas entity types to include: Add one or more Atlas entity types to use during the Atlas export operation.
Skip lineage: Choose to skip data lineage for the Atlas tables in the source cluster.
Staging Directory: Enter a relative path to the staging directory on the target cluster.
- Click Create.
If you selected Immediate in the Schedule field, the replication job starts after you click Create.