Atlas metadata
The Atlas plugin within DLM replicates Atlas metadata. It uses incremental export to move data across clusters, which keeps the payload small and the transfer fast. Atlas replication is optional and can be enabled when you set up the DLM replication policy.
Note: The lineage associated with the entities is not replicated.
On the source cluster, the entity’s replicatedTo attribute is updated to indicate the cluster it is being replicated to, and on the target cluster the entity’s replicatedFrom attribute is set to indicate its source. Because each cluster has its own identity, the replicated entities are transformed so that they appear native to the cluster in which they will reside. This involves changing the attributes that indicate their place of residence. In addition, new entities of type AtlasServer are created within Atlas. These provide a central place to access all the servers for which replication has been initiated. Replication audit logs can also be accessed there. The audit entries record the details of every export or import performed for that cluster.
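The attribute changes described above can be illustrated with a minimal Python sketch. It is not the Atlas implementation: entities are modeled as plain dictionaries, the function names `mark_replicated_to` and `localize_for_target` are hypothetical, and the sketch assumes that the cluster name appears as an `@cluster` suffix in `qualifiedName` and that replicatedTo and replicatedFrom can be treated as simple lists of cluster names.

```python
from copy import deepcopy


def mark_replicated_to(entity, target_cluster):
    """Source side: return a copy of the entity whose replicatedTo records the target.

    Hypothetical helper; the list-of-names representation is a simplification.
    """
    updated = deepcopy(entity)
    attrs = updated.setdefault("attributes", {})
    targets = attrs.setdefault("replicatedTo", [])
    if target_cluster not in targets:
        targets.append(target_cluster)
    return updated


def localize_for_target(entity, source_cluster, target_cluster):
    """Target side: return a copy of the entity that looks native to the target.

    Assumption: the place of residence is encoded as an '@<cluster>' suffix
    on qualifiedName, so that suffix is rewritten to the target cluster name.
    """
    updated = deepcopy(entity)
    attrs = updated.setdefault("attributes", {})
    qualified_name = attrs.get("qualifiedName", "")
    suffix = "@" + source_cluster
    if qualified_name.endswith(suffix):
        attrs["qualifiedName"] = qualified_name[: -len(suffix)] + "@" + target_cluster
    # Record where the entity came from.
    attrs["replicatedFrom"] = [source_cluster]
    return updated
```

For example, calling `localize_for_target` on an entity whose `qualifiedName` is `sales@cl1`, with `cl1` as the source and `cl2` as the target, yields a copy named `sales@cl2` whose replicatedFrom list contains `cl1`. Both helpers copy the input so the original entity is left unchanged.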
When a DLM Atlas replication job runs, the Atlas metadata associated with the replicated dataset is exported from the source Atlas server and imported into Atlas on the target cluster. Do not update or modify the associated replication policy during the replication life cycle.
Atlas replication is supported from on-premise to on-premise clusters with both HDFS and Hive. At least two clusters must be registered in your DLM app instance, and Atlas must be installed on both the source and target clusters. Optionally, you can use the Ambari UI to verify that Atlas is installed on these clusters. When you create a new replication policy with Atlas replication, do not select the Disable Atlas metadata replication check box.
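To make the export half of a replication job more concrete, the following Python sketch builds the kind of request body that an incremental Atlas export might use. DLM issues these requests itself, so this is illustrative only. The field names (`itemsToExport`, `fetchType`, `changeMarker`, `replicatedTo`, `skipLineage`) follow the Apache Atlas Export API as best understood here and should be verified against the Atlas version in use; the function name and the example values are hypothetical.

```python
def build_incremental_export_request(type_name, qualified_name,
                                     target_cluster, change_marker=0):
    """Build a request body for an incremental export of one entity.

    Assumptions: field names mirror the Apache Atlas Export API; a
    change_marker of 0 is used here to mean "no previous export".
    """
    return {
        "itemsToExport": [
            {
                "typeName": type_name,
                "uniqueAttributes": {"qualifiedName": qualified_name},
            }
        ],
        "options": {
            # Export only what changed since the last recorded marker.
            "fetchType": "incremental",
            "changeMarker": change_marker,
            # Cluster that the exported entities are being replicated to.
            "replicatedTo": target_cluster,
            # Lineage is not replicated, so it is left out of the export.
            "skipLineage": True,
        },
    }
```

A call such as `build_incremental_export_request("hive_db", "sales@cl1", "cl2")` returns a dictionary that can be serialized with `json.dumps`. A matching import on the target side would carry the source cluster name in its own options.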