Policy guidelines and considerations
- Hive ACID tables, StorageHandler-based tables (such as HBase), and column statistics are currently not replicated.
- When creating a schedule for a Hive replication policy, you should set the frequency so that changes are replicated often enough to avoid overly large copies.
- On the Create Policy page, the only requirement for clusters to display in the Source Cluster or Destination Cluster fields is that they are DLM-enabled. You must ensure that the clusters you select are healthy before you start a policy instance (job).
- Any user with access to the DLM UI can browse, within the DLM UI, the folder structure of any cluster enabled for DLM. Therefore, the DataPlane Admin and Infra Admin can see folders, files, and databases in the DLM UI that they might not have access to in HDFS. However, these admins cannot view the content of files on the source or destination clusters from the DLM UI, nor can they modify or delete folders or files that are viewable from the DLM UI.
- The first time you execute a job (an instance of a policy) with data that has not been previously replicated, Data Lifecycle Manager replicates all data and associated metadata from the source cluster to the destination. As a result, the initial execution of a job can take a significant amount of time. After the initial bootstrap, data replication is incremental, so only updated data is transferred.
- Policies cannot be modified after they are created.
- Achieving a one-hour Recovery Point Objective (RPO) depends on how you set up your replication jobs and the configuration of your environment:
  - Select data in sizes that replicate within 30 minutes.
  - Set replication frequency to 45 minutes or less.
  - Ensure that network bandwidth is sufficient, so that data can move fast enough to meet your RPO.
  - Consider the rate of change of data being replicated.
- Specify bandwidth per map, in MBps. Each map is restricted to the specified bandwidth, but the limit is not exact: the map throttles back its bandwidth consumption during a copy so that the net bandwidth used tends toward the specified value.
- Ensure that the frequency is set so that a job finishes before the next job starts. Jobs based on the same policy cannot overlap. If a job is not completed before another job starts, the second job does not execute and is given the status Skipped. If a job is consistently skipped, you might need to modify the frequency of the job.
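
The sizing and scheduling guidelines above can be checked with some quick arithmetic before you create a policy. The following Python sketch is illustrative only and is not part of DLM: the function names, the "Ran" label, and the assumption that every map sustains exactly its configured bandwidth are hypothetical simplifications. It estimates a best-case copy time from the per-map bandwidth setting, and shows which scheduled jobs would be given the status Skipped when an earlier job from the same policy is still running.

```python
from typing import List


def estimate_copy_minutes(data_mb: float, maps: int, mbps_per_map: float) -> float:
    """Best-case copy time in minutes.

    Assumption (not from DLM): every map sustains exactly its bandwidth cap,
    so aggregate throughput is maps * mbps_per_map megabytes per second.
    """
    if maps <= 0 or mbps_per_map <= 0:
        raise ValueError("maps and mbps_per_map must be positive")
    seconds = data_mb / (maps * mbps_per_map)
    return seconds / 60.0


def simulate_schedule(frequency_min: float, durations_min: List[float]) -> List[str]:
    """Label each scheduled start as "Ran" or "Skipped".

    durations_min[i] is how long the job would take if it started at the
    i-th scheduled time. Jobs from the same policy cannot overlap, so a job
    whose start time falls while the previous one is still running is skipped.
    "Ran" is a label used only in this sketch; "Skipped" is the DLM status.
    """
    statuses = []
    busy_until = 0.0  # time at which the most recently started job finishes
    for i, duration in enumerate(durations_min):
        start = i * frequency_min
        if start < busy_until:
            statuses.append("Skipped")
        else:
            statuses.append("Ran")
            busy_until = start + duration
    return statuses


if __name__ == "__main__":
    # 36,000 MB copied by 10 maps capped at 2 MBps each
    print(estimate_copy_minutes(36000, 10, 2))
    # A 45-minute frequency where the first job takes 60 minutes
    print(simulate_schedule(45, [60, 20, 20]))
```

In this example, a first job that runs for 60 minutes overlaps the second scheduled start at 45 minutes, so the second job is skipped and the third runs normally. Because the per-map limit is approximate, treat the copy-time estimate as a rough lower bound rather than a guarantee.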