DLM Administration

Create a Replication Policy

You must create a policy to assign the rules for the replication job (an instance of a policy) that you want to execute. A policy sets rules such as the type of data to replicate, the time and frequency of replication, and the bandwidth allowed for a job. During replication, the associated metadata is replicated along with the data: file metadata for HDFS, and table structures and schemas for Hive.

Prerequisites

  • The clusters you want to include in the replication policy must already be paired.

  • You must ensure that the clusters you select are healthy before you start a policy instance (job).

  • On destination clusters, the DLM Engine must have been granted write permissions on the folders being replicated.

  • The target folder or database on the destination cluster must either be empty or not exist prior to starting a new policy instance.
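The prerequisites above amount to a preflight checklist. The following Python sketch records them in a hypothetical ClusterPairState snapshot (filled in by hand from what you observe in DLM and Ambari) and reports anything that would block a new policy instance. ClusterPairState and preflight_problems are illustrative names only; they are not part of DLM or any DLM API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterPairState:
    """Hypothetical, hand-filled snapshot of a source/destination pair."""
    paired: bool
    source_healthy: bool
    destination_healthy: bool
    engine_can_write_destination: bool
    # None means the target folder or database does not exist yet;
    # otherwise, the names of whatever it already contains.
    target_entries: Optional[List[str]] = None


def preflight_problems(state: ClusterPairState) -> List[str]:
    """Return the prerequisites that are not met (empty list if all are)."""
    problems: List[str] = []
    if not state.paired:
        problems.append("source and destination clusters are not paired")
    if not (state.source_healthy and state.destination_healthy):
        problems.append("one or both clusters are not healthy")
    if not state.engine_can_write_destination:
        problems.append("DLM Engine lacks write permission on the destination folders")
    # The target must be empty or absent; None (absent) and [] (empty) both pass.
    if state.target_entries:
        problems.append("target folder or database on the destination is not empty")
    return problems


ok = ClusterPairState(True, True, True, True)
print(preflight_problems(ok))
```

For example, a pair whose destination folder already contains files yields a single problem about the non-empty target.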

About This Task

  • You must use the DLM Infrastructure Admin role to perform this task.

  • You cannot modify a policy after it is created.

    To change a policy, you must create a new policy with the new settings.

  • DLM does not support updating any cluster endpoints (HDFS, Hive, Ranger, or DLM Engine). If an endpoint must be modified, contact Hortonworks support for assistance.

  • The first time you execute a job with data that has not been previously replicated, Data Lifecycle Manager creates a new folder or database and bootstraps the data.

    Important

    During a bootstrap operation, all data is replicated from the source cluster to the destination. As a result, the initial execution of a job can take a significant amount of time, depending on how much data is being replicated, network bandwidth, and so forth.

    After initial bootstrap, data replication is performed incrementally, so only updated data is transferred. Data is in a consistent state only after incremental replication has captured any new changes that occurred during bootstrap.
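Because the bootstrap copies everything, a rough lower bound on its duration is the data volume divided by the available throughput (for example, the Maximum Bandwidth you set on the policy). The Python sketch below shows that arithmetic. bootstrap_hours is a hypothetical helper, not part of DLM, and real jobs add overhead for file listing, metadata, and scheduling, so treat the result as a floor rather than a prediction.

```python
def bootstrap_hours(data_mb: float, bandwidth_mbps: float) -> float:
    """Lower-bound bootstrap duration in hours.

    data_mb: total size of the data to replicate, in megabytes (MB).
    bandwidth_mbps: sustained throughput in megabytes per second (MBps),
        for example the policy's Maximum Bandwidth value.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be a positive number of MBps")
    seconds = data_mb / bandwidth_mbps  # MB / (MB/s) = s
    return seconds / 3600               # seconds -> hours


# About 1.8 TB (1,800,000 MB) at 100 MBps: 18,000 s, so at least 5 hours.
print(bootstrap_hours(1_800_000, 100))  # prints 5.0
```

Halving the bandwidth cap doubles this floor, which is one reason to schedule the first run of a large policy when the network has spare capacity.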

Steps

  1. In the DLM navigation pane, click Policies.

    The Replication Policies page displays a list of any existing policies.

  2. Click Add > Policy.

  3. Enter or select the following information:

    • Policy Name: The policy name that will display in the UI. Maximum length is 64 characters. Spaces, dashes, and underscores are the only special characters allowed.
    • Description: Any useful information to identify the policy or its use.
    • Service: Hive or HDFS replication. For Hive replication, a corresponding Hive database structure must exist on the destination. For HDFS, the corresponding file system structure is created when the first replication job executes.
    • Source Cluster: The cluster that contains the data to be replicated. If the cluster you want is not listed, you need to enable the cluster for DLM.
    • Destination Cluster: The cluster to which the source data will be replicated. If the cluster you want is not listed, you need to enable the cluster for DLM.
    • Select a Folder Path (only if HDFS is selected): The HDFS directories available to browse and select for replication. The Infra Admin role has read privileges, in the DLM UI only, for all HDFS directories on the source and destination clusters. Clusters must be paired before you can browse HDFS directories in DLM.
    • Select Database (only if Hive is selected): The internal or external databases available to browse and select for replication. The Infra Admin role has read privileges, in the DLM UI only, for all databases on the source and destination clusters.

  4. Select how you want the job to run:

    When setting the schedule, consider requirements such as your recovery point objective (RPO) and recovery time objective (RTO), as well as network bandwidth.

    • Repeat: How often you want the job to run. Choices are weeks, days, hours, or minutes. For a Hive replication policy, set the frequency so that changes are replicated often enough to avoid overly large copies.
    • Start and End Dates: The dates you want the job to start (required) and end (optional). If you do not set an end date, the job runs at the set time and frequency until the job is manually cancelled.
    • Start Time: The time of day the job starts, in 24-hour clock format.

  5. Enter or select the Advanced Properties:

    • Queue Name (optional): The YARN queue you want to use to prioritize job scheduling across multiple tenants. If no queue is entered, DLM defaults to the YARN queue identified in the Ambari View for YARN Capacity Scheduler. You can enter one queue name per policy.
    • Maximum Bandwidth (optional): The maximum bandwidth to be used when running a job based on this policy. This lets you restrict data throughput to the specified value. Enter a number in megabytes per second (MBps).

  6. Click Review and verify that the settings are correct.

    Important

    After a policy is created, it cannot be modified.

  7. Click Submit.

    A message appears, stating that the submission was successful.
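Because a policy cannot be modified after it is submitted, it can help to check the field rules from steps 3 through 5 before clicking Submit. The Python sketch below encodes those rules against a hypothetical PolicyDraft record. The names PolicyDraft and validate are illustrative and are not a DLM API; the assumption that letters and digits are plain ASCII, and that a set end date must fall after the start date, are this sketch's own choices rather than documented DLM behavior.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Assumption: ASCII letters and digits, plus the only special characters the
# docs allow (space, dash, underscore), 1 to 64 characters in total.
_NAME_RE = re.compile(r"[A-Za-z0-9 _-]{1,64}")
_UNITS = {"weeks", "days", "hours", "minutes"}


@dataclass
class PolicyDraft:
    """Hypothetical record of the values entered in the Add Policy form."""
    name: str
    service: str                       # "HIVE" or "HDFS"
    repeat_every: int
    repeat_unit: str                   # weeks, days, hours, or minutes
    start: datetime                    # start date plus 24-hour start time
    end: Optional[datetime] = None     # optional end date
    max_bandwidth_mbps: Optional[float] = None


def validate(draft: PolicyDraft) -> List[str]:
    """Return a list of rule violations (empty if the draft looks valid)."""
    errors: List[str] = []
    if not _NAME_RE.fullmatch(draft.name):
        errors.append("name: 1-64 letters, digits, spaces, dashes, or underscores")
    if draft.service not in ("HIVE", "HDFS"):
        errors.append("service: must be HIVE or HDFS")
    if draft.repeat_unit not in _UNITS or draft.repeat_every < 1:
        errors.append("repeat: positive number of weeks, days, hours, or minutes")
    if draft.end is not None and draft.end <= draft.start:
        errors.append("end date: must be after the start date and time")
    if draft.max_bandwidth_mbps is not None and draft.max_bandwidth_mbps <= 0:
        errors.append("maximum bandwidth: must be a positive number of MBps")
    return errors


draft = PolicyDraft("nightly sales_db-copy", "HIVE", 1, "days",
                    start=datetime(2018, 6, 1, 1, 30))
print(validate(draft))
```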

Next Steps

Verify that the replication job is running as intended.
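One way to check that jobs fire when intended is to list the run times the schedule implies and compare them with the job history in DLM. The Python sketch below does this with the standard datetime module. expected_runs is a hypothetical helper, and it assumes runs are spaced exactly one Repeat interval apart beginning at the start date and time, and that a run falling exactly on the end date and time is included; DLM's actual scheduler may differ on both points.

```python
from datetime import datetime, timedelta
from typing import List, Optional

_UNITS = ("weeks", "days", "hours", "minutes")


def expected_runs(start: datetime, every: int, unit: str,
                  end: Optional[datetime] = None, limit: int = 10) -> List[datetime]:
    """Return up to `limit` run times for a Repeat setting of `every` `unit`."""
    if unit not in _UNITS or every < 1:
        raise ValueError("repeat must be a positive number of weeks, days, hours, or minutes")
    # timedelta accepts weeks, days, hours, and minutes as keyword arguments.
    step = timedelta(**{unit: every})
    runs: List[datetime] = []
    t = start
    # Assumption: a run that lands exactly on the end time is included.
    while len(runs) < limit and (end is None or t <= end):
        runs.append(t)
        t += step
    return runs


# Every 6 hours from 01:30 on June 1 until 01:30 on June 2 (24-hour clock).
for r in expected_runs(datetime(2018, 6, 1, 1, 30), 6, "hours",
                       end=datetime(2018, 6, 2, 1, 30)):
    print(r.strftime("%Y-%m-%d %H:%M"))
```

Without an end date the policy runs until cancelled, so the limit parameter keeps the listing finite.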

More Information

Policy Guidelines and Considerations

How Policies Work in Data Lifecycle Manager