Configuring the CDP cluster

Use Hive scheduled queries to load replicated workloads from HDP onto CDP with the REPL LOAD command. If the replication process runs into problems, scheduled query metrics help you troubleshoot.

To perform Hive replication of external tables, add the hive user to the supergroup.
  1. On the CDP Private Cloud Base cluster, run a scheduled query to create a replication policy. Use the values listed for the mandatory properties in the Mandatory CDP policy-level properties table in the next topic.
    create scheduled query repl_[***replication policy name***]
    [***FREQ***] as REPL LOAD [***SOURCE DB NAME***] into [***TARGET DB NAME***]
    with [***Configuration parameters in key value pairs separated by comma***]
    executed as [***user_name***];
    • Ensure that the replication policy name is in repl_[***policy name***] format.

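      The scheduler is a generic Hive scheduler that is used for various purposes, including replication, so the repl_ prefix is what identifies the replication-related schedules.

    • When you list or monitor scheduled queries, make sure to filter for the replication-related schedules.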
  2. Change the replication policy using the Hive statements in Supported Scheduled Query Operations.
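
The steps above might look like the following minimal sketch once the placeholders are filled in. The policy name repl_sales_policy, the databases sales_src and sales_tgt, the frequencies, and the hive run-as user are hypothetical values chosen for illustration. The WITH clause is left as the template placeholder because its values come from the table in the next topic. The sketch places EXECUTED AS before the AS clause, which is the position given in the Apache Hive scheduled query grammar, and it assumes that the information_schema.scheduled_queries and information_schema.scheduled_executions views are available on the cluster. The ALTER statements are an assumption about what the Supported Scheduled Query Operations topic covers.

```sql
-- Step 1: create the replication policy (all names are hypothetical).
-- Replace the WITH placeholder with the mandatory properties from the next topic.
CREATE SCHEDULED QUERY repl_sales_policy
EVERY 10 MINUTES
EXECUTED AS hive
AS REPL LOAD sales_src INTO sales_tgt
WITH [***Configuration parameters in key value pairs separated by comma***];

-- List only the replication-related schedules by their repl_ prefix.
SELECT schedule_name, enabled, schedule, next_execution
FROM information_schema.scheduled_queries
WHERE substr(schedule_name, 1, 5) = 'repl_';

-- Inspect recent runs of those schedules when troubleshooting.
SELECT schedule_name, state, start_time, end_time, error_message
FROM information_schema.scheduled_executions
WHERE substr(schedule_name, 1, 5) = 'repl_';

-- Step 2: change the policy, for example its frequency, or pause it.
ALTER SCHEDULED QUERY repl_sales_policy EVERY 30 MINUTES;
ALTER SCHEDULED QUERY repl_sales_policy DISABLE;
```

Matching on the first five characters with substr avoids the LIKE operator, where the underscore in repl_ would act as a single-character wildcard unless escaped.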