Migrating Hive Workloads to Cloudera Private Cloud

Configuring the HDP cluster

You need to configure the HDP cluster before you dump workload data that you want to replicate on CDP.

Prepare a cron script to set policies for chained REPL DUMP commands and to control execution, for example to run at a certain time.
  1. In new and existing databases, include the repl.source.for property in the source database dbproperties file.
    Set the repl.source.for property value using the following format:
    'repl.source.for' = '[***policy1 name***], [***policy2 name***], [***policy3 name***]'

    For example, to create a new source database for policies named 1, 2, and 3, configure the source database properties file as follows:

    'repl.source.for' = '1, 2, 3'

    For example, to configure an existing source database named testdb, run the following command:

    ALTER DATABASE testdb SET
    DBPROPERTIES ('repl.source.for' = '[***policy1 name***], [***policy2 name***],
    [***policy3 name***]');
  2. On the HDP cluster, configure the mandatory HDP cluster configuration properties listed in the next topic.
  3. Run the REPL DUMP command along with the mandatory policy-level configuration parameters using a cron script.
    Use the following command syntax:
    [***cron syntax for regular intervals***] beeline -u jdbc:hive2://[***source database***] hive
    -e "repl dump [***source database***] with [***mandatory policy-level configuration
    parameters separated by comma***]"
    For help building the cron expression, see the Cron Expression Generator & Explainer website.
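
The scheduled dump in step 3 could be wrapped in a small script that cron calls. The following is a minimal sketch, not a Cloudera-provided script. The host name, script path, log path, and the `hypothetical.param` key are illustrative assumptions; replace the WITH clause with the mandatory policy-level parameters for your policy. The script defaults to a dry run that only prints the beeline command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (for example /opt/scripts/repl_dump.sh), meant to be called from cron.
# Every default below is an illustrative assumption, not a value from the product documentation.
set -euo pipefail

JDBC_URL="${JDBC_URL:-jdbc:hive2://hdp-host.example.com:10000}"   # assumed HiveServer2 URL
SOURCE_DB="${SOURCE_DB:-testdb}"                                   # source database to dump
WITH_CLAUSE="${WITH_CLAUSE:-'hypothetical.param'='value'}"         # placeholder for the mandatory parameters
DRY_RUN="${DRY_RUN:-1}"                                            # 1 = print the command, 0 = run it

# Build the REPL DUMP statement with its comma-separated WITH clause.
dump_stmt() {
  printf 'REPL DUMP %s WITH (%s);' "$SOURCE_DB" "$WITH_CLAUSE"
}

# Print the beeline invocation in dry-run mode, otherwise execute it.
run_hql() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'beeline -u %s -e "%s"\n' "$JDBC_URL" "$1"
  else
    beeline -u "$JDBC_URL" -e "$1"
  fi
}

run_hql "$(dump_stmt)"

# Example crontab entry (assumed paths), running the dump every day at 02:00:
# 0 2 * * * DRY_RUN=0 /opt/scripts/repl_dump.sh >> /var/log/repl_dump.log 2>&1
```

Running the script once with the default DRY_RUN=1 lets you check the generated statement before you add the crontab entry.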