Example commands for replicating HDP 3 workloads

To perform Hive replication from an HDP 3.1.5.6000 cluster to a CDP Private Cloud Base 7.1.6 or later cluster, you need to know how to run REPL DUMP on the HDP cluster and REPL LOAD on the CDP Private Cloud Base cluster. The following examples of valid commands help you create counterparts that replicate your workloads to CDP. You also learn how to replicate data between HA clusters.

Example Hive commands for replicating managed tables

To replicate managed tables, use the REPL commands as shown in these examples.

  1. On HDP, dump a workload of managed tables.
    repl dump src with (
    'hive.repl.dump.version'= '2',
    'hive.repl.rootdir'= 'hdfs://<host>:<port>/user/hive/replDir/d1'
    );
  2. On CDP, load the workload of managed tables.
    repl load src into tgt with (
    'hive.repl.rootdir'= 'hdfs://<host>:<port>/user/hive/replDir/d1'
    );
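
You typically run these statements through Beeline against HiveServer2 on each cluster. The following is a minimal sketch; the JDBC URLs are placeholders for your own HiveServer2 endpoints. REPL DUMP returns the dump location and last replication ID that the subsequent load consumes.

    # On the HDP source cluster (placeholder JDBC URL):
    beeline -u "jdbc:hive2://<hdp-host>:10000/default" \
      -e "repl dump src with ('hive.repl.dump.version'='2','hive.repl.rootdir'='hdfs://<host>:<port>/user/hive/replDir/d1');"
    # On the CDP target cluster (placeholder JDBC URL):
    beeline -u "jdbc:hive2://<cdp-host>:10000/default" \
      -e "repl load src into tgt with ('hive.repl.rootdir'='hdfs://<host>:<port>/user/hive/replDir/d1');"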

Example Hive commands for replicating external tables

Prerequisite: To perform Hive replication of external tables, add the hive user to the supergroup.
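
For example, on a cluster that uses the default HDFS superuser group name, you might add the hive user as follows. This is a sketch; confirm the actual group name from dfs.permissions.superusergroup before running it, and note that your environment may manage group membership through LDAP rather than local OS accounts.

    # Confirm the HDFS superuser group name (often 'supergroup').
    hdfs getconf -confKey dfs.permissions.superusergroup
    # Add the hive user to that group on the relevant hosts.
    sudo usermod -a -G supergroup hive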

  1. On HDP, dump a workload of external tables.
    repl dump src with (
    'hive.repl.dump.version'= '2',
    'hive.repl.rootdir'= 'hdfs://<host>:<port>/user/hive/replDir/d1',
    'hive.repl.include.external.tables'= 'true',
    'hive.repl.dump.metadata.only.for.external.table'= 'false',
    'hive.repl.replica.external.table.base.dir'=
    'hdfs://<replica-host>:<port>/user/hive/externalDir/d1'
    );
  2. On CDP, load the external tables.
    repl load src into tgt with (
    'hive.repl.dump.version'= '2',
    'hive.repl.rootdir'= 'hdfs://<host>:<port>/user/hive/replDir/d1',
    'hive.repl.include.external.tables'= 'true',
    'hive.repl.dump.metadata.only.for.external.table'= 'false',
    'hive.repl.replica.external.table.base.dir'=
    'hdfs://<replica-host>:<port>/user/hive/externalDir/d1'
    );    
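
After the load completes, you can confirm the replication state of the target database with REPL STATUS, which returns the last replication ID applied to tgt (the target database from the examples above):

    repl status tgt;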

Example Hive commands for replicating data between HA clusters

To perform Hive replication between HA clusters (HDP and CDP Private Cloud Base clusters), you must provide the HDFS-related HA configuration properties of both the HDP and CDP clusters in the REPL DUMP and REPL LOAD commands. In each command, mapreduce.job.hdfs-servers.token-renewal.exclude names the remote cluster's nameservice: ns1 (the CDP cluster) during the dump and mycluster (the HDP cluster) during the load. The following examples show Hive replication from an HDP cluster to a CDP Private Cloud Base cluster, both of which are HA-enabled.
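
If you do not have these values at hand, you can read them from each cluster's HDFS configuration. A sketch using the hdfs getconf utility; the nameservice and NameNode IDs shown are the ones used in the examples below:

    # On each cluster, list the nameservices, then the NameNode IDs
    # and addresses for a given nameservice.
    hdfs getconf -confKey dfs.nameservices
    hdfs getconf -confKey dfs.ha.namenodes.mycluster
    hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn1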

  1. On HDP, dump a workload.
    repl dump src with (
    'hive.repl.dump.version'= '2',
    'hive.repl.rootdir'= 'hdfs://ns1/user/hive/replDir/d1',
    'hive.repl.dump.metadata.only.for.external.table'= 'false',
    'hive.repl.replica.external.table.base.dir'= 'hdfs://ns1/user/hive/externalDir/d1',
    'hive.repl.include.external.tables'= 'true',
    'dfs.nameservices'= 'mycluster,ns1',
    'mapreduce.job.hdfs-servers.token-renewal.exclude'= 'ns1',
    'dfs.ha.automatic-failover.enabled'= 'true',
    'dfs.ha.namenodes.mycluster'= 'nn1,nn2',
    'dfs.namenode.rpc-address.mycluster.nn1'= 'ctr-1617214704777-622-01-007.h.site:8020',
    'dfs.namenode.rpc-address.mycluster.nn2'= 'ctr-1617214704777-622-01-004.h.site:8020',
    'dfs.namenode.http-address.mycluster.nn1'= 'ctr-1617214704777-622-01-007.h.site:20070',
    'dfs.namenode.http-address.mycluster.nn2'= 'ctr-1617214704777-622-01-004.h.site:20070',
    'dfs.namenode.https-address.mycluster.nn1'= 'ctr-1617214704777-622-01-007.h.site:20470',
    'dfs.namenode.https-address.mycluster.nn2'= 'ctr-1617214704777-622-01-004.h.site:20470',
    'dfs.client.failover.proxy.provider.mycluster'=
          'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
    'dfs.ha.automatic-failover.enabled.ns1'= 'true',
    'dfs.ha.namenodes.ns1'= 'namenode1546333583,namenode1546336357',
    'dfs.namenode.rpc-address.ns1.namenode1546333583'= 'sar-uk-1.sar-uk.root.h.site:8020',
    'dfs.namenode.rpc-address.ns1.namenode1546336357'= 'sar-uk-2.sar-uk.root.h.site:8020',
    'dfs.namenode.http-address.ns1.namenode1546333583'= 'sar-uk-1.sar-uk.root.h.site:20101',
    'dfs.namenode.http-address.ns1.namenode1546336357'= 'sar-uk-2.sar-uk.root.h.site:20101',
    'dfs.namenode.https-address.ns1.namenode1546333583'= 'sar-uk-1.sar-uk.root.h.site:20102',
    'dfs.namenode.https-address.ns1.namenode1546336357'= 'sar-uk-2.sar-uk.root.h.site:20102',
    'dfs.client.failover.proxy.provider.ns1'=
          'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    );
  2. On CDP, load the workload.
    repl load src into tgt with (
    'hive.repl.rootdir'= 'hdfs://ns1/user/hive/replDir/d1',
    'hive.repl.dump.metadata.only.for.external.table'= 'false',
    'hive.repl.replica.external.table.base.dir'= 'hdfs://ns1/user/hive/externalDir/d1',
    'hive.repl.include.external.tables'= 'true',
    'dfs.nameservices'= 'mycluster,ns1',
    'mapreduce.job.hdfs-servers.token-renewal.exclude'= 'mycluster',
    'dfs.ha.automatic-failover.enabled'= 'true',
    'dfs.ha.namenodes.mycluster'= 'nn1,nn2',
    'dfs.namenode.rpc-address.mycluster.nn1'= 'ctr-1617214704777-622-01-007.h.site:8020',
    'dfs.namenode.rpc-address.mycluster.nn2'= 'ctr-1617214704777-622-01-004.h.site:8020',
    'dfs.namenode.http-address.mycluster.nn1'= 'ctr-1617214704777-622-01-007.h.site:20070',
    'dfs.namenode.http-address.mycluster.nn2'= 'ctr-1617214704777-622-01-004.h.site:20070',
    'dfs.namenode.https-address.mycluster.nn1'= 'ctr-1617214704777-622-01-007.h.site:20470',
    'dfs.namenode.https-address.mycluster.nn2'= 'ctr-1617214704777-622-01-004.h.site:20470',
    'dfs.client.failover.proxy.provider.mycluster'=
          'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
    'dfs.ha.automatic-failover.enabled.ns1'= 'true',
    'dfs.ha.namenodes.ns1'= 'namenode1546333583,namenode1546336357',
    'dfs.namenode.rpc-address.ns1.namenode1546333583'= 'sar-uk-1.sar-uk.root.h.site:8020',
    'dfs.namenode.rpc-address.ns1.namenode1546336357'= 'sar-uk-2.sar-uk.root.h.site:8020',
    'dfs.namenode.http-address.ns1.namenode1546333583'= 'sar-uk-1.sar-uk.root.h.site:20101',
    'dfs.namenode.http-address.ns1.namenode1546336357'= 'sar-uk-2.sar-uk.root.h.site:20101',
    'dfs.namenode.https-address.ns1.namenode1546333583'= 'sar-uk-1.sar-uk.root.h.site:20102',
    'dfs.namenode.https-address.ns1.namenode1546336357'= 'sar-uk-2.sar-uk.root.h.site:20102',
    'dfs.client.failover.proxy.provider.ns1'=
          'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    );