Configuring replication policies to run in parallel

You can run multiple Hive replication policies in parallel to optimize performance when copying data from the source cluster to the target cluster.

You can run as many replication policies in parallel as there are available cores in the source cluster. To do so, set the following properties:
  • hive.exec.parallel

    Set this property at the policy level.

  • hive.exec.parallel.thread.number

    Set this property at the Hive session level.

Hive does not support concurrency. You cannot set hive.exec.parallel at the global level. Setting this property at the session level affects only the replication policy. If hive.exec.parallel is disabled, other Hive queries do not run in parallel.

The hive.exec.parallel.thread.number property is not supported at the policy or query level. You can set it at the global level or at the client session level. The thread count does not take effect unless hive.exec.parallel is set to true.

This configuration mainly parallelizes the copy operations; DDL operations run sequentially. Even when there are thousands of tables or partitions, the data copy runs in parallel. These settings significantly improve bootstrap performance because partitions of the same table are batched and their data is copied in parallel.

Minimum required role: Replication Administrator or Full Administrator
  1. On the source cluster, set hive.exec.parallel to true.
  2. At the session level, set hive.exec.parallel.thread.number to the number of cores.
    SET hive.exec.parallel.thread.number=128
    REPL LOAD [***database name***] FROM [***directory name***] WITH ('hive.exec.parallel'='true')
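
If you script these steps, the two statements can be generated from the core count. The following is a minimal sketch, not part of the product. The function name build_repl_statements and the sample database and directory values are hypothetical. The values are substituted into the template exactly as given, so any quoting the directory needs is left to the caller, and no statement terminator is added because that depends on the client that submits the statements.

```python
from typing import List


def build_repl_statements(database: str, directory: str, cores: int) -> List[str]:
    """Return the two statements from the steps above, in execution order.

    Hypothetical helper: it only formats strings and does not connect to Hive.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return [
        # Session-level thread count. It has no effect unless
        # hive.exec.parallel is also set to true.
        f"SET hive.exec.parallel.thread.number={cores}",
        # hive.exec.parallel is passed in the WITH clause so that it applies
        # only to this replication load.
        f"REPL LOAD {database} FROM {directory} "
        "WITH ('hive.exec.parallel'='true')",
    ]


if __name__ == "__main__":
    # Sample values are placeholders chosen for illustration only.
    for statement in build_repl_statements("sales_db", "'/repl/dump'", 128):
        print(statement)
```

Run both statements in the same session, because the thread count is a session-level setting.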