Configuring replication policies to run in parallel
You can run multiple Hive replication policies in parallel to optimize performance when copying data from the source cluster to the target cluster.
hive.exec.parallel: Set this property at the policy level.
hive.exec.parallel.thread.number: Set this property at the Hive session level.
Hive doesn't support concurrency. You cannot set hive.exec.parallel at the global level. Setting this property at the session level affects only the replication policy. If hive.exec.parallel is disabled, other Hive queries do not run in parallel.
The hive.exec.parallel.thread.number property is not supported at the policy or query level. You can set it at the global level or at the client session level. The thread count does not take effect unless hive.exec.parallel is set to true.
Parallel execution applies mainly to the copy operations. DDL operations run sequentially. A database can contain thousands of tables and partitions, and with this configuration their data copy runs in parallel. These settings improve bootstrap performance significantly because the partitions of the same table are batched for a data copy that runs in parallel.
On the source cluster, set hive.exec.parallel.thread.number equal to the number of cores at the session level.
SET hive.exec.parallel.thread.number=128;
REPL LOAD [***database name***] FROM [***directory name***] WITH ('hive.exec.parallel'='true');
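As a quick check before running the load, a session can print the values it will use. This is a minimal sketch that assumes a Beeline or Hive CLI session, where SET followed by a property name and no value prints the current setting instead of changing it:

```sql
-- Print the current session values without modifying them.
SET hive.exec.parallel.thread.number;
SET hive.exec.parallel;
```

Because hive.exec.parallel is passed in the WITH clause of the REPL LOAD statement, the second statement is expected to show only the session or global value, not the policy-level override.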