Partition metadata replication takes a long time to complete

How can partition metadata replication be sped up when the Hive tables have many partitions?

The Hive metadata replication process takes a long time to complete when the Hive tables have many partitions. This is because the Hive partition parameters are compared during the import stage of the partition metadata replication process, and if the exported and existing partition parameters do not match, the partition is dropped and recreated. To speed up the import stage, you can configure a key-value pair that lists the partition parameters to exclude from this comparison.

  1. In Cloudera Manager, go to Clusters > Hive service > Configuration.
  2. Search for the Hive Replication Environment Advanced Configuration Snippet (Safety Valve) property.
  3. Enter the HIVE_IGNORED_PARTITION_PARAMETERS=[***COMMA SEPARATED LIST OF HIVE PARTITION PARAMETERS***] key-value pair.
    For example,
    HIVE_IGNORED_PARTITION_PARAMETERS=transient_lastDdlTime,totalSize,numRows,COLUMN_STATS_ACCURATE,numFiles

    The partition parameters that you list are excluded from the comparison during the import stage of the partition metadata replication process. Therefore, even if the values of these parameters differ between the exported and existing partitions, the partition is not dropped and recreated. After you configure this key-value pair, the import stage of the partition metadata replication process completes faster.

  4. Save the changes, and restart the Hive service.
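
The effect of the ignored-parameter list can be illustrated with a minimal Python sketch. This is not the replication code that Cloudera Manager runs. The function names (parse_ignored_parameters, partitions_match), the custom_tag parameter, and all parameter values are hypothetical and chosen only to show why excluding volatile parameters avoids a mismatch.

```python
def parse_ignored_parameters(value):
    """Split a comma-separated value into a set of parameter names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def partitions_match(exported, existing, ignored=frozenset()):
    """Compare two partition-parameter dicts, skipping the ignored keys."""
    def keep(params):
        return {k: v for k, v in params.items() if k not in ignored}
    return keep(exported) == keep(existing)


# Same list as in the example value for HIVE_IGNORED_PARTITION_PARAMETERS.
ignored = parse_ignored_parameters(
    "transient_lastDdlTime,totalSize,numRows,COLUMN_STATS_ACCURATE,numFiles"
)

# Made-up parameters: only the volatile statistics differ.
exported = {"transient_lastDdlTime": "1700000000", "numFiles": "4", "custom_tag": "a"}
existing = {"transient_lastDdlTime": "1700009999", "numFiles": "5", "custom_tag": "a"}

print(partitions_match(exported, existing))           # False: would trigger drop and recreate
print(partitions_match(exported, existing, ignored))  # True: partition is left in place
```

In this sketch, a parameter that is not in the ignored list, such as the hypothetical custom_tag, still takes part in the comparison, so a real difference in it would still count as a mismatch.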