Managing high partition workloads

Condition

If you are running high partition workloads, such as inserting into a table that has 5000 partitions and 100 columns, you may encounter the following error when inserting data from a source table into the destination table:
ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask.
MetaException(message:One or more instances could not be made persistent)
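
For illustration, the kind of statement that can hit this error is a dynamic-partition insert into a heavily partitioned destination table. The following sketch uses hypothetical table and column names (sales_source, sales_dest, store_id) and assumes the destination is partitioned on a high-cardinality key, so that a single insert touches thousands of partitions:

  -- Hypothetical destination table, partitioned on a high-cardinality key
  CREATE TABLE sales_dest (
    txn_id BIGINT,
    amount DOUBLE
    -- ... plus the remaining columns, roughly 100 in total
  ) PARTITIONED BY (store_id STRING);

  -- A single dynamic-partition insert that registers thousands of partitions;
  -- statements like this can fail with the MetaException shown above
  SET hive.exec.dynamic.partition.mode=nonstrict;
  INSERT INTO TABLE sales_dest PARTITION (store_id)
  SELECT txn_id, amount, store_id
  FROM sales_source;

With 5000 distinct store_id values, the metastore must create and persist 5000 partition entries for this single statement.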

Cause

This error can occur even when the queries are run one at a time and there is no query concurrency: a single insert into a heavily partitioned table generates a large burst of partition metadata that the Hive Metastore must persist. Configure the HiveServer (HS2) and Hive Metastore (HMS) services with the recommended values below to prevent these errors when inserting data into high partition workloads.

Solution

  1. Log in to Cloudera Manager as an administrator.
  2. Go to Clusters > Hive on Tez > Configuration and search for 'HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml'.
  3. Tune the following HS2 parameters to the recommended values:
    hive.optimize.sort.dynamic.partition.threshold=0
    hive.thrift.client.max.message.size=2147483647
    hive.metastore.client.skip.columns.for.partitions=true
    hive.stats.autogather=false
    hive.stats.column.autogather=false
    hive.msck.repair.batch.size=200

    If a property is not listed, add it as a custom configuration and set its value. After the restart in step 4, you can verify these settings from a Beeline session; see the check after this procedure.

  4. Save the changes and restart the Hive on Tez service.
  5. Go to Clusters > Hive > Configuration and search for 'Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml'.
  6. Tune the following HMS parameters to the recommended values:
    hive.metastore.direct.sql.batch.size=5000
    hive.txn.timeout=3600
    hive.metastore.try.direct.sql=true
    hive.metastore.try.direct.sql.ddl=true

    If a property is not listed, add it as a custom configuration and set its value.

  7. Save the changes and restart the Hive service.
  8. Run your high partition workloads.
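
Optionally, before re-running the workload, you can confirm that the HS2-side values from step 3 are in effect. This is not part of the documented procedure, just a quick sanity check: in a Beeline session connected to HiveServer2, running SET followed by a property name and no value prints the current setting. (The HMS-side values from step 6 are read by the Hive Metastore service from its own configuration and are generally not visible through this check.)

  -- Run from Beeline after the Hive on Tez restart; each statement prints
  -- the effective value of the property for the current session
  SET hive.optimize.sort.dynamic.partition.threshold;
  SET hive.thrift.client.max.message.size;
  SET hive.metastore.client.skip.columns.for.partitions;
  SET hive.stats.autogather;
  SET hive.stats.column.autogather;
  SET hive.msck.repair.batch.size;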