Best practices for performance tuning

Before you tune Apache Hive, follow these best practices. The guidelines cover how you configure the cluster, store data, and write queries.

Best practices

  • Adjust autoscaling in CDP Public Cloud to scale up when you need resources to handle queries.
  • Accept the default to use Tez as the execution engine. In CDP, the MapReduce execution engine is replaced by Tez.
  • Accept the default to disable user impersonation. If user impersonation is enabled, disable it by setting hive.server2.enable.doAs to false in hive-site.xml using the Cloudera Manager Safety Valve feature (see link below).

    LLAP caches data for multiple queries, and this capability does not support user impersonation.

  • Use the Ranger security service to protect your cluster and dependent services.
  • Store data using the ORC file format. Other formats, such as Parquet, are supported but are not as fast for Hive queries.
  • Ensure that queries are fully vectorized by examining explain plans.
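
The last two recommendations can be illustrated with a short HiveQL sketch. The table name, column names, and date literal below are hypothetical and exist only for illustration. The SET statement is included in case vectorization has been turned off in your environment. Treat this as a rough example of how you might check a plan, not as a prescribed procedure.

```sql
-- Hypothetical table, used only to illustrate ORC storage.
CREATE TABLE sales_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2),
  order_date  DATE
)
STORED AS ORC;

-- Make sure vectorized execution is on for this session.
SET hive.vectorized.execution.enabled=true;

-- Ask Hive to report, per operator, whether the query is vectorized.
EXPLAIN VECTORIZATION DETAIL
SELECT customer_id, SUM(amount) AS total_amount
FROM sales_orc
WHERE order_date >= DATE '2024-01-01'
GROUP BY customer_id;
```

In the resulting plan, vertices that run vectorized are typically reported with an execution mode of vectorized, and parts of the plan that could not be vectorized usually carry a notVectorizedReason entry explaining why. The exact wording of the plan can vary between Hive versions.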