Best practices for performance tuning

Review the following guidelines for configuring the cluster, storing data, and writing queries. They cover topics such as automatically scaling resources to handle queries, disabling user impersonation, and protecting your cluster and dependent services.

Best practices

  • Adjust autoscaling in CDP Public Cloud to scale up when you need resources to handle queries.
  • Accept the default of using Tez as the execution engine. In CDP, Tez replaces the MapReduce execution engine.
  • Accept the default of disabling user impersonation. If impersonation is enabled, set hive.server2.enable.doAs to false in hive-site.xml using the Cloudera Manager Safety Valve feature.

    LLAP caches data across multiple queries, and this caching does not support user impersonation.

  • Use the Ranger security service to protect your cluster and dependent services.
  • Store data in the ORC file format. Other formats, such as Parquet, are supported but are not as fast for Hive queries.
  • Ensure that queries are fully vectorized by examining their explain plans, for example with EXPLAIN VECTORIZATION.
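As a sketch of the impersonation item in the list above, the following Python builds the hive-site.xml property element that turns impersonation off. It assumes you paste the result into the Cloudera Manager Safety Valve for hive-site.xml using its XML view. The helper safety_valve_property is hypothetical; only the property name hive.server2.enable.doAs and the value false come from the guidance above.

```python
import xml.etree.ElementTree as ET


def safety_valve_property(name: str, value: str) -> str:
    """Return one hive-site.xml <property> element as a string.

    Hypothetical helper: the output is intended to be pasted into a
    Cloudera Manager Safety Valve for hive-site.xml (XML view).
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str and omits the XML declaration.
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Disable user impersonation so that LLAP can cache data across queries.
    print(safety_valve_property("hive.server2.enable.doAs", "false"))
```

Running the script prints a single line, <property><name>hive.server2.enable.doAs</name><value>false</value></property>, which matches the name/value layout that hive-site.xml uses for every property.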
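To make the explain-plan check in the last item repeatable, the following Python sketch scans the text of an EXPLAIN VECTORIZATION plan. It reports whether every Map and Reduce vectorization section says "vectorized: true" and collects any "notVectorizedReason" lines. It assumes the plan text uses those field names, as Hive's EXPLAIN VECTORIZATION output typically does; verify them against plans from your own cluster. The function check_vectorization and the sample excerpt are hypothetical, and the sketch does not connect to Hive itself.

```python
from __future__ import annotations

import re

# Assumption: each vectorization section of the plan contains a line such as
# "vectorized: true" or "vectorized: false", and non-vectorized sections may
# also contain a "notVectorizedReason: ..." line.
_VECTORIZED = re.compile(r"^[ \t]*vectorized:[ \t]*(true|false)[ \t]*$", re.MULTILINE)
_REASON = re.compile(r"^[ \t]*notVectorizedReason:[ \t]*(.*\S)[ \t]*$", re.MULTILINE)


def check_vectorization(plan_text: str) -> tuple[bool, list[str]]:
    """Return (fully_vectorized, reasons) for the text of an explain plan.

    fully_vectorized is True only if at least one "vectorized:" line is
    found and every such line says "true".
    """
    flags = _VECTORIZED.findall(plan_text)
    reasons = _REASON.findall(plan_text)
    fully = bool(flags) and all(flag == "true" for flag in flags)
    return fully, reasons


if __name__ == "__main__":
    # Hypothetical plan excerpt for illustration only.
    sample = """
      Map Vectorization:
          enabled: true
          vectorized: true
      Reduce Vectorization:
          enabled: true
          notVectorizedReason: hypothetical reason for illustration
          vectorized: false
    """
    print(check_vectorization(sample))
```

For the sample excerpt the script prints (False, ['hypothetical reason for illustration']). On a real plan, any returned reasons point at the operators or functions that kept a section from being vectorized.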