Best practices for a performant Hive
Before you tune Apache Hive, follow these best practices. They cover how you configure the cluster, store data, and write queries.
- Adjust autoscaling in CDP Public Cloud to scale up when you need resources to handle queries.
- Set up your cluster to use Tez as the execution engine. In CDP, the MapReduce execution engine is replaced by Tez.
- Disable user impersonation by setting hive.server2.enable.doAs to false in hive-site.xml using the Cloudera Manager Safety Valve feature (see link below). LLAP caches data for multiple queries, and this capability does not support user impersonation.
- Use the Ranger security service to protect your cluster and dependent services.
- Store data using the ORC file format.
- Ensure that queries are fully vectorized by examining explain plans.
- Use the SmartSense tool to detect common system misconfigurations.
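
The storage and vectorization guidelines above can be illustrated with a short HiveQL sketch. It is a minimal example under stated assumptions, not a recommended schema: the table name sales_orc and its columns are hypothetical, and the exact layout of the explain output can differ between Hive versions.

```sql
-- Hypothetical table used only for illustration; stored in the ORC file format.
CREATE TABLE sales_orc (
  order_id BIGINT,
  amount   DECIMAL(10,2),
  order_ts TIMESTAMP
)
STORED AS ORC;

-- Vectorized execution must be enabled for the plan to be vectorized.
SET hive.vectorized.execution.enabled=true;

-- Ask Hive to report vectorization details in the explain plan.
EXPLAIN VECTORIZATION DETAIL
SELECT order_id, SUM(amount) AS total_amount
FROM sales_orc
GROUP BY order_id;
```

In the resulting plan, check that vectorization is reported as enabled and that each vertex runs in vectorized mode. If an operator is not vectorized, the detail output typically states the reason, which points to the expression or data type to change.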