Chapter 6. Optimizing the Hive Execution Engine
To maximize the data analytics capabilities of applications that query Hive, you might need to tune the Apache Tez execution engine. Tez is an advancement over earlier application frameworks for Hadoop data processing, such as MapReduce2 and MapReduce1. The Tez framework is required for high-performance batch workloads and for all interactive applications.
Explain Plans
When you use Hive for interactive queries, you can generate explain plans. An explain plan shows you the execution plan of a query by revealing the series of operations that occur when a particular query is run. By understanding the plan, you can determine whether you need to adjust your application.
For example, an explain plan might help you see why the query optimizer runs a query with a shuffle operation instead of a hash JOIN. With this knowledge, you might want to rewrite queries in the application so that they better align with user goals and the environment.
Hive in HDP can generate two types of explain plans. A textual plan, such as information printed in a CLI query editor, displays the execution plan in descriptive lines. A graphical plan, such as the Visual Explain feature of Hive Views in Ambari, shows the execution plan as a flow diagram. Learn more about Visual Explain Plans in the Query Tab documentation for Hive View 2.0.
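As a rough illustration of how a textual plan is requested, the sketch below wraps a query in HiveQL's EXPLAIN statement. The helper name `explain_statement` and the `sales` and `stores` tables are hypothetical and exist only for this example; the resulting string would be submitted through whatever Hive client your application already uses.

```python
def explain_statement(query: str, extended: bool = False) -> str:
    """Wrap a HiveQL query in an EXPLAIN (or EXPLAIN EXTENDED) statement.

    Hypothetical helper for illustration; it only builds the statement text
    and does not connect to Hive.
    """
    # Drop surrounding whitespace and any trailing semicolon from the query.
    body = query.strip().rstrip(";").strip()
    if not body:
        raise ValueError("query must not be empty")
    keyword = "EXPLAIN EXTENDED" if extended else "EXPLAIN"
    return f"{keyword} {body};"


# Hypothetical tables: a join like this is where the plan shows whether the
# optimizer chose a shuffle join or a hash join.
SAMPLE_QUERY = """
SELECT s.store_id, SUM(s.amount)
FROM sales s JOIN stores t ON s.store_id = t.store_id
GROUP BY s.store_id
"""

if __name__ == "__main__":
    print(explain_statement(SAMPLE_QUERY))
```

Running the generated statement in a CLI query editor prints the plan as descriptive lines, which you can then compare before and after rewriting the query.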
Tuning the Execution Engine Manually
If you encounter subpar performance of your Hive queries after debugging them with Tez View and Hive View, then you might need to adjust Tez Service configuration properties.
Tune Tez Service Configuration Properties
About this Task
Important: Check and adjust the following property settings only if you think these execution engine properties degrade the performance of Hive LLAP queries.
Advanced users: If you want to add or configure a property that is not listed in the table below, open the section of the Configs tab to enter or edit the custom property.
Steps
In Ambari, open Services > Tez > Configs tab.
Use the following table as a reference checklist.
Tip: Ambari automatically customizes the value for the tez.am.resource.memory.mb property to suit your cluster profile. Generally, you should not change the default value of this property at this stage if you are not changing resources on the cluster.
You can view the properties by either of these methods:
Type each property name in the Filter field in the top right corner.
Open the General, Advanced tez-env, etc., sections and scan the lists of each category.
Click .
If prompted to restart, restart the Tez Service.
Table 6.1. Settings for Execution Engine Properties
| Property | Setting Guideline If Manual Configuration Is Needed | Default Value in Ambari |
|---|---|---|
| | 4 GB maximum for most sites | Depends on your environment |
| | 300 minimum | 300 |
| | Increase for large ETL jobs that run too long | No default value set |
| | Increase for more reducers; decrease for fewer reducers | |
| | Increase for more reducers; decrease for fewer reducers | |
| | Set a value if reducer counts are too low, even if the tez.shuffle-vertex-manager.min-src-fraction property is already adjusted | No default value set |
| tez.shuffle-vertex-manager.min-src-fraction | Increase to start reducers later; decrease to start reducers sooner | 0.2 |
| tez.shuffle-vertex-manager.max-src-fraction | Increase to start reducers later; decrease to start reducers sooner | 0.4 |
| | true | 0.4 |
| | | false |
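To make the effect of the two src-fraction properties more concrete, here is a minimal sketch of a simplified scheduling model. It assumes that no reducers are scheduled until the completed fraction of source tasks passes tez.shuffle-vertex-manager.min-src-fraction, that all reducers are eligible once it reaches tez.shuffle-vertex-manager.max-src-fraction, and that eligibility ramps linearly in between. That linear ramp is an assumption about Tez behavior rather than something stated in this guide, and the function name is hypothetical. The default arguments are the Ambari defaults from Table 6.1.

```python
def schedulable_reducer_fraction(completed_fraction: float,
                                 min_src_fraction: float = 0.2,
                                 max_src_fraction: float = 0.4) -> float:
    """Estimate the share of reducers eligible to start (simplified model).

    completed_fraction: share of source (upstream) tasks already finished, 0..1.
    The linear ramp between the two thresholds is an assumption made for
    illustration, not a statement of the exact Tez implementation.
    """
    if not 0.0 <= completed_fraction <= 1.0:
        raise ValueError("completed_fraction must be between 0 and 1")
    if min_src_fraction > max_src_fraction:
        raise ValueError("min_src_fraction must not exceed max_src_fraction")
    if completed_fraction < min_src_fraction:
        return 0.0  # too few source tasks done: reducers wait
    if completed_fraction >= max_src_fraction:
        return 1.0  # enough source tasks done: every reducer may start
    # Between the thresholds, eligibility grows in proportion to progress.
    return (completed_fraction - min_src_fraction) / (max_src_fraction - min_src_fraction)


if __name__ == "__main__":
    for done in (0.1, 0.3, 0.5):
        print(done, schedulable_reducer_fraction(done))
```

Under this model, raising either fraction keeps reducers waiting longer for the same upstream progress, which matches the "increase to start reducers later" guidance in the table.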