Chapter 6. Optimizing the Hive Execution Engine

To maximize the data analytics capabilities of applications that query Hive, you might need to tune the Apache Tez execution engine. Tez is an advancement over earlier application frameworks for Hadoop data processing, such as MapReduce2 and MapReduce1. The Tez framework is required for high-performance batch workloads and for all interactive applications.

Explain Plans

When you use Hive for interactive queries, you can generate explain plans. An explain plan shows the execution plan of a query by revealing the series of operations that occur when that query runs. By understanding the plan, you can decide whether to adjust your queries or your application design.

For example, an explain plan might help you see why the query optimizer runs a query with a shuffle operation instead of a hash JOIN. With this knowledge, you might want to rewrite queries in the application so that they better align with user goals and the environment.
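As a rough illustration of reading a textual plan, the following Python sketch builds an `EXPLAIN` statement and scans plan text for join operators. The helper names, the sample query, and the abbreviated plan text are all hypothetical. The sketch assumes that the textual plan labels a hash join as `Map Join Operator` and a shuffle join as `Merge Join Operator`; verify the operator names against the plans your Hive version prints.

```python
from typing import List


def explain_statement(query: str) -> str:
    """Prefix a HiveQL query with EXPLAIN so Hive returns the plan instead of running the query."""
    return "EXPLAIN " + query.strip().rstrip(";") + ";"


def join_strategies(plan_text: str) -> List[str]:
    """Return the join operator names found in a textual plan, in order of first appearance."""
    found: List[str] = []
    for line in plan_text.splitlines():
        name = line.strip()
        # Assumption: "Map Join Operator" marks an in-memory hash join and
        # "Merge Join Operator" marks a shuffle join in the textual plan.
        for op in ("Map Join Operator", "Merge Join Operator"):
            if name.startswith(op) and op not in found:
                found.append(op)
    return found


# Hypothetical, heavily abbreviated plan text for illustration only.
sample_plan = """\
Stage-1
  Tez
    Reducer 2
      Merge Join Operator
        condition map: Inner Join 0 to 1
"""

# Hypothetical tables and columns.
statement = explain_statement(
    "SELECT c.name, o.total FROM customers c JOIN orders o ON c.id = o.customer_id"
)
strategies = join_strategies(sample_plan)
```

You can submit the generated statement through any Hive client and pass the returned text to `join_strategies` to see which join strategy the optimizer chose.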

Hive in HDP can generate two types of explain plans. A textual plan, such as information printed in a CLI query editor, displays the execution plan in descriptive lines. A graphical plan, such as the Visual Explain feature of Hive Views in Ambari, shows the execution plan as a flow diagram. Learn more about Visual Explain Plans in the Query Tab documentation for Hive View 2.0.

Tuning the Execution Engine Manually

If you encounter subpar performance of your Hive queries after debugging them with Tez View and Hive View, then you might need to adjust Tez Service configuration properties.

Tune Tez Service Configuration Properties

About this Task

	Important
	Check and adjust the following property settings only if you think these execution engine properties degrade the performance of Hive LLAP queries.

Advanced users: If you want to add or configure a property that is not listed in the table below, open the Custom tez-site section of the Configs tab to enter or edit the custom property.
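For orientation, the sketch below renders a name and value pair in the Hadoop-style `<configuration>` XML layout that `*-site.xml` files use. It is only an illustration of what a custom property looks like on disk: Ambari manages the actual file, the function name is invented here, and `example.custom.property` is a placeholder rather than a real Tez setting.

```python
import xml.etree.ElementTree as ET


def to_site_xml(properties: dict) -> str:
    """Render name/value pairs in the Hadoop-style <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


# Placeholder property name and value, not a real Tez setting.
xml_text = to_site_xml({"example.custom.property": 42})
```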

Steps

1. In Ambari, open Services > Tez > Configs tab.

2. Use the following table as a reference checklist.

	Tip
	Ambari automatically customizes the value for the `tez.am.resource.memory.mb` property to suit your cluster profile. Generally, you should not change the default value of this property at this stage if you are not changing resources on the cluster.

	You can view the properties by either of these methods:
	- Type each property name in the Filter field in the top right corner.
	- Open the General, Advanced tez-env, etc., sections and scan the lists of each category.

3. Click Save.

4. If prompted to restart, restart the Tez Service.

Table 6.1. Settings for Execution Engine Properties

Property	Setting Guideline If Manual Configuration Is Needed	Default Value in Ambari
`tez.am.resource.memory.mb`	4 GB maximum for most sites	Depends on your environment
`tez.session.am.dag.submit.timeout.secs`	`300` minimum	`300`
`tez.am.container.idle.release-timeout-min.millis`	`20000` minimum	`10000`
`tez.am.container.idle.release-timeout-max.millis`	`40000` minimum	`20000`
`tez.shuffle-vertex-manager.desired-task-input-size`	Increase for large ETL jobs that run too long	No default value set
`tez.min.partition.factor`	Increase for more reducers; decrease for fewer reducers	`0.25`
`tez.max.partition.factor`	Increase for more reducers; decrease for fewer reducers	`2.0`
`tez.shuffle-vertex-manager.min-task-parallelism`	Set a value if reducer counts are too low, even if the `tez.shuffle-vertex-manager.min-src-fraction` property is already adjusted	No default value set
`tez.shuffle-vertex-manager.min-src-fraction`	Increase to start reducers later; decrease to start reducers sooner	`0.2`
`tez.shuffle-vertex-manager.max-src-fraction`	Increase to start reducers later; decrease to start reducers sooner	`0.4`
`hive.vectorized.execution.enabled`	`true`	`true`
`hive.mapjoin.hybridgrace.hashtable`	`true` for slower but safer processing; `false` for faster processing	`false`
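The numeric guidelines in Table 6.1 can be turned into a simple checklist script. The following Python sketch is one possible way to do that, not an Ambari or Tez feature: the function and variable names are invented here, the settings are passed in as a plain dictionary, and the 4 GB ceiling is converted to megabytes on the assumption that 1 GB equals 1024 MB.

```python
# Minimums taken from the "Setting Guideline" column of Table 6.1.
MINIMUMS = {
    "tez.session.am.dag.submit.timeout.secs": 300,
    "tez.am.container.idle.release-timeout-min.millis": 20000,
    "tez.am.container.idle.release-timeout-max.millis": 40000,
}

# "4 GB maximum for most sites", expressed in MB (assumption: 1 GB = 1024 MB).
AM_MEMORY_PROPERTY = "tez.am.resource.memory.mb"
AM_MEMORY_MAX_MB = 4 * 1024


def check_settings(settings: dict) -> list:
    """Return human-readable findings for values outside the table's guidelines."""
    findings = []
    for name, minimum in MINIMUMS.items():
        if name in settings and int(settings[name]) < minimum:
            findings.append(
                f"{name}={settings[name]} is below the guideline minimum of {minimum}"
            )
    if AM_MEMORY_PROPERTY in settings and int(settings[AM_MEMORY_PROPERTY]) > AM_MEMORY_MAX_MB:
        findings.append(
            f"{AM_MEMORY_PROPERTY}={settings[AM_MEMORY_PROPERTY]} exceeds the "
            f"guideline maximum of {AM_MEMORY_MAX_MB} MB"
        )
    return findings


# The Ambari defaults that Table 6.1 lists for these three properties.
ambari_defaults = {
    "tez.session.am.dag.submit.timeout.secs": "300",
    "tez.am.container.idle.release-timeout-min.millis": "10000",
    "tez.am.container.idle.release-timeout-max.millis": "20000",
}
findings = check_settings(ambari_defaults)
```

With the defaults from the table, the check flags the two idle-release timeouts, because their Ambari defaults are below the guideline minimums for manual configuration.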