Configure Hive and HiveServer2 for Tez

The hive-site.xml file in the HDP companion files includes the settings for Hive and HiveServer2 for Tez.

If you have already configured the hive-site.xml connection properties for your Hive metastore database, the only remaining task is to adjust the hive.tez.container.size and hive.tez.java.opts values as described in the following section. You can also use the HDP utility script described earlier in this guide to calculate these Tez memory configuration settings.

Hive-on-Tez Configuration Parameters

In addition to the configurations generally recommended for Hive and HiveServer2, which are included in the hive-site.xml file in the HDP companion files, a multi-tenant use case requires only the following settings in hive-site.xml to configure Hive for use with Tez.

Table 9.1. Hive Configuration Parameters

Configuration Parameter	Description	Default Value and Notes
hive.execution.engine	Determines whether Hive queries are executed using Tez or MapReduce.	If this value is set to "mr", Hive queries are executed using MapReduce. If this value is set to "tez", Hive queries are executed using Tez. All queries executed through HiveServer2 use the specified hive.execution.engine setting.
hive.tez.container.size	The memory (in MB) to be used for Tez tasks.	-1 (not specified). If this is not specified, the MapReduce memory setting for map tasks (mapreduce.map.memory.mb) is used by default.
hive.tez.java.opts	Java command-line options for Tez.	If this is not specified, the MapReduce Java options setting for map tasks (mapreduce.map.java.opts) is used by default.
hive.server2.tez.default.queues	A comma-separated list of queues configured for the cluster.	The default value is an empty string, which prevents execution of all queries. To enable query execution with Tez for HiveServer2, this parameter must be configured.
hive.server2.tez.sessions.per.default.queue	The number of sessions for each queue named in hive.server2.tez.default.queues.	1. On larger clusters, increasing this number might improve HiveServer2 performance.
hive.server2.tez.initialize.default.sessions	Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users might want to run queries with Tez without a pool of sessions.	false
hive.server2.enable.doAs	Required when the queue-related configurations above are used.	false

Examples of Hive-Related Configuration Properties:

<property>
     <name>hive.execution.engine</name>
     <value>tez</value>
</property>
 
<property>
     <name>hive.tez.container.size</name>
     <value>-1</value>
     <description>Memory in MB to be used for Tez tasks. If this is not specified (-1),
     the memory settings for map tasks from the MapReduce configuration are used.</description>
</property>
 
<property>
     <name>hive.tez.java.opts</name>
     <value></value>
     <description>Java opts to be specified for Tez tasks. If this is not specified
     then java opts for map tasks will be used from mapreduce configuration</description>
</property>
 
<property>
     <name>hive.server2.tez.default.queues</name>
     <value>default</value>
</property>
 
<property>
     <name>hive.server2.tez.sessions.per.default.queue</name>
     <value>1</value>
</property>
 
<property>
     <name>hive.server2.tez.initialize.default.sessions</name>
     <value>false</value>
</property>
 
<property>
     <name>hive.server2.enable.doAs</name>
     <value>false</value>
</property>
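
The examples above leave the Tez memory settings at their defaults, which defer to the MapReduce values. As a minimal sketch of setting them explicitly, the fragment below assumes a hypothetical 4096 MB container and a JVM heap of roughly 80% of that size. Both numbers are illustrative assumptions, not values taken from this guide; derive real values from your own memory calculations, for example with the HDP utility script mentioned earlier.

```xml
<!-- Illustrative value only: a 4096 MB container is an assumption, not a recommendation -->
<property>
     <name>hive.tez.container.size</name>
     <value>4096</value>
</property>

<!-- Heap kept below the container size (about 80% here, an assumed ratio)
     so that non-heap JVM memory still fits inside the container -->
<property>
     <name>hive.tez.java.opts</name>
     <value>-Xmx3276m</value>
</property>
```

Keeping the -Xmx value smaller than hive.tez.container.size is the point of the sketch: a heap equal to or larger than the container leaves no headroom for the rest of the JVM process.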

Note

Users running HiveServer2 in data analytic tools such as Tableau must reconnect to HiveServer2 after switching between the Tez and MapReduce execution engines.

You can retrieve a list of queues by executing the following command: hadoop queue -list.

Using Hive-on-Tez with Capacity Scheduler

You can use the tez.queue.name property to specify which queue is used for Hive-on-Tez jobs. You can also set this property in the Hive shell or in a Hive script.
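
As a rough sketch of the configuration-file form, the property might look like the following. The queue name engineering is hypothetical; substitute a queue that actually exists in your Capacity Scheduler configuration (the hadoop queue -list command mentioned above shows the available queues).

```xml
<!-- "engineering" is a hypothetical queue name used only for illustration -->
<property>
     <name>tez.queue.name</name>
     <value>engineering</value>
</property>
```

In the Hive shell or a Hive script, the equivalent would be a set statement such as set tez.queue.name=engineering; issued before the queries that should run in that queue.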