Configure Hive and HiveServer2 for Tez
The hive-site.xml file in the HDP companion files includes the settings for Hive and HiveServer2 for Tez.
If you have already configured the hive-site.xml connection properties for your Hive metastore database, the only remaining task is to adjust the hive.tez.container.size and hive.tez.java.opts values as described in the following section. You can also use the HDP utility script described earlier in this guide to calculate these Tez memory configuration settings.
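As an illustration of the adjustment described above, the following is a minimal sketch of the two memory properties with explicit values. The numbers are placeholders rather than recommendations: the 4096 MB container size is hypothetical, and setting the JVM heap to roughly 80% of the container size is a common rule of thumb that this guide does not itself state. Use the values produced by the HDP utility script or your own sizing for a real cluster.

```xml
<!-- Hypothetical values for illustration only; size these for your cluster. -->
<property>
  <name>hive.tez.container.size</name>
  <!-- Container memory in MB (assumed value) -->
  <value>4096</value>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <!-- JVM heap kept below the container size (assumed ~80% of 4096 MB) -->
  <value>-Xmx3276m</value>
</property>
```

Keeping the heap below the container size leaves room for non-heap JVM memory, so the task is less likely to exceed its container allocation.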
Hive-on-Tez Configuration Parameters
Apart from the configurations generally recommended for Hive and HiveServer2 and included in the hive-site.xml file in the HDP companion files, for a multi-tenant use case only the following configurations are required in the hive-site.xml configuration file to configure Hive for use with Tez.
Table 9.1. Hive Configuration Parameters
Configuration Parameter | Description | Default Value |
---|---|---|
hive.execution.engine | This setting determines whether Hive queries are executed using Tez or MapReduce. | If this value is set to "mr", Hive queries are executed using MapReduce. If this value is set to "tez", Hive queries are executed using Tez. All queries executed through HiveServer2 use the specified hive.execution.engine setting. |
hive.tez.container.size | The memory (in MB) to be used for Tez tasks. | -1 (not specified). If this is not specified, the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) are used by default for map tasks. |
hive.tez.java.opts | Java command line options for Tez. | If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) are used by default. |
hive.server2.tez.default.queues | A comma-separated list of queues configured for the cluster. | The default value is an empty string, which prevents execution of all queries. To enable query execution with Tez for HiveServer2, this parameter must be configured. |
hive.server2.tez.sessions.per.default.queue | The number of sessions for each queue named in hive.server2.tez.default.queues. | 1. Larger clusters might improve performance of HiveServer2 by increasing this number. |
hive.server2.tez.initialize.default.sessions | Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users might potentially want to run queries with Tez without a pool of sessions. | false |
hive.server2.enable.doAs | Required when the queue-related configurations above are used. | false |
Examples of Hive-Related Configuration Properties:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.tez.container.size</name>
  <value>-1</value>
  <description>Memory in mb to be used for Tez tasks. If this is not specified (-1) then the memory settings for map tasks are used from mapreduce configuration</description>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <value></value>
  <description>Java opts to be specified for Tez tasks. If this is not specified then java opts for map tasks are used from mapreduce configuration</description>
</property>
<property>
  <name>hive.server2.tez.default.queues</name>
  <value>default</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>1</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
Note
Users running HiveServer2 in data analytic tools such as Tableau must reconnect to HiveServer2 after switching between the Tez and MapReduce execution engines.
You can retrieve a list of queues by executing the following command: hadoop queue -list
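Queue names returned by the queue listing can be supplied to hive.server2.tez.default.queues. The following is a sketch of a multi-queue variation of the example properties, assuming two hypothetical queues named etl and adhoc exist on the cluster and that a pool of sessions is wanted; the queue names, the session count, and the choice to initialize default sessions are illustrative assumptions, not values from this guide.

```xml
<!-- Assumed queue names (etl, adhoc); replace with queues that exist on your cluster. -->
<property>
  <name>hive.server2.tez.default.queues</name>
  <value>etl,adhoc</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <!-- Assumed: two sessions for each queue listed above -->
  <value>2</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <!-- Assumed: start the session pool with HiveServer2 -->
  <value>true</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <!-- false, matching the value shown for the queue-related settings -->
  <value>false</value>
</property>
```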
Using Hive-on-Tez with Capacity Scheduler
You can use the tez.queue.name property to specify which queue is used for Hive-on-Tez jobs. You can also set this property in the Hive shell, or in a Hive script.
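As a sketch of the property form, assuming it is placed in hive-site.xml alongside the other properties in this section (the guide does not say which file to use) and assuming a hypothetical queue named engineering:

```xml
<!-- Hypothetical queue name; use a queue defined in your Capacity Scheduler configuration. -->
<property>
  <name>tez.queue.name</name>
  <value>engineering</value>
</property>
```

For a single session, the same property can be set from the Hive shell or at the top of a Hive script with a statement such as set tez.queue.name=engineering; which applies only to that session.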