Configure Hive and HiveServer2 for Tez
The hive-site.xml file in the HDP companion files includes the settings for Hive and HiveServer2 for Tez.
If you have already configured the hive-site.xml connection properties for your Hive metastore database, the only remaining task is to adjust the hive.tez.container.size and hive.tez.java.opts values as described in the following section. You can also use the HDP utility script described earlier in this guide to calculate these Tez memory configuration settings.
Hive-on-Tez Configuration Parameters
In addition to the configurations generally recommended for Hive and HiveServer2, which are included in the hive-site.xml file in the HDP companion files, a multi-tenant use case requires only the following settings in hive-site.xml to configure Hive for use with Tez.
Table 9.1. Hive Configuration Parameters
Configuration Parameter | Description | Default Value |
---|---|---|
hive.execution.engine | Determines whether Hive queries are executed using Tez or MapReduce. | If this value is set to "mr", Hive queries are executed using MapReduce. If this value is set to "tez", Hive queries are executed using Tez. All queries executed through HiveServer2 use the specified hive.execution.engine setting. |
hive.tez.container.size | The memory (in MB) to be used for Tez tasks. | -1 (not specified). If this is not specified, the memory setting for map tasks from the MapReduce configuration (mapreduce.map.memory.mb) is used by default. |
hive.tez.java.opts | Java command line options for Tez. | If this is not specified, the MapReduce Java options setting (mapreduce.map.java.opts) is used by default. |
hive.server2.tez.default.queues | A comma-separated list of queues configured for the cluster. | The default value is an empty string, which prevents execution of all queries. To enable query execution with Tez for HiveServer2, this parameter must be configured. |
hive.server2.tez.sessions.per.default.queue | The number of sessions for each queue named in hive.server2.tez.default.queues. | 1. Larger clusters may improve HiveServer2 performance by increasing this number. |
hive.server2.tez.initialize.default.sessions | Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users may want to run queries with Tez without a pool of sessions. | false |
hive.server2.enable.doAs | Required when the queue-related configurations above are used. | false |
Examples of Hive-Related Configuration Properties:
    <property>
      <name>hive.execution.engine</name>
      <value>tez</value>
    </property>
    <property>
      <name>hive.tez.container.size</name>
      <value>-1</value>
      <description>Memory in mb to be used for Tez tasks. If this is not specified (-1) then the memory settings for map tasks will be used from mapreduce configuration</description>
    </property>
    <property>
      <name>hive.tez.java.opts</name>
      <value></value>
      <description>Java opts to be specified for Tez tasks. If this is not specified then java opts for map tasks will be used from mapreduce configuration</description>
    </property>
    <property>
      <name>hive.server2.tez.default.queues</name>
      <value>default</value>
    </property>
    <property>
      <name>hive.server2.tez.sessions.per.default.queue</name>
      <value>1</value>
    </property>
    <property>
      <name>hive.server2.tez.initialize.default.sessions</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.server2.enable.doAs</name>
      <value>false</value>
    </property>
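
As a hedged illustration of adjusting the two memory parameters, the fragment below replaces the -1 and empty placeholders with explicit values. The 4096 MB container size and the heap of roughly 80% of the container are assumptions chosen for illustration, not values taken from this guide. Derive real values from your cluster's YARN container sizes or from the HDP utility script.

```xml
<!-- Illustrative values only: 4096 MB is an assumed container size -->
<property>
  <name>hive.tez.container.size</name>
  <value>4096</value>
</property>
<!-- Assumed rule of thumb: JVM heap at about 80% of the container
     (0.8 * 4096 MB, rounded down to 3276 MB) -->
<property>
  <name>hive.tez.java.opts</name>
  <value>-Xmx3276m</value>
</property>
```

Keeping the heap below the container size leaves headroom for non-heap JVM memory, which helps the task stay within the memory limit that YARN enforces on the container.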
Note | |
---|---|
Users who access HiveServer2 from data analytics tools such as Tableau must reconnect to HiveServer2 after switching between the Tez and MapReduce execution engines. You can retrieve a list of queues by executing the following command: hadoop queue -list |
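
Once the queue names are known, a HiveServer2 session pool spread over more than one queue might be sketched as follows. The queue names etl and adhoc and the session count of 2 are hypothetical. Substitute queues that exist in your cluster. Setting hive.server2.tez.initialize.default.sessions to true is also an assumption here, made so that HiveServer2 starts the pooled sessions itself.

```xml
<!-- Hypothetical queue names: replace with queues reported by "hadoop queue -list" -->
<property>
  <name>hive.server2.tez.default.queues</name>
  <value>etl,adhoc</value>
</property>
<!-- Assumed value: two Tez sessions are kept for each queue listed above -->
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>2</value>
</property>
<!-- Assumption: true asks HiveServer2 to initialize the session pool -->
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>true</value>
</property>
<!-- Kept at false, as in the example above, because the queue-related settings are used -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```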
Using Hive-on-Tez with Capacity Scheduler
You can use the tez.queue.name property to specify which queue will be used for Hive-on-Tez jobs. You can also set this property in the Hive shell, or in a Hive script.
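
For example, a fixed queue for Hive-on-Tez jobs might be configured with a property such as the one below. The queue name engineering is hypothetical, and placing the property in hive-site.xml is an assumption, because this section does not name the file.

```xml
<!-- Hypothetical queue name: must match a queue defined for the Capacity Scheduler -->
<property>
  <name>tez.queue.name</name>
  <value>engineering</value>
</property>
```

The equivalent statement in the Hive shell or in a Hive script would be set tez.queue.name=engineering; which applies only to that session.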