Use the following instructions to set up the Hive/HCatalog configuration files:
If you have not already done so, download and extract the HDP companion files.
A sample hive-site.xml file is included in the configuration_files/hive folder in the HDP companion files.
Modify the configuration files.
In the configuration_files/hive directory, edit the hive-site.xml file and modify the properties based on your environment. Search for TODO in the files for the properties to replace.
Edit the connection properties for your Hive metastore database in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://TODO-HIVE-METASTORE-DB-SERVER:TODO-HIVE-METASTORE-DB-PORT/TODO-HIVE-METASTORE-DB-NAME?createDatabaseIfNotExist=true</value>
  <description>Enter your Hive Metastore Connection URL, for example if MySQL: jdbc:mysql://localhost:3306/mysql?createDatabaseIfNotExist=true</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>TODO-HIVE-METASTORE-DB-USER-NAME</value>
  <description>Enter your Hive Metastore database user name.</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>TODO-HIVE-METASTORE-DB-PASSWORD</value>
  <description>Enter your Hive Metastore database password.</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>TODO-HIVE-METASTORE-DB-CONNECTION-DRIVER-NAME</value>
  <description>Enter your Hive Metastore Connection Driver Name, for example if MySQL: com.mysql.jdbc.Driver</description>
</property>
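As an illustration, the connection properties might look like the following once the TODO placeholders are replaced for a MySQL metastore. The URL and driver class are taken from the examples in the property descriptions; the user name hive is an assumption, and the password placeholder is left unchanged for you to replace with your own value.

```xml
<!-- Sketch only: values assume a MySQL metastore database on localhost:3306. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/mysql?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <!-- Assumed user name; use the account created for your metastore database. -->
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <!-- Placeholder kept from the sample file; replace with your own password. -->
  <value>TODO-HIVE-METASTORE-DB-PASSWORD</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
```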
Optional: If you want storage-based authorization for Hive, set the following Hive authorization parameters in the hive-site.xml file:
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
</property>
Hive also supports SQL standard authorization. See Hive Authorization for more information about Hive authorization models.
For a remote Hive metastore database, use the following hive-site.xml property value to set the IP address (or fully-qualified domain name) and port of the metastore host.
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://$metastore.server.full.hostname:9083</value>
  <description>URI for client to contact metastore server. To enable HiveServer2, leave the property value empty.</description>
</property>
To enable HiveServer2 for remote Hive clients, assign a value of a single empty space to this property. Hortonworks recommends using an embedded instance of the Hive Metastore with HiveServer2. An embedded metastore runs in the same process with HiveServer2 rather than as a separate daemon. You can also configure HiveServer2 to use an embedded metastore instance from the command line:
hive --service hiveserver2 -hiveconf hive.metastore.uris=" "
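The same embedded-metastore setting can be sketched as a hive-site.xml property. This assumes the property is placed in the hive-site.xml file that HiveServer2 reads; the single-space value is the one described above.

```xml
<property>
  <name>hive.metastore.uris</name>
  <!-- The value is a single space character, as described above. -->
  <value> </value>
</property>
```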
Optional: By default, Hive ensures that column names are unique in query results returned for SELECT statements by prepending a table alias to each column name. Administrators who do not want the table alias prefix on column names can disable this behavior by setting the following configuration property:
<property>
  <name>hive.resultset.use.unique.column.names</name>
  <value>false</value>
</property>
Warning: Hortonworks recommends that deployments disable the DataNucleus cache by setting the value of the datanucleus.cache.level2.type configuration parameter to none. Note that the datanucleus.cache.level2 configuration parameter is ignored, and assigning a value of none to this parameter will not have the desired effect.
Note: You can also use the HDP utility script to fine-tune memory configuration settings based on node hardware specifications.
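Expressed as a property, the DataNucleus recommendation in the warning above might look like the following sketch. Placing it in hive-site.xml is an assumption; the parameter name and value are the ones given in the warning.

```xml
<property>
  <!-- Use the ".type" parameter; datanucleus.cache.level2 alone is ignored. -->
  <name>datanucleus.cache.level2.type</name>
  <value>none</value>
</property>
```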
Copy the configuration files.
On all Hive hosts, create the Hive configuration directory:
rm -r $HIVE_CONF_DIR ; mkdir -p $HIVE_CONF_DIR ;
Copy all the configuration files to the $HIVE_CONF_DIR directory.
Set appropriate permissions:
chown -R $HIVE_USER:$HADOOP_GROUP $HIVE_CONF_DIR/../ ; chmod -R 755 $HIVE_CONF_DIR/../ ;
where:
$HIVE_CONF_DIR is the directory to store the Hive configuration files. For example, /etc/hive/conf.
$HIVE_USER is the user owning the Hive services. For example, hive.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
The hive-site.xml file in the HDP companion files includes the settings for Hive and HiveServer2 for Tez.
If you have already configured the hive-site.xml connection properties for your Hive metastore database as described in the previous section, the only remaining task is to adjust the hive.tez.container.size and hive.tez.java.opts values as described in the following section. You can also use the HDP utility script to calculate these Tez memory configuration settings.
Apart from the configurations generally recommended for Hive and HiveServer2 and included in the hive-site.xml file in the HDP companion files, for a multi-tenant use case, only the following configurations are required in the hive-site.xml configuration file to configure Hive for use with Tez.
Table 9.1. Hive-Related Configuration Parameters
Configuration Parameter | Description | Default Value |
---|---|---|
hive.execution.engine | Determines whether Hive queries are executed using Tez or MapReduce. | If this value is set to "mr", Hive queries are executed using MapReduce. If this value is set to "tez", Hive queries are executed using Tez. All queries executed through HiveServer2 use the specified hive.execution.engine setting. |
hive.tez.container.size | The memory (in MB) to be used for Tez tasks. If this is not specified (-1), the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) are used by default for map tasks. | -1 (not specified). If this is not specified, the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) are used by default. |
hive.tez.java.opts | Java command line options for Tez. If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) are used by default for map tasks. | If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) are used by default. |
hive.server2.tez.default.queues | A comma-separated list of queues configured for the cluster. | The default value is an empty string, which prevents execution of all queries. To enable query execution with Tez for HiveServer2, this parameter must be configured. |
hive.server2.tez.sessions.per.default.queue | The number of sessions for each queue named in hive.server2.tez.default.queues. | 1. Larger clusters may improve performance of HiveServer2 by increasing this number. |
hive.server2.tez.initialize.default.sessions | Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users may want to run queries with Tez without a pool of sessions. | false |
hive.server2.enable.doAs | Required when the queue-related configurations above are used. | false |
Examples of Hive-Related Configuration Properties:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.tez.container.size</name>
  <value>-1</value>
  <description>Memory in mb to be used for Tez tasks. If this is not specified (-1) then the memory settings for map tasks will be used from mapreduce configuration</description>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <value></value>
  <description>Java opts to be specified for Tez tasks. If this is not specified then java opts for map tasks will be used from mapreduce configuration</description>
</property>
<property>
  <name>hive.server2.tez.default.queues</name>
  <value>default</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>1</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
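The examples above leave the Tez memory settings at their defaults, so the MapReduce map-task values apply. A variant with explicit values might look like the following sketch. The 4096 MB container size and the Java heap of roughly 80% of the container are illustrative assumptions, not values recommended by this guide; size them for your own nodes, for example with the HDP utility script.

```xml
<property>
  <name>hive.tez.container.size</name>
  <!-- Assumed value, in MB. -->
  <value>4096</value>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <!-- Assumed heap size, kept below the container size to leave headroom. -->
  <value>-Xmx3276m</value>
</property>
```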
Note | |
---|---|
Users running HiveServer2 in data analytic tools such as Tableau must reconnect to HiveServer2 after switching between the Tez and MapReduce execution engines. |
Tip | |
---|---|
You can retrieve a list of queues by executing the following command: |
You can use the tez.queue.name property to specify which queue will be used for Hive-on-Tez jobs. You can also set this property in the Hive shell, or in a Hive script. For more details, see "Configuring Tez with the Capacity Scheduler" on this page.
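As a sketch, the queue selection might be written as a property like the one below. Both the queue name engineering and the placement in hive-site.xml are assumptions for illustration; use a queue that exists in your scheduler configuration.

```xml
<property>
  <name>tez.queue.name</name>
  <!-- Hypothetical queue name; replace with a queue defined for your cluster. -->
  <value>engineering</value>
</property>
```

In the Hive shell or a Hive script, the equivalent would be a statement of the form set tez.queue.name=engineering; issued before the query runs.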