3. Set Up the Hive/HCatalog Configuration Files

Use the following instructions to set up the Hive/HCatalog configuration files:

  1. If you have not already done so, download and extract the HDP companion files.

    A sample hive-site.xml file is included in the configuration_files/hive folder in the HDP companion files.

  2. Modify the configuration files.

    In the configuration_files/hive directory, edit the hive-site.xml file and modify the properties based on your environment. Search for TODO in the files for the properties to replace.

    1. Edit the connection properties for your Hive metastore database in hive-site.xml:

      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://TODO-HIVE-METASTORE-DB-SERVER:TODO-HIVE-METASTORE-DB-PORT/TODO-HIVE-METASTORE-DB-NAME?createDatabaseIfNotExist=true</value>
        <description>Enter your Hive Metastore Connection URL, for example if MySQL: jdbc:mysql://localhost:3306/mysql?createDatabaseIfNotExist=true</description>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>TODO-HIVE-METASTORE-DB-USER-NAME</value>
        <description>Enter your Hive Metastore database user name.</description>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>TODO-HIVE-METASTORE-DB-PASSWORD</value>
        <description>Enter your Hive Metastore database password.</description>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>TODO-HIVE-METASTORE-DB-CONNECTION-DRIVER-NAME</value>
        <description>Enter your Hive Metastore Connection Driver Name, for example if MySQL: com.mysql.jdbc.Driver</description>
      </property>
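
      As an illustration, a filled-in version of these properties for a MySQL-backed metastore might look like the following sketch. The URL and driver class are the example values given in the descriptions above; the user name hive and the password placeholder are assumptions made for illustration only and must be replaced with your own values.

      ```xml
      <!-- Sketch only: example values for a MySQL metastore on the local host. -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/mysql?createDatabaseIfNotExist=true</value>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <!-- Assumed user name; use the account created for your metastore database. -->
        <value>hive</value>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <!-- Placeholder, not a real password. -->
        <value>REPLACE-WITH-YOUR-PASSWORD</value>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      ```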

      Optional: If you want storage-based authorization for Hive, set the following Hive authorization parameters in the hive-site.xml file:

      <property>
        <name>hive.security.authorization.enabled</name>
        <value>true</value>
      </property>
        
      <property>
        <name>hive.security.authorization.manager</name>
        <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
      </property>
      
      <property>
        <name>hive.security.metastore.authorization.manager</name>
        <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
      </property>
      
      <property>
        <name>hive.security.authenticator.manager</name>
        <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
      </property>

      Hive also supports SQL standard authorization. See Hive Authorization for more information about Hive authorization models.

      For a remote Hive metastore database, use the following hive-site.xml property value to set the IP address (or fully-qualified domain name) and port of the metastore host.

      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://$metastore.server.full.hostname:9083</value>
        <description>URI for client to contact metastore server. To enable HiveServer2, leave the property value empty. </description>
      </property>

      To enable HiveServer2 for remote Hive clients, assign a value of a single empty space to this property. Hortonworks recommends using an embedded instance of the Hive Metastore with HiveServer2. An embedded metastore runs in the same process as HiveServer2 rather than as a separate daemon. You can also configure HiveServer2 to use an embedded metastore instance from the command line:

      hive --service hiveserver2 -hiveconf hive.metastore.uris=" "
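
      The same embedded-metastore setting can also be written in hive-site.xml. This is a minimal sketch, assuming the single-space value described above is preserved when the file is read:

      ```xml
      <property>
        <name>hive.metastore.uris</name>
        <!-- The value is a single space, matching the command-line form above. -->
        <value> </value>
      </property>
      ```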

      Optional: By default, Hive ensures that column names are unique in query results returned for SELECT statements by prefixing column names with a table alias. Administrators who do not want column names prefixed with a table alias can disable this behavior by setting the following configuration property:

      <property>
        <name>hive.resultset.use.unique.column.names</name>
        <value>false</value>
      </property>

    Warning

    Hortonworks recommends that deployments disable the DataNucleus cache by setting the value of the datanucleus.cache.level2.type configuration parameter to none. Note that the datanucleus.cache.level2 configuration parameter is ignored, and assigning a value of none to this parameter will not have the desired effect.
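
    Expressed as a hive-site.xml entry, the recommendation above would look like the following sketch. Note the .type suffix on the parameter name:

    ```xml
    <property>
      <name>datanucleus.cache.level2.type</name>
      <!-- Disables the DataNucleus level 2 cache. -->
      <value>none</value>
    </property>
    ```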

    Note

    You can also use the HDP utility script to fine-tune memory configuration settings based on node hardware specifications.

  3. Copy the configuration files.

    1. On all Hive hosts, create the Hive configuration directory.

      rm -r $HIVE_CONF_DIR ;
      mkdir -p $HIVE_CONF_DIR ;

    2. Copy all the configuration files to the $HIVE_CONF_DIR directory.

    3. Set appropriate permissions:

      chown -R $HIVE_USER:$HADOOP_GROUP $HIVE_CONF_DIR/../ ;
      chmod -R 755 $HIVE_CONF_DIR/../ ;

    where:

    • $HIVE_CONF_DIR is the directory to store the Hive configuration files. For example, /etc/hive/conf.

    • $HIVE_USER is the user owning the Hive services. For example, hive.

    • $HADOOP_GROUP is a common group shared by services. For example, hadoop.

 3.1. Configure Hive and HiveServer2 for Tez

The hive-site.xml file in the HDP companion files includes the settings for Hive and HiveServer2 for Tez.

If you have already configured the hive-site.xml connection properties for your Hive metastore database as described in the previous section, the only remaining task is to adjust the hive.tez.container.size and hive.tez.java.opts values as described in the following section. You can also use the HDP utility script to calculate these Tez memory configuration settings.

 3.1.1. Hive-on-Tez Configuration Parameters

Apart from the configurations generally recommended for Hive and HiveServer2 and included in the hive-site.xml file in the HDP companion files, only the following settings are required in hive-site.xml to configure Hive for use with Tez in a multi-tenant use case.

 

Table 9.1. Hive-Related Configuration Parameters

| Configuration Parameter | Description | Default Value |
|---|---|---|
| hive.execution.engine | This setting determines whether Hive queries will be executed using Tez or MapReduce. | If this value is set to "mr", Hive queries will be executed using MapReduce. If this value is set to "tez", Hive queries will be executed using Tez. All queries executed through HiveServer2 will use the specified hive.execution.engine setting. |
| hive.tez.container.size | The memory (in MB) to be used for Tez tasks. If this is not specified (-1), the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) will be used by default for map tasks. | -1 (not specified). If this is not specified, the memory settings from the MapReduce configurations (mapreduce.map.memory.mb) will be used by default. |
| hive.tez.java.opts | Java command line options for Tez. If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) will be used by default for map tasks. | If this is not specified, the MapReduce java opts settings (mapreduce.map.java.opts) will be used by default. |
| hive.server2.tez.default.queues | A comma-separated list of queues configured for the cluster. | The default value is an empty string, which prevents execution of all queries. To enable query execution with Tez for HiveServer2, this parameter must be configured. |
| hive.server2.tez.sessions.per.default.queue | The number of sessions for each queue named in hive.server2.tez.default.queues. | 1. Larger clusters may improve performance of HiveServer2 by increasing this number. |
| hive.server2.tez.initialize.default.sessions | Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users may want to run queries with Tez without a pool of sessions. | false |
| hive.server2.enable.doAs | Required when the queue-related configurations above are used. | false |

Examples of Hive-Related Configuration Properties:

  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <property>
    <name>hive.tez.container.size</name>
    <value>-1</value>
    <description>Memory in mb to be used for Tez tasks. If this is not specified (-1) then the memory settings for map tasks will be used from mapreduce configuration</description>
  </property>
 
  <property>
    <name>hive.tez.java.opts</name>
    <value></value>
    <description>Java opts to be specified for Tez tasks. If this is not specified then java opts for map tasks will be used from mapreduce configuration</description>
  </property>
  
  <property>
    <name>hive.server2.tez.default.queues</name>
    <value>default</value>
  </property>
  
  <property>
    <name>hive.server2.tez.sessions.per.default.queue</name>
    <value>1</value>
  </property>
  
  <property>
    <name>hive.server2.tez.initialize.default.sessions</name>
    <value>false</value>
  </property>
  
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
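
The examples above leave hive.tez.container.size and hive.tez.java.opts at their defaults, so the MapReduce map-task settings apply. If you set them explicitly, the entries might look like the following sketch. The numbers are assumptions chosen for illustration (a 1024 MB container with a Java heap of roughly 80% of it), not recommended values; derive real values from your node hardware or from the HDP utility script.

```xml
<property>
  <name>hive.tez.container.size</name>
  <!-- Assumed example: 1024 MB per Tez task container. -->
  <value>1024</value>
</property>

<property>
  <name>hive.tez.java.opts</name>
  <!-- Assumed example: heap kept below the container size to leave headroom. -->
  <value>-Xmx820m</value>
</property>
```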

Note

Users running HiveServer2 in data analytic tools such as Tableau must reconnect to HiveServer2 after switching between the Tez and MapReduce execution engines.

Tip

You can retrieve a list of queues by executing the following command: hadoop queue -list.
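
Once you know the queue names, you can list several of them in hive.server2.tez.default.queues. The following sketch assumes two hypothetical queues named etl and adhoc, with two sessions per queue; substitute the queue names and session count that apply to your cluster.

```xml
<property>
  <name>hive.server2.tez.default.queues</name>
  <!-- Hypothetical queue names, comma-separated. -->
  <value>etl,adhoc</value>
</property>

<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <!-- Assumed example: two sessions for each queue listed above. -->
  <value>2</value>
</property>

<property>
  <name>hive.server2.enable.doAs</name>
  <!-- Set to false, as required when the queue-related settings are used. -->
  <value>false</value>
</property>
```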

 3.1.2. Using Hive-on-Tez with Capacity Scheduler

You can use the tez.queue.name property to specify which queue will be used for Hive-on-Tez jobs. You can also set this property in the Hive shell, or in a Hive script. For more details, see "Configuring Tez with the Capacity Scheduler" on this page.
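
As a sketch, the property could be written in the same form as the other entries in this section. This assumes it is placed in hive-site.xml and that a queue named default exists in your Capacity Scheduler configuration; both are assumptions made for illustration.

```xml
<property>
  <name>tez.queue.name</name>
  <!-- Assumed queue name; replace with a queue defined for your cluster. -->
  <value>default</value>
</property>
```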

