3. Setting Up the Hive/HCatalog Configuration Files

Use the following instructions to set up the Hive/HCatalog configuration files:

  1. If you have not already done so, download and extract the HDP companion files. (See "Downloading the Companion Files" in Chapter 1 of this guide.)

    A sample hive-site.xml file is included in the configuration_files/hive folder in the HDP companion files.

  2. Modify the configuration files.

    In the configuration_files/hive directory, edit the hive-site.xml file and modify the properties to match your environment. Search the file for TODO placeholders to find the values you must replace.

    Edit the connection properties for your Hive metastore database in hive-site.xml:

    <property>
         <name>javax.jdo.option.ConnectionURL</name>
         <value>jdbc:mysql://TODO-HIVE-METASTORE-DB-SERVER:TODO-HIVE-METASTORE-DB-PORT/TODO-HIVE-METASTORE-DB-NAME?createDatabaseIfNotExist=true</value>
         <description>Enter your Hive Metastore Connection URL, for example if MySQL: jdbc:mysql://localhost:3306/mysql?createDatabaseIfNotExist=true</description> 
    </property>
     
    <property>
         <name>javax.jdo.option.ConnectionUserName</name>
         <value>TODO-HIVE-METASTORE-DB-USER-NAME</value>
         <description>Enter your Hive Metastore database user name.</description>
    </property>
     
    <property> 
         <name>javax.jdo.option.ConnectionPassword</name> 
         <value>TODO-HIVE-METASTORE-DB-PASSWORD</value> 
         <description>Enter your Hive Metastore database password.</description>
    </property>
     
    <property>
         <name>javax.jdo.option.ConnectionDriverName</name>
         <value>TODO-HIVE-METASTORE-DB-CONNECTION-DRIVER-NAME</value>
         <description>Enter your Hive Metastore Connection Driver Name, for example if MySQL: com.mysql.jdbc.Driver</description>
    </property>
    Warning

    To prevent memory leaks in unsecure mode, disable file system caches by setting the following parameters to true in hive-site.xml:

    • fs.hdfs.impl.disable.cache

    • fs.file.impl.disable.cache
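
    A minimal sketch of how these two parameters might be set in hive-site.xml, assuming the same property layout as the connection properties above. The description text is illustrative and is not taken from the sample file:

    <property>
         <name>fs.hdfs.impl.disable.cache</name>
         <value>true</value>
         <description>Illustrative: disable the HDFS file system cache.</description>
    </property>

    <property>
         <name>fs.file.impl.disable.cache</name>
         <value>true</value>
         <description>Illustrative: disable the local file system cache.</description>
    </property>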

  3. (Optional) If you want storage-based authorization for Hive, set the following Hive authorization parameters in the hive-site.xml file:

    <property>
         <name>hive.metastore.pre-event.listeners</name>
         <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
    </property>
    
    <property>
         <name>hive.security.metastore.authorization.manager</name>
         <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
    </property>
    
    <property>
         <name>hive.security.authenticator.manager</name>
         <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
    </property>

    Hive also supports SQL standard authorization. See "Hive Authorization" for more information about Hive authorization models.

  4. For a remote Hive metastore database, use the following hive-site.xml property value to set the IP address (or fully-qualified domain name) and port of the metastore host.

    <property>
         <name>hive.metastore.uris</name>
         <value>thrift://$metastore.server.full.hostname:9083</value>
         <description>URI for client to contact metastore server. To enable HiveServer2, leave the property value empty.</description>
    </property>

    To enable HiveServer2 for remote Hive clients, assign a value of a single empty space to this property. Hortonworks recommends using an embedded instance of the Hive Metastore with HiveServer2. An embedded metastore runs in the same process as HiveServer2 rather than as a separate daemon. You can also configure HiveServer2 to use an embedded metastore instance from the command line:

    hive --service hiveserver2 -hiveconf hive.metastore.uris=""

  5. (Optional) By default, Hive ensures that column names are unique in the query results returned for SELECT statements by prepending a table alias to each column name. Administrators who do not want the table alias prefix on column names can disable this behavior by setting the following configuration property:

    <property>
         <name>hive.resultset.use.unique.column.names</name>
         <value>false</value>
    </property>

    Important

    Hortonworks recommends that deployments disable the DataNucleus cache by setting the value of the datanucleus.cache.level2.type configuration parameter to none. Note that the datanucleus.cache.level2 configuration parameter is ignored, and assigning a value of none to this parameter will not have the desired effect.
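
    As a sketch, the recommended setting might appear in hive-site.xml as follows, assuming the same property layout used elsewhere in this file. Note that the property name ends in .type:

    <property>
         <name>datanucleus.cache.level2.type</name>
         <value>none</value>
    </property>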

