6. Configure Tez for Hive

If your cluster properties file specifies IS_TEZ=yes (use Tez for Hive), perform the following steps after HDP deployment:

  1. Open a command prompt as the hadoop user:

    runas /user:hadoop cmd 

  2. Make a Tez application directory in HDFS:

    %HADOOP_HOME%\bin\hdfs dfs -mkdir /apps/tez 

  3. Set the directory permissions to 755, which gives the owner full access and all other users read and execute access:

    %HADOOP_HOME%\bin\hdfs dfs -chmod -R 755 /apps/tez 

  4. Change the owner of the directory to hadoop and the group to users:

    %HADOOP_HOME%\bin\hdfs dfs -chown -R hadoop:users /apps/tez

  5. Copy the contents of the Tez home directory on the local machine into the HDFS /apps/tez directory:

    %HADOOP_HOME%\bin\hdfs dfs -put %TEZ_HOME%\* /apps/tez

  6. Remove the Tez configuration directory from the HDFS Tez application directory:

    %HADOOP_HOME%\bin\hdfs dfs -rm -r -skipTrash /apps/tez/conf

  7. Ensure that the following properties are set in the %HIVE_HOME%\conf\hive-site.xml file:

     

    Table 4.1. Required properties

    | Property | Default Value | Description |
    |---|---|---|
    | hive.auto.convert.join.noconditionaltask | true | Specifies whether Hive optimizes converting common JOIN statements into MAPJOIN statements. JOIN statements are converted if this property is enabled and the sum of size for n-1 of the tables/partitions for an n-way join is smaller than the size specified with the hive.auto.convert.join.noconditionaltask.size property. |
    | hive.auto.convert.join.noconditionaltask.size | 10000000 (10 MB) | Specifies the size used to calculate whether Hive converts a JOIN statement into a MAPJOIN statement. The configuration property is ignored unless hive.auto.convert.join.noconditionaltask is enabled. |
    | hive.optimize.reducededuplication.min.reducer | 4 | Specifies the minimum reducer parallelism threshold to meet before merging two MapReduce jobs. However, combining a MapReduce job with parallelism 100 with a MapReduce job with parallelism 1 may negatively impact query performance even with the reduced number of jobs. The optimization is disabled if the number of reducers is less than the specified value. |
    | hive.tez.container.size | -1 | By default, Tez uses the Java options from map tasks. Use this property to override that value. Assigned value must match value specified for mapreduce.map.child.java.opts. |
    | hive.tez.java.opts | n/a | Set to the same value as mapreduce.map.java.opts. |


    Adjust the settings above to your environment where appropriate; hive-default.xml.template contains examples of the properties.
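
    For illustration, the properties from Table 4.1 might appear in hive-site.xml roughly as in the sketch below. The first four values are the defaults listed in the table. The hive.tez.java.opts value is a hypothetical placeholder, not a recommendation; replace it with whatever your cluster uses for mapreduce.map.java.opts. Merge the property elements into the existing configuration element of your file rather than replacing the file.

    ```xml
    <configuration>
      <!-- Defaults taken from Table 4.1 -->
      <property>
        <name>hive.auto.convert.join.noconditionaltask</name>
        <value>true</value>
      </property>
      <property>
        <name>hive.auto.convert.join.noconditionaltask.size</name>
        <value>10000000</value>
      </property>
      <property>
        <name>hive.optimize.reducededuplication.min.reducer</name>
        <value>4</value>
      </property>
      <property>
        <name>hive.tez.container.size</name>
        <value>-1</value>
      </property>
      <!-- Hypothetical placeholder: use the same value as
           mapreduce.map.java.opts on your cluster -->
      <property>
        <name>hive.tez.java.opts</name>
        <value>-Xmx1024m</value>
      </property>
    </configuration>
    ```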

  8. To verify that the installation process succeeded, run smoke tests for Tez and Hive.

