3.1. Enable Tez Service

[Warning]Warning

These instructions are obsolete and no longer describe a supported version of Tez. Use the Installing and Configuring Tez information in the HDP 2.1 documentation instead.

  1. Create directories and configure ownership and permissions on the appropriate hosts as described below. If any of these directories already exist, we recommend deleting and recreating them.

    On all the client nodes, create the following directory:

    1. Create Hadoop configuration directory for Tez. For example, /etc/hadoop-tez/conf/.

      mkdir -p $HADOOP_TEZ_DIR

    2. Copy the contents of the Hadoop configuration directory (/etc/hadoop/conf) to the Hadoop-Tez configuration directory.

    3. Set appropriate permissions for the Hadoop-Tez configuration directory.

      For example, if the Hive user is responsible for submitting the queries to Tez Service, the permissions should be set as shown below:

      chown -R $HIVE_USER:$HADOOP_GROUP $HADOOP_TEZ_DIR;
      chmod -R 755 $HADOOP_TEZ_DIR;

      where:

      • $HADOOP_TEZ_DIR is the Hadoop configuration directory for Tez. For example, /etc/hadoop-tez/conf/.

      • $HIVE_USER is the user owning the Hive services. For example, hive.

      • $HADOOP_GROUP is a common group shared by services. For example, hadoop.
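
    The three sub-steps above can be combined into one small shell function. This is a minimal sketch, not part of Hadoop or Tez: the function name setup_tez_conf_dir is hypothetical, and the sketch assumes it is run by a user who is allowed to change ownership of the target directory.

```shell
# Sketch only: setup_tez_conf_dir is a hypothetical helper name.
# Arguments:
#   $1  existing Hadoop configuration directory (for example, /etc/hadoop/conf)
#   $2  $HADOOP_TEZ_DIR (for example, /etc/hadoop-tez/conf)
#   $3  $HIVE_USER (for example, hive)
#   $4  $HADOOP_GROUP (for example, hadoop)
setup_tez_conf_dir() {
    src_dir=$1
    tez_dir=$2
    owner=$3
    group=$4

    # Refuse to continue with an empty target path before removing anything.
    [ -n "$tez_dir" ] || return 1

    # Delete and recreate the directory, as the step recommends.
    rm -rf "$tez_dir" || return 1
    mkdir -p "$tez_dir" || return 1

    # Copy the contents of the source directory, including hidden files.
    cp -R "$src_dir"/. "$tez_dir"/ || return 1

    # Apply the ownership and permissions shown in the step.
    chown -R "$owner:$group" "$tez_dir" || return 1
    chmod -R 755 "$tez_dir"
}

# Usage (not run here):
#   setup_tez_conf_dir /etc/hadoop/conf /etc/hadoop-tez/conf hive hadoop
```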

  2. Enable Tez AM using the instructions provided here.

  3. Enable Tez Service for Hive.

    1. Create a directory to store the Hive JAR files (for example, /apps/hive/tez-ampool-jars).

      hadoop dfs -mkdir -p $HIVE_JAR_DIR
      hadoop dfs -put $HIVE_HOME/lib/hive*.jar $HIVE_JAR_DIR

      Set appropriate permissions for the Tez Service user. For example, if the Hive user is responsible for submitting the queries to Tez Service, the permissions should be set as shown below:

      hadoop fs -chown -R $HIVE_USER:$HADOOP_GROUP $HIVE_JAR_DIR;
      hadoop fs -chmod -R 755 $HIVE_JAR_DIR;

      where:

      • $HIVE_JAR_DIR is the HDFS directory that contains the Hive JAR files (for example, /apps/hive/tez-ampool-jars). This directory is used by the tez.ampool.mr-am.job-jar-path property.

        [Note]Note

        The user submitting jobs should have appropriate access permissions to the files listed in the tez.ampool.mr-am.job-jar-path property.

      • $HIVE_HOME is the Hive installation directory; the Hive JAR files are in its lib subdirectory. For example, /usr/lib/hive.

      • $HIVE_USER is the user owning the Hive services. For example, hive.

      • $HADOOP_GROUP is a common group shared by services. For example, hadoop.

    2. Create a comma-separated list of all the file paths in the uploaded directory ($HIVE_JAR_DIR) on HDFS.

      Continuing with the previous example, create a comma-separated list of the file paths from the /apps/hive/tez-ampool-jars directory.

      These file paths are of the form /apps/hive/tez-ampool-jars/hive*.jar.
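
      One way to build this list is to filter the directory listing. The following is a minimal sketch: join_hdfs_paths is a hypothetical helper name, and it assumes the usual `hadoop fs -ls` output layout, in which regular files start with "-" and the path is the last field. Paths that contain spaces are not handled.

```shell
# Hypothetical helper: reads `hadoop fs -ls` output on stdin and prints
# the listed file paths as a single comma-separated line.
join_hdfs_paths() {
    # Keep only regular-file entries (permission string starts with "-"),
    # print the last field (the path), then join the lines with commas.
    awk '/^-/ { print $NF }' | paste -sd, -
}

# Usage on a host with the Hadoop client installed (not run here):
#   hadoop fs -ls "$HIVE_JAR_DIR" | join_hdfs_paths
```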

  4. On the Tez Service host machine, edit $TEZ_CONF_DIR/tez-ampool-site.xml and modify the following properties:

    (where $TEZ_CONF_DIR is the directory that contains all the Tez configuration files and by default is set to /etc/tez/conf)

    <property>    
        <name>tez.ampool.ws.port</name>    
        <value>12999</value>    
        <description>Port to use for AMPoolService status.</description>  
    </property>

    <property>    
        <name>tez.ampool.am-pool-size</name>    
        <value>3</value>    
        <description>Minimum size of AM Pool.</description>  
    </property> 

    <property>    
        <name>tez.ampool.max.am-pool-size</name>    
        <value>5</value>    
        <description>Maximum size of AM Pool.</description>  
    </property> 

    <property>    
        <name>tez.ampool.launch-new-am-after-app-completion</name>    
        <value>true</value>    
        <description>Determines when a new AM is launched. If set to true, a new AM is launched after an existing AM in the pool completes execution. Otherwise, a new AM is launched as soon as a job is assigned to an AM from the pool.</description>  
    </property>

    <property>    
        <name>tez.ampool.max-am-launch-failures</name>    
        <value>10</value>    
        <description>Number of launch failures to account for unassigned AMs before shutting down AMPoolService.</description>  
    </property>

    <property>    
        <name>tez.ampool.address</name>    
        <value>$Tez_Host_Machine:10030</value>    
        <description>Address on which to run the ClientRMProtocol proxy.</description>  
    </property>

    <property>    
        <name>tez.ampool.mr-am.memory-allocation-mb</name>    
        <value>1536</value>    
        <description>Memory to use when launching the lazy MR AM.</description>  
    </property>

    <property>    
        <name>tez.ampool.mr-am.queue-name</name>    
        <value>default</value>    
        <description>Queue to which the lazy MR AM is submitted.</description>  
    </property>

    The value of the tez.ampool.mr-am.job-jar-path property is the comma-separated list of file paths in the uploaded directory ($HIVE_JAR_DIR) on HDFS (from Step 3 above).

    For example:

    <property>    
        <name>tez.ampool.mr-am.job-jar-path</name>    
        <value>comma-separated list of the file paths under $HIVE_JAR_DIR</value>   
        <description>Location of the Hive JAR files on HDFS.</description>  
    </property> 

    where $HIVE_JAR_DIR is the directory that contains Hive JAR files. For example, /apps/hive/tez-ampool-jars.

    The user submitting jobs should have appropriate access permissions to the files listed in the tez.ampool.mr-am.job-jar-path property.

    <property>    
        <name>tez.ampool.tmp-dir-path</name>    
        <value>/tmp/ampoolservice/</value>    
        <description>Local filesystem path for staging local data used by AMPoolClient/AMPoolService.</description>  
    </property>

    <property>    
        <name>tez.ampool.am.staging-dir</name>    
        <value>/tmp/tez/ampool/staging/</value>    
        <description>Path on HDFS used by AMPoolService to upload lazy-mr-am config.</description>  
    </property>

    [Important]Important

    The user starting the Tez Service must have appropriate permissions to the tez.ampool.am.staging-dir directory.
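
    The staging directory can be prepared on HDFS ahead of time. This is a minimal sketch under stated assumptions: prepare_ampool_staging_dir is a hypothetical helper name, and making the service user the owner of the directory is one possible way to grant the required permissions, not a documented requirement.

```shell
# Hypothetical helper; the name and the choice of owner are assumptions.
# Arguments:
#   $1  value of tez.ampool.am.staging-dir (an HDFS path)
#   $2  user that starts the Tez Service
#   $3  group for the directory (for example, $HADOOP_GROUP)
prepare_ampool_staging_dir() {
    staging_dir=$1
    svc_user=$2
    svc_group=$3

    # Create the directory on HDFS, including missing parent directories.
    hadoop fs -mkdir -p "$staging_dir" || return 1

    # Make the service user the owner so it can write to the directory.
    hadoop fs -chown -R "$svc_user:$svc_group" "$staging_dir" || return 1
}

# Usage (not run here):
#   prepare_ampool_staging_dir /tmp/tez/ampool/staging/ hive hadoop
```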

  5. On all the client nodes and the Tez Service host machine, edit $TEZ_CONF_DIR/lazy-mram-site.xml and modify the following property:

    (where $TEZ_CONF_DIR is the directory that contains all the Tez configuration files and by default is set to /etc/tez/conf)

    <property>    
        <name>yarn.app.mapreduce.am.lazy.prealloc-container-count</name>    
        <value>1</value>    
        <description>Number of containers to pre-allocate after starting up. To use preallocation, the value for this property must be set to a non-zero value.</description>  
    </property>

    [Important]Important

    The tez.ampool.am-pool-size, tez.ampool.max-am-pool-size, and yarn.app.mapreduce.am.lazy.prealloc-container-count parameters affect the cluster resources utilized by the Tez Service.

    The tez.ampool.am-pool-size parameter determines the minimum number of YARN containers utilized and is equal to the number of Tez AMs launched. Each Tez AM, in turn, allocates at most N containers, where N is defined by yarn.app.mapreduce.am.lazy.prealloc-container-count.

    Together, these two parameters define the resource utilization, so set them carefully to ensure that the Tez Service does not occupy all the resources in your cluster.
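
    As a rough sizing aid, the relationship described above can be written as simple arithmetic. This sketch is one reading of the text, not a formula from the Tez documentation: it assumes each AM occupies one container and pre-allocates at most N more, and ampool_container_bound is a hypothetical helper name.

```shell
# Hypothetical helper: upper bound on the YARN containers held by the AM pool,
# assuming one container per AM plus at most $2 pre-allocated containers per AM.
#   $1  pool size (tez.ampool.am-pool-size, or the maximum pool size)
#   $2  yarn.app.mapreduce.am.lazy.prealloc-container-count
ampool_container_bound() {
    pool_size=$1
    prealloc_count=$2
    echo $(( pool_size * (1 + prealloc_count) ))
}

ampool_container_bound 3 1   # example minimum pool size: prints 6
ampool_container_bound 5 1   # example maximum pool size: prints 10
```

    With the example values from this section, the pool holds at most 6 containers at its minimum size and at most 10 at its maximum size, which can be compared against the total container capacity of the cluster.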

