2. Set Up Configuration Files

[Note]

If you are using an Ambari-managed cluster, use Ambari to update core-site.xml, mapred-site.xml, and oozie-site.xml according to the instructions below. Do not edit these files directly, because Ambari will overwrite direct changes.

Use the following instructions to manually set up the configuration files:

  1. On the NameNode, Secondary NameNode, and all DataNodes, modify the configuration files as instructed below:

    1. Modify the $HADOOP_CONF_DIR/core-site.xml file:

      <property>
        <name>hadoop.proxyuser.hue.hosts</name>
        <value>*</value>
      </property>
      

      <property>
        <name>hadoop.proxyuser.hue.groups</name>
        <value>*</value>
      </property>
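
      After adding these properties, a quick way to confirm they were saved correctly (a minimal check; it assumes the <name> and <value> elements sit on adjacent lines, as shown above):

      grep -A 1 'hadoop.proxyuser.hue' $HADOOP_CONF_DIR/core-site.xml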

    2. Modify the $HADOOP_CONF_DIR/hdfs-site.xml file:

      <property>
        <name>dfs.support.broken.append</name>
        <value>true</value>
        <final>true</final>
      </property>

    3. Use WebHDFS or HttpFS to access HDFS data (a verification sketch follows the two options below):

      • Option I: Configure WebHDFS (recommended)

        Modify the $HADOOP_CONF_DIR/hdfs-site.xml file on the NameNode and all DataNodes:

        <property>
          <name>dfs.webhdfs.enabled</name>
          <value>true</value>
        </property>

      • Option II: Configure HttpFS (remote access)

        If you are using a remote Hue Server, you can run an HttpFS server to provide Hue access to HDFS.

        Add the following properties to the /etc/hadoop-httpfs/conf/httpfs-site.xml file:

        <property>
          <name>httpfs.proxyuser.hue.hosts</name>
          <value>*</value>
        </property>
        <property>
          <name>httpfs.proxyuser.hue.groups</name>
          <value>*</value>
        </property>
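
      With either option in place, you can confirm that the HDFS REST endpoint Hue will use is responding. This is a sketch, not a definitive test: replace <namenode_host> and <httpfs_host> with your own hostnames, and note that 50070 (WebHDFS on the NameNode) and 14000 (HttpFS) are only the default ports:

      # Option I: WebHDFS, served by the NameNode
      curl -i "http://<namenode_host>:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue"

      # Option II: HttpFS, served by the HttpFS server
      curl -i "http://<httpfs_host>:14000/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue"

      Either call should return HTTP 200 with a JSON FileStatus for /tmp.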

    4. Modify the webhcat-site.xml file.

      On the WebHCat Server host, add the following properties to the $WEBHCAT_CONF_DIR/webhcat-site.xml file, where $WEBHCAT_CONF_DIR is the directory for storing the WebHCat configuration files, for example, /etc/webhcat/conf. A quick status check follows the properties.

      vi $WEBHCAT_CONF_DIR/webhcat-site.xml

      <property>
        <name>webhcat.proxyuser.hue.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>webhcat.proxyuser.hue.groups</name>
        <value>*</value>
      </property>
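
      After WebHCat is restarted, you can confirm the server is answering (a sketch; replace <webhcat_host> with your WebHCat Server host, and 50111 is only the default port):

      curl -s "http://<webhcat_host>:50111/templeton/v1/status"

      A healthy server should return {"status":"ok","version":"v1"}.
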
    5. [Optional] - If you are setting $HADOOP_CLASSPATH in your $HADOOP_CONF_DIR/hadoop-env.sh file, verify that your settings preserve the user-specified options.

      For example, the following setting is correct because it preserves any value of $HADOOP_CLASSPATH that was exported before hadoop-env.sh ran:

      HADOOP_CLASSPATH=<your_additions>:$HADOOP_CLASSPATH

      This setting lets certain Hue components add entries to the Hadoop CLASSPATH through the environment variable.
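
      A minimal sketch of the difference in hadoop-env.sh (the JAR path is hypothetical):

      # Correct: prepends an addition while keeping whatever Hue (or the user) already exported
      HADOOP_CLASSPATH=/usr/local/lib/my-extra.jar:$HADOOP_CLASSPATH

      # Incorrect: silently discards the previously exported value
      # HADOOP_CLASSPATH=/usr/local/lib/my-extra.jar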

    6. [Optional] - Enable job submission using both Hue and the command line interface (CLI).

      Hadoop uses the hadoop.tmp.dir directory to unpack the JAR files in /usr/lib/hadoop/lib.

      Using both Hue and the CLI for job submission leads to contention for the hadoop.tmp.dir directory, which by default is /tmp/hadoop-$USER_NAME.

      To enable job submission using both Hue and CLI, update the following property in the $HADOOP_CONF_DIR/core-site.xml file:

      <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-$USER_NAME$HUE_SUFFIX</value>
      </property>

      where $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files, for example, /etc/hadoop/conf.
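
      As an illustration of how the two paths diverge (this assumes Hue exports HUE_SUFFIX at submission time, as the variable name suggests, while CLI sessions leave it unset):

      # CLI submission by user "alice" (HUE_SUFFIX unset):
      #   hadoop.tmp.dir resolves to /tmp/hadoop-alice
      # Hue submission with HUE_SUFFIX set to, say, "-hue":
      #   hadoop.tmp.dir resolves to /tmp/hadoop-alice-hue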

  2. Install Hue-plugins

    1. Verify that all the services are stopped. See the instructions provided here.

    2. Install Hue-plugins. On the JobTracker host machine, execute the following command:

      • For RHEL/CentOS:

        yum install hue-plugins
      • For SLES:

        zypper install hue-plugins

      Verify that the Hue-plugins JAR file is available in the Hadoop lib directory, located at /usr/lib/hadoop/lib:
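
      A quick check (this assumes the package installs a JAR whose name begins with hue-plugins; adjust the glob if your version names it differently):

      ls /usr/lib/hadoop/lib/hue-plugins*.jar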

    3. Add the following properties to $HADOOP_CONF_DIR/mapred-site.xml on the JobTracker host machine:

      <property>
        <name>jobtracker.thrift.address</name>
        <value>0.0.0.0:9290</value>
      </property>
      <property>
        <name>mapreduce.jobtracker.plugins</name>
        <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
        <description>Comma-separated list of jobtracker plugins to be activated.</description>
      </property> 

      $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files, for example, /etc/hadoop/conf.
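
      After the JobTracker is restarted (step 4 below), you can confirm that the Thrift plugin is listening on the configured port (a sketch; 9290 matches the jobtracker.thrift.address value above):

      netstat -tln | grep 9290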

  3. Configure Oozie.

    On the Oozie server host machine, modify $OOZIE_CONF_DIR/oozie-site.xml as shown below:

    <property>
      <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
      <value>*</value>
    </property>

    where $OOZIE_CONF_DIR is the directory for storing the Oozie configuration files, for example, /etc/oozie/conf.
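
    After restarting Oozie (step 4 below), you can confirm the server is reachable (a sketch; replace <oozie_host> with your Oozie server host, and 11000 is only the default port):

    oozie admin -oozie http://<oozie_host>:11000/oozie -status

    A healthy server reports: System mode: NORMAL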

  4. Restart all the services in your cluster. For more information, see the instructions provided here.

