3. Set Directories and Permissions

Create directories and configure ownership and permissions on the appropriate hosts as described below. If any of these directories already exist, Hortonworks recommends deleting and recreating them.
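You can wrap the delete-and-recreate recommendation in a small shell function so that it is safe to rerun. This is a minimal sketch, not part of the companion files. The function name `recreate_dir` and its argument order are hypothetical, and the `chown` step runs only when you pass an owner, because changing ownership normally requires root.

```bash
# Hypothetical helper (not from the companion files): delete a directory if it
# exists, recreate it, then apply ownership and permissions.
# Usage: recreate_dir DIR MODE [OWNER[:GROUP]]
recreate_dir() {
  local dir="$1" mode="$2" owner="${3:-}"
  # Guard against an empty path or "/" (for example, from an unset variable).
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "recreate_dir: refusing to operate on '${dir}'" >&2
    return 1
  fi
  rm -rf -- "$dir"     # remove any existing copy and its contents
  mkdir -p -- "$dir"   # recreate it, including missing parent directories
  if [ -n "$owner" ]; then
    chown -R -- "$owner" "$dir"   # usually requires root
  fi
  chmod -R -- "$mode" "$dir"
}
```

With the environment variables described below already set, a call such as `recreate_dir "$WEBHCAT_LOG_DIR" 755 "$WEBHCAT_USER:$HADOOP_GROUP"` performs the delete, create, ownership, and permission steps for one directory.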

To set up the WebHCat directories and permissions:

  1. Hortonworks recommends that you edit and source the bash script files included in the companion files. Alternatively, you can copy their contents to your ~/.bash_profile to set the environment variables used in the following steps.

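  2. Execute the following commands on your WebHCat server machine to create the log and PID directories and set their ownership and permissions:

    # Log directory
    mkdir -p "$WEBHCAT_LOG_DIR"
    chown -R "$WEBHCAT_USER:$HADOOP_GROUP" "$WEBHCAT_LOG_DIR"
    chmod -R 755 "$WEBHCAT_LOG_DIR"
    # PID directory
    mkdir -p "$WEBHCAT_PID_DIR"
    chown -R "$WEBHCAT_USER:$HADOOP_GROUP" "$WEBHCAT_PID_DIR"
    chmod -R 755 "$WEBHCAT_PID_DIR"
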
    • $WEBHCAT_LOG_DIR is the directory that stores the WebHCat logs. For example, /var/log/webhcat.

    • $WEBHCAT_PID_DIR is the directory to store the WebHCat process ID. For example, /var/run/webhcat.

    • $WEBHCAT_USER is the user owning the WebHCat services. For example, hcat.

    • $HADOOP_GROUP is a common group shared by services. For example, hadoop.

  3. Set permissions for the WebHCat server to impersonate users on the Hadoop cluster:

    1. Create a Unix user to run the WebHCat server.

    2. Modify the Hadoop core-site.xml file and set the following properties:

      Table 11.1. Hadoop core-site.xml File Properties

      • A comma-separated list of the Unix groups whose users will be impersonated.

      • A comma-separated list of the hosts that will run the HCatalog and JobTracker servers.

  4. If you are running WebHCat on a secure cluster, create a Kerberos principal for the WebHCat server named USER/host@realm. Then set the WebHCat configuration properties templeton.kerberos.principal (to that principal) and templeton.kerberos.keytab (to the path of the keytab file that holds its key).
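Steps 3 and 4 come down to a few XML properties. The sketch below prints fragments that you can paste inside the `<configuration>` element of each file. Several parts are assumptions rather than content from this section: the impersonation properties follow Hadoop's proxy-user naming, `hadoop.proxyuser.<user>.groups` and `hadoop.proxyuser.<user>.hosts`; the Kerberos properties are assumed to belong in the WebHCat site file (typically `webhcat-site.xml`); and the user, groups, host, realm, and keytab path are hypothetical placeholders. Verify all of these against your release.

```bash
# Hypothetical example values; override them via the environment or edit them.
WEBHCAT_USER="${WEBHCAT_USER:-hcat}"
PROXY_GROUPS="${PROXY_GROUPS:-users}"              # groups whose users may be impersonated
PROXY_HOSTS="${PROXY_HOSTS:-webhcat.example.com}"  # hosts allowed to impersonate them
WEBHCAT_HOST="${WEBHCAT_HOST:-webhcat.example.com}"
REALM="${REALM:-EXAMPLE.COM}"
KEYTAB="${KEYTAB:-/etc/security/keytabs/webhcat.keytab}"

# Print one <property> element. Values are not XML-escaped, so keep them plain.
property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

echo "<!-- core-site.xml: allow ${WEBHCAT_USER} to impersonate other users (assumed names) -->"
property "hadoop.proxyuser.${WEBHCAT_USER}.groups" "$PROXY_GROUPS"
property "hadoop.proxyuser.${WEBHCAT_USER}.hosts" "$PROXY_HOSTS"

echo "<!-- WebHCat site file: secure (Kerberos) clusters only -->"
property "templeton.kerberos.principal" "${WEBHCAT_USER}/${WEBHCAT_HOST}@${REALM}"
property "templeton.kerberos.keytab" "$KEYTAB"
```

The script only prints text. Redirect its output to a scratch file and merge the fragments by hand, because it does not edit core-site.xml or the WebHCat configuration itself.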
