Create directories and configure ownership and permissions on the appropriate hosts as described below. If any of these directories already exist, we recommend that you delete and recreate them.
To set up the WebHCat directories and permissions:
We strongly suggest that you edit and source the bash script files included in the companion files. Alternatively, you can copy their contents to your ~/.bash_profile to set up these environment variables.
Execute the following commands on your WebHCat server machine to create log and PID directories.
mkdir -p $WEBHCAT_LOG_DIR
chown -R $WEBHCAT_USER:$HADOOP_GROUP $WEBHCAT_LOG_DIR
chmod -R 755 $WEBHCAT_LOG_DIR

mkdir -p $WEBHCAT_PID_DIR
chown -R $WEBHCAT_USER:$HADOOP_GROUP $WEBHCAT_PID_DIR
chmod -R 755 $WEBHCAT_PID_DIR
where:
$WEBHCAT_LOG_DIR is the directory to store the WebHCat logs. For example, /var/log/webhcat.
$WEBHCAT_PID_DIR is the directory to store the WebHCat process ID. For example, /var/run/webhcat.
$WEBHCAT_USER is the user owning the WebHCat services. For example, hcat.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
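The log and PID directory steps above repeat the same three commands, so they can be wrapped in a small helper. The following is a minimal sketch, not part of the companion files: the function name prepare_dir is hypothetical, and the values in the usage comment are the example values listed above. Changing ownership to another user normally requires root privileges.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the companion files): create a directory,
# then set its owner, group, and mode in the same order as the steps above.
#   $1 = directory, $2 = owning user, $3 = owning group
prepare_dir() {
  local dir="$1" owner="$2" group="$3"
  mkdir -p "$dir" || return 1
  chown -R "$owner:$group" "$dir" || return 1
  chmod -R 755 "$dir"
}

# Usage with the example values from this section (run as root):
#   WEBHCAT_LOG_DIR=/var/log/webhcat
#   WEBHCAT_PID_DIR=/var/run/webhcat
#   WEBHCAT_USER=hcat
#   HADOOP_GROUP=hadoop
#   prepare_dir "$WEBHCAT_LOG_DIR" "$WEBHCAT_USER" "$HADOOP_GROUP"
#   prepare_dir "$WEBHCAT_PID_DIR" "$WEBHCAT_USER" "$HADOOP_GROUP"
```

Quoting the variables guards against an unset or empty value silently changing which path the commands act on.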
Set permissions for the WebHCat server to impersonate users on the Hadoop cluster:
Create a Unix user to run the WebHCat server.
Modify the Hadoop core-site.xml file and set the following properties:
Table 11.1. Hadoop core-site.xml File Properties
Variable | Value
hadoop.proxyuser.USER.groups | A comma-separated list of the Unix groups whose users will be impersonated.
hadoop.proxyuser.USER.hosts | A comma-separated list of the hosts that will run the HCatalog and JobTracker servers.
If you are running WebHCat on a secure cluster, create a Kerberos principal for the WebHCat server with the name USER/host@realm, and set the WebHCat configuration variables templeton.kerberos.principal and templeton.kerberos.keytab.
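The properties named in the impersonation steps can be sketched as Hadoop-style property elements. The helper below is a minimal illustration under stated assumptions: the function names are hypothetical, and the user hcat, the group users, the host webhcat01.example.com, the realm EXAMPLE.COM, and the keytab path are placeholder values to replace with your own. The proxyuser properties belong in core-site.xml; the templeton properties belong in the WebHCat configuration (typically webhcat-site.xml).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop-style <property> element.
# Values are written verbatim (no XML escaping), which is sufficient for
# the plain names, host lists, and paths used in this sketch.
print_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Properties for core-site.xml (Table 11.1).
#   $1 = Unix user that runs the WebHCat server (replaces USER)
#   $2 = comma-separated groups, $3 = comma-separated hosts
print_proxyuser_properties() {
  print_property "hadoop.proxyuser.$1.groups" "$2"
  print_property "hadoop.proxyuser.$1.hosts" "$3"
}

# Properties for the WebHCat configuration on a secure cluster.
# The principal follows the USER/host@realm pattern described above.
#   $1 = user, $2 = host, $3 = realm, $4 = keytab path
print_kerberos_properties() {
  print_property "templeton.kerberos.principal" "$1/$2@$3"
  print_property "templeton.kerberos.keytab" "$4"
}

# Example invocation; every value here is an assumption for illustration.
print_proxyuser_properties hcat "users" "webhcat01.example.com"
print_kerberos_properties hcat webhcat01.example.com EXAMPLE.COM \
  /etc/security/keytabs/hcat.service.keytab
```

The output is meant to be pasted inside the configuration element of the respective file; the sketch does not edit any file itself.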