Edit the $HADOOP_CONF_DIR/core-site.xml file on every host in your cluster to add the following information:

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
  <description>Set the authentication for the cluster.
  Valid values are: simple or kerberos.</description>
</property>

<property>
  <name>hadoop.rpc.protection</name>
  <value>authentication</value>
  <description>This is an [OPTIONAL] setting. If not set, defaults to authentication.
  authentication = authentication only; the client and server mutually authenticate during connection setup.
  integrity = authentication and integrity; guarantees the integrity of data exchanged between client and server as well as authentication.
  privacy = authentication, integrity, and confidentiality; guarantees that data exchanged between client and server is encrypted and is not readable by a “man in the middle”.</description>
</property>

<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
  <description>Enable authorization for different protocols.</description>
</property>

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/$MAPRED_USER/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/$HDFS_USER/
    DEFAULT
  </value>
  <description>The mapping from Kerberos principal names to local OS user names.</description>
</property>
For mapping from Kerberos principal names to local OS user names, see Create Mappings Between Principals and UNIX Usernames.
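To confirm how a given principal resolves under these rules, you can run the HadoopKerberosName helper class, if your Hadoop release ships it. A minimal sketch, assuming the $MAPRED_USER and $HDFS_USER placeholders above were replaced with mapred and hdfs, and using a hypothetical host name:

# Quick check of the auth_to_local rules (host name below is a placeholder)
hadoop org.apache.hadoop.security.HadoopKerberosName nn/c6401.example.com@EXAMPLE.COM
# should print something like: Name: nn/c6401.example.com@EXAMPLE.COM to hdfs
hadoop org.apache.hadoop.security.HadoopKerberosName jt/c6401.example.com@EXAMPLE.COM
# should print something like: Name: jt/c6401.example.com@EXAMPLE.COM to mapred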
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>users</value>
  <description>Allow the superuser hive to impersonate any members of the group users. Required only when installing Hive.</description>
</property>

where $HIVE_USER is the user owning the Hive services. For example, hive.

<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>$Hive_Hostname_FQDN</value>
  <description>Hostname from where superuser hive can connect. Required only when installing Hive.</description>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>users</value>
  <description>Allow the superuser oozie to impersonate any members of the group users. Required only when installing Oozie.</description>
</property>

<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>$Oozie_Hostname_FQDN</value>
  <description>Hostname from where superuser oozie can connect. Required only when installing Oozie.</description>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.groups</name>
  <value>users</value>
  <description>Allow the superuser HTTP to impersonate any members of the group users.</description>
</property>

<property>
  <name>hadoop.proxyuser.HTTP.hosts</name>
  <value>$WebHCat_Hostname_FQDN</value>
  <description>Hostname from where superuser HTTP can connect.</description>
</property>

<property>
  <name>hadoop.proxyuser.hcat.groups</name>
  <value>users</value>
  <description>Allow the superuser hcat to impersonate any members of the group users. Required only when installing WebHCat.</description>
</property>

<property>
  <name>hadoop.proxyuser.hcat.hosts</name>
  <value>$WebHCat_Hostname_FQDN</value>
  <description>Hostname from where superuser hcat can connect. Required only when installing WebHCat on the cluster.</description>
</property>
Edit the $HADOOP_CONF_DIR/hdfs-site.xml file on every host in your cluster to add the following information:

<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
  <description>If "true", access tokens are used as capabilities for accessing DataNodes. If "false", no access tokens are checked on accessing DataNodes.</description>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal name for the NameNode.</description>
</property>

<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>nn/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal name for the secondary NameNode.</description>
</property>

<property>
  <!-- cluster variant -->
  <name>dfs.secondary.http.address</name>
  <value>$Secondary.NameNode.FQDN</value>
  <description>Address of the secondary NameNode web server.</description>
</property>

<property>
  <name>dfs.secondary.https.port</name>
  <value>50490</value>
  <description>The https port where the secondary NameNode binds.</description>
</property>

<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
  <description>The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
  The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification.</description>
</property>

<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
  <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>
</property>

<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@EXAMPLE.COM</value>
  <description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.</description>
</property>

<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
  <description>Combined keytab file containing the NameNode service and host principals.</description>
</property>

<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
  <description>Combined keytab file containing the NameNode service and host principals.</description>
</property>

<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/security/keytabs/dn.service.keytab</value>
  <description>The filename of the keytab file for the DataNode.</description>
</property>

<property>
  <name>dfs.https.port</name>
  <value>50470</value>
  <description>The https port where the NameNode binds.</description>
</property>

<property>
  <name>dfs.https.address</name>
  <value>$HTTPS_Address_for_NameNode</value>
  <description>The https address where the NameNode binds. Example: ip-10-111-59-170.ec2.internal:50470</description>
</property>

<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>$dfs.web.authentication.kerberos.principal</value>
</property>

<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>$dfs.web.authentication.kerberos.principal</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value></value>
  <description>The address, with a privileged port - any port number under 1023. Example: 0.0.0.0:1019</description>
</property>

<property>
  <name>dfs.datanode.http.address</name>
  <value></value>
  <description>The address, with a privileged port - any port number under 1023. Example: 0.0.0.0:1022</description>
</property>
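Before restarting HDFS, it can help to confirm that the service keytabs referenced above exist and contain the expected principals. A minimal sanity check, assuming the keytabs were already created at the paths shown in the properties above:

# klist -kt lists the principals stored in each keytab
klist -kt /etc/security/keytabs/nn.service.keytab
klist -kt /etc/security/keytabs/dn.service.keytab
klist -kt /etc/security/keytabs/spnego.service.keytab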
For the DataNodes to run in secure mode, you must set the user name that the DataNode process runs as by setting HADOOP_SECURE_DN_USER, as shown below:
export HADOOP_SECURE_DN_USER=$HDFS_USER
where $HDFS_USER is the user owning the HDFS services. For example, hdfs.

Note: The DataNode daemon must be started as root.

Optionally, you can allow that user to access the directories where PID and log files are stored. For example:
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/$HADOOP_SECURE_DN_USER
export HADOOP_SECURE_DN_LOG_DIR=/var/run/hadoop/$HADOOP_SECURE_DN_USER
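With HADOOP_SECURE_DN_USER exported, the DataNode is started as root and then drops privileges to that user. A minimal sketch of starting a secure DataNode; the script path is an assumption and may differ on your installation:

# Run as root. With HADOOP_SECURE_DN_USER set, the DataNode binds its
# privileged ports as root and then switches to $HDFS_USER.
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode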
Edit the mapred-site.xml file on every host in your cluster to add the following information:

<property>
  <name>mapreduce.jobtracker.kerberos.principal</name>
  <value>jt/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal name for the JobTracker.</description>
</property>
<property>
  <name>mapreduce.tasktracker.kerberos.principal</name>
  <value>tt/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the TaskTracker.</description>
</property>

<property>
  <name>mapreduce.jobtracker.keytab.file</name>
  <value>/etc/security/keytabs/jt.service.keytab</value>
  <description>The keytab for the JobTracker principal.</description>
</property>

<property>
  <name>mapreduce.tasktracker.keytab.file</name>
  <value>/etc/security/keytabs/tt.service.keytab</value>
  <description>The filename of the keytab for the TaskTracker.</description>
</property>

<property>
  <name>mapreduce.jobhistory.kerberos.principal</name>
  <!-- cluster variant -->
  <value>jt/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal name for JobHistory. This must map to the same user as the JobTracker user (mapred).</description>
</property>

<property>
  <name>mapreduce.jobhistory.keytab.file</name>
  <!-- cluster variant -->
  <value>/etc/security/keytabs/jt.service.keytab</value>
  <description>The keytab for the JobHistory principal.</description>
</property>
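As with the HDFS keytabs, you can confirm that the MapReduce principals and keytabs line up before starting the daemons. A minimal check, assuming the keytab paths above and that _HOST resolves to the local fully qualified host name:

# Optional sanity check: obtain a ticket with the JobTracker keytab, then discard it
kinit -kt /etc/security/keytabs/jt.service.keytab jt/$(hostname -f)@EXAMPLE.COM
klist
kdestroy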
where $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.