2.2.1. Configure secure Hadoop

  1. Edit the $HADOOP_CONF_DIR/core-site.xml file on every host in your cluster, to add the following information:

    <property>   
            <name>hadoop.security.authentication</name>   
            <value>kerberos</value>   
            <description>Set the authentication for the cluster. Valid values are: simple or kerberos.   
            </description>  
    </property> 
    <property>   
            <name>hadoop.rpc.protection</name>   
            <value>authentication</value>   
            <description>This is an [OPTIONAL] setting. If not set, defaults to authentication.
            authentication = authentication only; the client and server mutually authenticate during connection setup.
            integrity = authentication and integrity; guarantees the integrity of data exchanged between client and server as well as authentication.
            privacy = authentication, integrity, and confidentiality; guarantees that data exchanged between client and server is encrypted and is not readable by a "man in the middle".
            </description>  
    </property> 
    <property>  
            <name>hadoop.security.authorization</name>  
            <value>true</value>  
            <description>Enable authorization for different protocols.  
            </description> 
    </property>  
    <property>
            <name>hadoop.security.auth_to_local</name>
            <value>RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/$MAPRED_USER/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/$HDFS_USER/
    DEFAULT</value> 
            <description>The mapping from Kerberos principal names to local OS user names. </description>
    </property>

    For mapping from Kerberos principal names to local OS user names, see Create Mappings Between Principals and UNIX Usernames.
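
    You can check how a principal resolves under these rules from the command line. The sketch below assumes that $MAPRED_USER and $HDFS_USER have been replaced with actual user names (here mapred and hdfs), that the host name is hypothetical, and that the hadoop command reads the edited core-site.xml from $HADOOP_CONF_DIR:

    # Print the local user name each principal maps to
    hadoop org.apache.hadoop.security.HadoopKerberosName nn/host1.example.com@EXAMPLE.COM
    # prints something like: Name: nn/host1.example.com@EXAMPLE.COM to hdfs
    hadoop org.apache.hadoop.security.HadoopKerberosName jt/host1.example.com@EXAMPLE.COM
    # prints something like: Name: jt/host1.example.com@EXAMPLE.COM to mapred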

    <property>
      <name>hadoop.proxyuser.hive.groups</name>
      <value>users</value>
      <description>Allow the superuser hive to impersonate any members of the group users. Required only when installing Hive.
      </description>
    </property>

    where hive in the property name is the user owning the Hive services ($HIVE_USER).

    <property>
      <name>hadoop.proxyuser.hive.hosts</name>
      <value>$Hive_Hostname_FQDN</value>
      <description>Hostname from where superuser hive can connect. Required only when installing Hive.
      </description>
    </property>
    <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>users</value>
      <description>Allow the superuser oozie to impersonate any members of the group users. Required only when installing Oozie.
      </description>
    </property>

    <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>$Oozie_Hostname_FQDN</value>
      <description>Hostname from where superuser oozie can connect. Required only when installing Oozie.
      </description>
    </property>
    <property>
      <name>hadoop.proxyuser.HTTP.groups</name>
      <value>users</value>
      <description>Allow the superuser HTTP to impersonate any members of the group users.
      </description>
    </property>
    <property>
      <name>hadoop.proxyuser.HTTP.hosts</name>
      <value>$WebHCat_Hostname_FQDN</value>
      <description>Hostname from where superuser HTTP can connect.
      </description>
    </property>
    <property>
      <name>hadoop.proxyuser.hcat.groups</name>
      <value>users</value>
      <description>Allow the superuser hcat to impersonate any members of the group users. Required only when installing WebHCat.
      </description>
    </property>
    <property>
      <name>hadoop.proxyuser.hcat.hosts</name>
      <value>$WebHCat_Hostname_FQDN</value>
      <description>Hostname from where superuser hcat can connect. Required only when installing WebHCat.
      </description>
    </property>
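
    The NameNode and JobTracker read these hadoop.proxyuser.* settings at startup. If you change them later on a running cluster, the following sketch can reload them without a restart, assuming your Hadoop version provides the refresh admin commands and you run them as the HDFS and MapReduce superusers:

    # Reload proxyuser (impersonation) settings on the NameNode and JobTracker
    hadoop dfsadmin -refreshSuperUserGroupsConfiguration
    hadoop mradmin -refreshSuperUserGroupsConfiguration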
    

  2. Edit the $HADOOP_CONF_DIR/hdfs-site.xml file on every host in your cluster, to add the following information:

    <property> 
            <name>dfs.block.access.token.enable</name> 
            <value>true</value> 
            <description> If "true", access tokens are used as capabilities
            for accessing datanodes. If "false", no access tokens are checked on
            accessing datanodes. </description> 
    </property> 

    <property> 
            <name>dfs.namenode.kerberos.principal</name> 
            <value>nn/_HOST@EXAMPLE.COM</value> 
            <description> Kerberos principal name for the
            NameNode </description> 
    </property>   

    <property> 
            <name>dfs.secondary.namenode.kerberos.principal</name> 
            <value>nn/_HOST@EXAMPLE.COM</value>    
            <description>Kerberos principal name for the secondary NameNode.    
            </description>          
    </property>  

    <property>     
            <!--cluster variant -->    
            <name>dfs.secondary.http.address</name>    
            <value>$Secondary.NameNode.FQDN</value>    
            <description>Address of secondary namenode web server</description>  
    </property>    

    <property>    
            <name>dfs.secondary.https.port</name>    
            <value>50490</value>    
            <description>The https port where secondary-namenode
            binds</description>  
    </property>    

    <property>    
            <name>dfs.web.authentication.kerberos.principal</name>    
            <value>HTTP/_HOST@EXAMPLE.COM</value>    
            <description> The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. 
    The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP SPNEGO specification.    
            </description>  
    </property>    

    <property>    
            <name>dfs.web.authentication.kerberos.keytab</name>    
            <value>/etc/security/keytabs/spnego.service.keytab</value>    
            <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.    
            </description>  
    </property>    
    

    <property>    
            <name>dfs.datanode.kerberos.principal</name>    
            <value>dn/_HOST@EXAMPLE.COM</value>  
            <description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.    
            </description>  
    </property>    
    

    <property>    
            <name>dfs.namenode.keytab.file</name>    
            <value>/etc/security/keytabs/nn.service.keytab</value>  
            <description>Combined keytab file containing the NameNode service and host principals.    
            </description>  
    </property>    
    

    <property>     
            <name>dfs.secondary.namenode.keytab.file</name>    
            <value>/etc/security/keytabs/nn.service.keytab</value>  
            <description>Combined keytab file containing the NameNode service and host principals.    
            </description>  
    </property>    
    

    <property>     
            <name>dfs.datanode.keytab.file</name>    
            <value>/etc/security/keytabs/dn.service.keytab</value>  
            <description>The filename of the keytab file for the DataNode.    
            </description>  
    </property>    
    

    <property>    
            <name>dfs.https.port</name>    
            <value>50470</value>  
            <description>The https port where NameNode binds</description>    
    </property>    
    

    <property>    
            <name>dfs.https.address</name>    
            <value>$HTTPS_Address_for_NameNode</value>  
            <description>The https address where namenode binds. Example: ip-10-111-59-170.ec2.internal:50470</description>    
    </property>    
    

    <property>  
            <name>dfs.namenode.kerberos.internal.spnego.principal</name>  
            <value>$dfs.web.authentication.kerberos.principal</value> 
    </property>   
    

    <property>  
            <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>  
            <value>$dfs.web.authentication.kerberos.principal</value> 
    </property>
    

    <property>
            <name>dfs.datanode.address</name>
            <value>0.0.0.0:1019</value>
            <description>The address, with a privileged port - any port number under 1023. Example: 0.0.0.0:1019</description>
    </property>
    

    <property>
            <name>dfs.datanode.http.address</name>
            <value>0.0.0.0:1022</value>
            <description>The address, with a privileged port - any port number under 1023. Example: 0.0.0.0:1022</description>
    </property>
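
    Before starting the daemons, it is worth confirming that each keytab referenced above contains the expected principals. A minimal check, assuming the keytabs were created under /etc/security/keytabs and the MIT Kerberos client tools are installed:

    # List the principals stored in each service keytab
    klist -kt /etc/security/keytabs/nn.service.keytab
    klist -kt /etc/security/keytabs/dn.service.keytab
    klist -kt /etc/security/keytabs/spnego.service.keytab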

    For the DataNodes to run in secure mode, you must set the user name that the DataNode process runs as by setting HADOOP_SECURE_DN_USER, as shown below:

    export HADOOP_SECURE_DN_USER=$HDFS_USER

    where $HDFS_USER is the user owning HDFS services. For example, hdfs.

    Note: The DataNode daemon must be started as root.

    Optionally, you can set separate PID and log file directories that this user can access. For example:

    export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/$HADOOP_SECURE_DN_USER
    export HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/$HADOOP_SECURE_DN_USER
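
    These export lines typically go into $HADOOP_CONF_DIR/hadoop-env.sh on each DataNode host. Once they are in place, a start sequence like the following sketch applies; it assumes the stock hadoop-daemon.sh script from your Hadoop installation is on root's PATH. The daemon binds its privileged ports as root and then drops privileges to $HADOOP_SECURE_DN_USER:

    # Run as root on each DataNode host
    hadoop-daemon.sh start datanode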
  3. Edit the mapred-site.xml file on every host in your cluster to add the following information:

    <property>  
            <name>mapreduce.jobtracker.kerberos.principal</name>  
            <value>jt/_HOST@EXAMPLE.COM</value>  
            <description>Kerberos principal name for the JobTracker   </description> 
    </property>  

    <property>  
            <name>mapreduce.tasktracker.kerberos.principal</name>   
            <value>tt/_HOST@EXAMPLE.COM</value>  
            <description>Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the TaskTracker.
            </description> 
    </property> 

    <property>   
            <name>mapreduce.jobtracker.keytab.file</name>   
            <value>/etc/security/keytabs/jt.service.keytab</value>   
            <description>The keytab for the JobTracker principal.   
            </description>   
    </property>

    <property>   
            <name>mapreduce.tasktracker.keytab.file</name>   
            <value>/etc/security/keytabs/tt.service.keytab</value>    
            <description>The filename of the keytab for the TaskTracker</description>  
    </property>

    <property>    
            <name>mapreduce.jobhistory.kerberos.principal</name>     
            <!--cluster variant -->  
            <value>jt/_HOST@EXAMPLE.COM</value>    
            <description> Kerberos principal name for JobHistory. This must map to the same user as the JobTracker user (mapred).
            </description>  
    </property> 

    <property>   
            <name>mapreduce.jobhistory.keytab.file</name>     
            <!--cluster variant -->   
            <value>/etc/security/keytabs/jt.service.keytab</value>   
            <description>The keytab for the JobHistory principal.
            </description>  
    </property>   
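
    To confirm that the principals and keytabs above line up before starting the daemons, you can authenticate with a service keytab manually. A minimal check, assuming the keytab paths shown above and the EXAMPLE.COM realm (the _HOST placeholder resolves to the local fully qualified host name):

    # Obtain and then discard a ticket using the JobTracker service keytab
    kinit -kt /etc/security/keytabs/jt.service.keytab jt/$(hostname -f)@EXAMPLE.COM
    klist
    kdestroy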

where $HADOOP_CONF_DIR is the directory that stores the Hadoop configuration files. For example, /etc/hadoop/conf.
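
After the configuration files and keytabs are in place on all hosts and the services have been restarted, a quick end-to-end check is to authenticate as a regular user and run an HDFS command; without a valid Kerberos ticket the same command should fail with an authentication error. This sketch assumes a user principal exists in the EXAMPLE.COM realm:

    # Authenticate, then confirm that Kerberos-secured HDFS access works
    kinit $USER@EXAMPLE.COM
    hadoop fs -ls /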

