Configuring YARN Security

If you are using MRv1, skip this section and see Configuring MRv1 Security.

If you are using YARN, follow these steps to configure, start, and test secure YARN.

  1. Configure Secure YARN.
  2. Start up the ResourceManager.
  3. Start up the NodeManager.
  4. Start up the MapReduce Job History Server.
  5. Try Running a Map/Reduce YARN Job.

Step 1: Configure Secure YARN

Before you start:

  • The Kerberos principals for the ResourceManager and NodeManager are configured in the yarn-site.xml file. The same yarn-site.xml file must be installed on every host machine in the cluster.
  • Make sure that each user who will be running YARN jobs exists on all cluster nodes (that is, on every node that hosts any YARN daemon).
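A quick way to confirm this is to check for the account on each node with id. In the sketch below, the username alice and the host list are placeholders, and passwordless SSH from an admin host is assumed:

```shell
# Check that a job-submitting user exists on every node that runs a YARN
# daemon. The username "alice" and the hostnames are placeholders.
for host in node1.example.com node2.example.com; do
  ssh "$host" "id -u alice" >/dev/null 2>&1 \
    && echo "$host: user alice exists" \
    || echo "$host: user alice is MISSING"
done
```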

To configure secure YARN:

  1. Add the following properties to the yarn-site.xml file on every machine in the cluster:
    <!-- ResourceManager security configs -->
    <property>
      <name>yarn.resourcemanager.keytab</name>
      <value>/etc/hadoop/conf/yarn.keytab</value>
    <!-- path to the YARN keytab -->
    </property>
    <property>
      <name>yarn.resourcemanager.principal</name>
      <value>yarn/_HOST@YOUR-REALM.COM</value>
    </property>
    
    <!-- NodeManager security configs -->
    <property>
      <name>yarn.nodemanager.keytab</name>
      <value>/etc/hadoop/conf/yarn.keytab</value>
    <!-- path to the YARN keytab -->
    </property>
    <property>
      <name>yarn.nodemanager.principal</name>
      <value>yarn/_HOST@YOUR-REALM.COM</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.linux-container-executor.group</name>
      <value>yarn</value>
    </property>
    
    <!-- To enable SSL -->
    <property>
      <name>yarn.http.policy</name>
      <value>HTTPS_ONLY</value>
    </property>
  2. Add the following properties to the mapred-site.xml file on every machine in the cluster:
    <!-- MapReduce Job History Server security configs -->
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>host:port</value> <!-- Host and port of the MapReduce Job History Server; default port is 10020  -->
    </property>
    <property>
      <name>mapreduce.jobhistory.keytab</name>
      <value>/etc/hadoop/conf/mapred.keytab</value>
    <!-- path to the MAPRED keytab for the Job History Server -->
    </property>
    
    <property>
      <name>mapreduce.jobhistory.principal</name>
      <value>mapred/_HOST@YOUR-REALM.COM</value>
    </property>
    
    <!-- To enable SSL -->
    <property>
      <name>mapreduce.jobhistory.http.policy</name>
      <value>HTTPS_ONLY</value>
    </property>
  3. Create a file called container-executor.cfg for the Linux Container Executor program that contains the following information:
    yarn.nodemanager.local-dirs=<comma-separated list of paths to local NodeManager directories. Should be the same values specified in yarn-site.xml. Required to validate the paths passed to container-executor.>
    yarn.nodemanager.linux-container-executor.group=yarn
    yarn.nodemanager.log-dirs=<comma-separated list of paths to local NodeManager log directories. Should be the same values specified in yarn-site.xml. Required to set proper permissions on the log files so that they can be written to by the user's containers and read by the NodeManager for log aggregation.>
    banned.users=hdfs,yarn,mapred,bin
    min.user.id=1000
  4. The path to the container-executor.cfg file is determined relative to the location of the container-executor binary. Specifically, the path is <dirname of container-executor binary>/../etc/hadoop/container-executor.cfg. If you installed the CDH 5 package, this path will always correspond to /etc/hadoop/conf/container-executor.cfg.
  5. Verify that the ownership and permissions of the container-executor program correspond to the following:
    ---Sr-s--- 1 root yarn 36264 May 20 15:30 container-executor
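Before starting the daemons, you can sanity-check the keytab and the binary from the command line. Both paths below are assumptions based on a package install; adjust them for your layout:

```shell
# Sanity checks before starting the daemons. Both paths are assumptions
# based on a package install; adjust them for your layout.
KEYTAB=/etc/hadoop/conf/yarn.keytab
CE=/usr/lib/hadoop-yarn/bin/container-executor

# The keytab should contain yarn/<host>@YOUR-REALM.COM entries.
if [ -r "$KEYTAB" ]; then
  klist -kt "$KEYTAB"
else
  echo "keytab not readable: $KEYTAB"
fi

# Expect: ---Sr-s--- 1 root yarn ... (setuid root, group yarn, mode 6050).
if [ -e "$CE" ]; then
  ls -l "$CE"
else
  echo "binary not found: $CE"
fi
```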

Step 2: Start up the ResourceManager

You are now ready to start the ResourceManager.

If you're using the /etc/init.d/hadoop-yarn-resourcemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-resourcemanager start

You can verify that the ResourceManager is working properly by opening a web browser to http://host:8088/, where host is the name of the machine where the ResourceManager is running. (If you set yarn.http.policy to HTTPS_ONLY as shown in Step 1, the web UI is served over HTTPS instead.)
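You can also probe the web UI from the command line. The hostname below is a placeholder; with HTTPS_ONLY, point curl at the HTTPS address instead:

```shell
# Probe the ResourceManager web UI; a healthy daemon returns 200.
# The hostname is a placeholder; with HTTPS_ONLY, use the https address.
curl -s -o /dev/null -w "%{http_code}\n" http://resourcemanager.example.com:8088/
```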

Step 3: Start up the NodeManager

You are now ready to start the NodeManager.

If you're using the /etc/init.d/hadoop-yarn-nodemanager script, then you can use the service command to run it now:

$ sudo service hadoop-yarn-nodemanager start

You can verify that the NodeManager is working properly by opening a web browser to http://host:8042/ where host is the name of the machine where the NodeManager is running.

Step 4: Start up the MapReduce Job History Server

You are now ready to start the MapReduce Job History Server.

If you're using the /etc/init.d/hadoop-mapreduce-historyserver script, then you can use the service command to run it now:

$ sudo service hadoop-mapreduce-historyserver start

You can verify that the MapReduce JobHistory Server is working properly by opening a web browser to http://host:19888/ where host is the name of the machine where the MapReduce JobHistory Server is running.

Step 5: Try Running a Map/Reduce YARN Job

You should now be able to run MapReduce jobs. To confirm, try launching a sleep or a pi job from the provided Hadoop examples (/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar). Note that you will need Kerberos credentials to do so.

To try running a MapReduce job using YARN, set the HADOOP_MAPRED_HOME environment variable and then submit the job. For example:

$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
$ /usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
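Putting the steps above together, a full smoke test as a regular user might look like the following; the principal alice@YOUR-REALM.COM is a placeholder for a real user in your realm:

```shell
# Obtain Kerberos credentials first; the principal is a placeholder.
kinit alice@YOUR-REALM.COM

# Submit the pi example through YARN.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000

# Confirm which ticket was used for the submission.
klist
```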