Step 2: Verify User Accounts and Groups in CDH 5 Due to Security
CDH 5 introduces a new version of MapReduce: MapReduce 2.0 (MRv2) built on the YARN framework. In this document, we refer to this new version as YARN. CDH 5 also provides an implementation of the previous version of MapReduce, referred to as MRv1 in this document.
- If you are using MRv1, see Step 2a (MRv1 only): Verify User Accounts and Groups in MRv1 for configuration information.
- If you are using YARN, see Step 2b (YARN only): Verify User Accounts and Groups in YARN for configuration information.
Step 2a (MRv1 only): Verify User Accounts and Groups in MRv1
If you are using YARN, skip this step and proceed to Step 2b (YARN only): Verify User Accounts and Groups in YARN.
During CDH 5 package installation of MRv1, the following Unix user accounts are automatically created to support security:
| This User | Runs These Hadoop Programs |
|---|---|
| hdfs | HDFS: NameNode, DataNodes, Secondary NameNode (or Standby NameNode if you are using HA) |
| mapred | MRv1: JobTracker and TaskTrackers |
The hdfs user also acts as the HDFS superuser.
The hadoop user no longer exists in CDH 5. If you currently use the hadoop user to run applications as an HDFS super-user, you should instead use the new hdfs user, or create a separate Unix account for your application such as myhadoopapp.
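After installation you can confirm that the expected accounts exist on each host. The `check_accounts` helper below is a hypothetical convenience, not part of CDH; it simply queries the local account database with `getent`:

```shell
# Hypothetical helper: report any of the given Unix accounts that are
# missing from this host. Returns non-zero if any account is absent.
check_accounts() {
  local missing=0
  for u in "$@"; do
    if ! getent passwd "$u" >/dev/null; then
      echo "missing user: $u"
      missing=1
    fi
  done
  return "$missing"
}

# On an MRv1 host you would run:
#   check_accounts hdfs mapred
```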
MRv1: Directory Ownership in the Local File System
Because the HDFS and MapReduce services run as different users, you must configure the correct ownership of the following directories on the local file system of each host:
| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | mapred.local.dir | mapred:mapred | drwxr-xr-x |
See also Deploying MapReduce v1 (MRv1) on a Cluster.
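The local-filesystem layout above can be scripted. The sketch below uses a scratch directory so it can run without root, and assumes GNU coreutils; the paths are placeholders for whatever your dfs.namenode.name.dir, dfs.datanode.data.dir, and mapred.local.dir properties point to, and on a real host you would also set the hdfs:hdfs and mapred:mapred ownership from the table (which requires root):

```shell
# Sketch: create MRv1 local directories with the modes from the table.
# BASE stands in for your configured storage roots. On a real host,
# additionally run as root:
#   chown -R hdfs:hdfs   <dfs.namenode.name.dir> <dfs.datanode.data.dir>
#   chown -R mapred:mapred <mapred.local.dir>
BASE=$(mktemp -d)
mkdir -p "$BASE/dfs/nn" "$BASE/dfs/dn" "$BASE/mapred/local"
chmod 700 "$BASE/dfs/nn" "$BASE/dfs/dn"   # drwx------
chmod 755 "$BASE/mapred/local"            # drwxr-xr-x
stat -c '%A %n' "$BASE/dfs/nn" "$BASE/dfs/dn" "$BASE/mapred/local"
```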
You must also configure the following permissions for the HDFS and MapReduce log directories (the default locations are /var/log/hadoop-hdfs and /var/log/hadoop-0.20-mapreduce) and for the $MAPRED_LOG_DIR/userlogs/ directory:
| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x |
| Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x |
| Local | userlogs directory in MAPRED_LOG_DIR | mapred:anygroup | permissions will be set automatically at daemon start time |
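The drwxrwxr-x entries above correspond to octal mode 775. The sketch below demonstrates this on a scratch directory (assuming GNU coreutils); on a real host you would target the actual log directories and also set the hdfs:hdfs and mapred:mapred ownership, which requires root:

```shell
# Sketch: apply the log-directory mode from the table. LOGROOT stands
# in for /var/log so this can run without root privileges.
LOGROOT=$(mktemp -d)
mkdir -p "$LOGROOT/hadoop-hdfs" "$LOGROOT/hadoop-0.20-mapreduce"
chmod 775 "$LOGROOT/hadoop-hdfs" "$LOGROOT/hadoop-0.20-mapreduce"   # drwxrwxr-x
```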
MRv1: Directory Ownership on HDFS
The following directories on HDFS must also be configured as follows:
| File System | Directory | Owner | Permissions |
|---|---|---|---|
| HDFS | mapreduce.jobtracker.system.dir (mapred.system.dir is deprecated but will also work) | mapred:hadoop | drwx------ |
| HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x |
MRv1: Changing the Directory Ownership on HDFS
- If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user by running the following command before changing the directory ownership on HDFS:
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM
If kinit does not obtain credentials at first, run kinit -R after the initial kinit. (For more information, see Problem 2 in Appendix A - Troubleshooting.) To change the directory ownership on HDFS, run the following commands. Replace the example /mapred/system directory in the commands below with the HDFS directory specified by the mapreduce.jobtracker.system.dir (or mapred.system.dir) property in the conf/mapred-site.xml file:
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod -R 700 /mapred/system
$ sudo -u hdfs hadoop fs -chmod 755 /
- In addition (whether or not Hadoop security is enabled) create the /tmp directory. For instructions on creating /tmp and setting its permissions, see these instructions.
Step 2b (YARN only): Verify User Accounts and Groups in YARN
If you are using MRv1, skip this step and proceed to Step 3: If you are Using AES-256 Encryption, install the JCE Policy File.
During CDH 5 package installation of MapReduce 2.0 (YARN), the following Unix user accounts are automatically created to support security:
| This User | Runs These Hadoop Programs |
|---|---|
| hdfs | HDFS: NameNode, DataNodes, Standby NameNode (if you are using HA) |
| yarn | YARN: ResourceManager, NodeManager |
| mapred | YARN: MapReduce Job History Server |
The HDFS and YARN daemons must run as different Unix users; for example, hdfs and yarn. The MapReduce Job History server must run as user mapred. Having all of these users share a common Unix group is recommended; for example, hadoop.
YARN: Directory Ownership in the Local File System
Because the HDFS and MapReduce services run as different users, you must configure the correct ownership of the following directories on the local file system of each host:
| File System | Directory | Owner | Permissions (see Footnote 1) |
|---|---|---|---|
| Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | yarn.nodemanager.local-dirs | yarn:yarn | drwxr-xr-x |
| Local | yarn.nodemanager.log-dirs | yarn:yarn | drwxr-xr-x |
| Local | container-executor | root:yarn | --Sr-s--- |
| Local | conf/container-executor.cfg | root:yarn | r-------- |
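The container-executor mode in the table (--Sr-s---, i.e. setuid and setgid with group read/execute) corresponds to octal 6050, and the configuration file mode (r--------) to octal 400. The sketch below demonstrates the bit patterns on scratch files (assuming GNU coreutils); on a real host the binary and conf/container-executor.cfg live in the CDH installation directories and must be owned by root:yarn:

```shell
# Sketch: reproduce the container-executor modes on scratch files.
f=$(mktemp)        # stands in for the container-executor binary
c=$(mktemp)        # stands in for conf/container-executor.cfg
chmod 6050 "$f"    # setuid + setgid, group r-x: ls -l shows ---Sr-s---
chmod 400 "$c"     # owner read only: ls -l shows -r--------
stat -c '%a %n' "$f" "$c"
```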
You must also configure the following permissions for the HDFS, YARN and MapReduce log directories (the default locations are /var/log/hadoop-hdfs, /var/log/hadoop-yarn and /var/log/hadoop-mapreduce):
| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x |
| Local | $YARN_LOG_DIR | yarn:yarn | drwxrwxr-x |
| Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x |
YARN: Directory Ownership on HDFS
The following directories on HDFS must also be configured as follows:
| File System | Directory | Owner | Permissions |
|---|---|---|---|
| HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x |
| HDFS | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwxt |
| HDFS | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwxt |
| HDFS | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x--- |
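The table lists symbolic permissions, while the hadoop fs -chmod commands that follow use octal modes (1777 for the world-writable, sticky-bit directories; 750 for the done directory). The mapping can be checked locally on an ordinary directory with GNU coreutils:

```shell
# Sketch: octal modes and their symbolic equivalents, shown on a
# scratch directory rather than on HDFS.
d=$(mktemp -d)
chmod 1777 "$d"
stat -c '%a %A' "$d"   # 1777 drwxrwxrwt (sticky bit set)
chmod 750 "$d"
stat -c '%a %A' "$d"   # 750 drwxr-x---
```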
YARN: Changing the Directory Ownership on HDFS
- If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user by running the following command:
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM
If kinit does not obtain credentials at first, run kinit -R after the initial kinit. (See Problem 2 in Appendix A - Troubleshooting.) To change the directory ownership on HDFS, run the following commands:
$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod 755 /
$ sudo -u hdfs hadoop fs -chown yarn:hadoop [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.done-dir]
$ sudo -u hdfs hadoop fs -chmod 750 [mapreduce.jobhistory.done-dir]
- In addition (whether or not Hadoop security is enabled), create the /tmp directory. For instructions on creating /tmp and setting its permissions, see these instructions.
- In addition (whether or not Hadoop security is enabled), change the permissions on the /user/history directory. See these instructions.