Step 1: Install Cloudera Manager and CDH
If you have not already done so, Cloudera strongly recommends that you install and configure the Cloudera Manager Server, the Cloudera Manager Agents, and CDH to set up a fully functional CDH cluster before you begin the following steps to implement Hadoop security features.
Overview of the User Accounts and Groups in CDH and Cloudera Manager to Support Security
User Accounts and Groups in CDH and Cloudera Manager Required to Support Security:
This User | Runs These Roles |
---|---|
hdfs | NameNode, DataNodes, and Secondary NameNode |
mapred | JobTracker and TaskTrackers (MR1) and Job History Server (YARN) |
yarn | ResourceManager and NodeManagers (YARN) |
oozie | Oozie Server |
hue | Hue Server, Beeswax Server, Authorization Manager, and Job Designer |
When you install the Cloudera Manager Server on the server host, a new Unix user account called cloudera-scm is created automatically to support security. The Cloudera Manager Server uses this account to create host principals and deploy the keytabs on your cluster.
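Because each role runs under a dedicated account, it can be useful to confirm which of these accounts exist on a host before continuing. The following is a minimal sketch, assuming a Linux host with Python 3; the helper name `missing_accounts` is hypothetical and is not part of Cloudera Manager or CDH.

```python
import pwd

# Accounts named in this section.
EXPECTED_USERS = ["hdfs", "mapred", "yarn", "oozie", "hue", "cloudera-scm"]


def missing_accounts(users):
    """Return the names in users that are not known Unix accounts on this host."""
    missing = []
    for name in users:
        try:
            pwd.getpwnam(name)  # raises KeyError when the account is unknown
        except KeyError:
            missing.append(name)
    return missing
```

Calling `missing_accounts(EXPECTED_USERS)` returns the accounts that are absent. Which accounts should exist on a given host depends on what is installed there; the cloudera-scm account, for example, is described above only for the Cloudera Manager Server host.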
Depending on whether you installed CDH and Cloudera Manager at the same time, see one of the following sections for information on configuring directory ownership on cluster hosts:
If You Installed CDH and Cloudera Manager at the Same Time
If you have a new installation and installed CDH and Cloudera Manager at the same time, the Cloudera Manager Agent on each host automatically configured the directory owners shown in the following table when you started the Agents on your cluster hosts. With the owners configured as shown, the Hadoop daemons can automatically set the permissions on each directory specified by these properties so that access is properly restricted. The owners must be configured exactly as shown, so do not change them:
Directory Specified in this Property | Owner |
---|---|
dfs.name.dir | hdfs:hadoop |
dfs.data.dir | hdfs:hadoop |
mapred.local.dir | mapred:hadoop |
mapred.system.dir in HDFS | mapred:hadoop |
yarn.nodemanager.local-dirs | yarn:yarn |
yarn.nodemanager.log-dirs | yarn:yarn |
oozie.service.StoreService.jdbc.url (if using Derby) | oozie:oozie |
[[database]] name | hue:hue |
javax.jdo.option.ConnectionURL | hue:hue |
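To confirm that the owners on a host match the table, a check along these lines may help. This is a minimal sketch, assuming Python 3 on a Linux host; the function names `owner_of` and `mismatches` and the example path are hypothetical. It applies only to local filesystem paths, so it does not cover mapred.system.dir, which is in HDFS.

```python
import grp
import os
import pwd


def owner_of(path):
    """Return the owner of path as 'user:group', the format used in the table."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)  # uid with no account entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid with no group entry
    return f"{user}:{group}"


def mismatches(expected):
    """Return {path: actual_owner} for each path whose owner differs from expected."""
    wrong = {}
    for path, want in expected.items():
        actual = owner_of(path)
        if actual != want:
            wrong[path] = actual
    return wrong
```

For example, `mismatches({"/data/1/dfs/nn": "hdfs:hadoop"})` would report that directory if its owner were anything other than hdfs:hadoop. The path here is a placeholder; substitute the value your cluster uses for dfs.name.dir.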
If You Installed and Used CDH Before Installing Cloudera Manager
If you have been using HDFS and running MapReduce jobs in an existing installation of CDH before you installed Cloudera Manager, you must manually configure the owners of the directories shown in the table above. Doing so enables the Hadoop daemons to automatically set the permissions for each of the directories. It's critical that you manually configure the owners exactly as shown above.
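The manual configuration can be sketched as follows. This is an illustration under stated assumptions, not a Cloudera-provided procedure: the function names `split_owner` and `apply_owners` and the example paths are hypothetical, the script must run as root to change owners, and it changes only the named directory rather than its contents. It handles local directories only; a directory in HDFS, such as mapred.system.dir, would instead need an HDFS command such as `hadoop fs -chown`.

```python
import shutil


def split_owner(owner):
    """Split a 'user:group' string from the table into (user, group)."""
    user, group = owner.split(":", 1)
    return user, group


def apply_owners(expected, dry_run=True):
    """Set the owner of each local directory in expected.

    With dry_run=True (the default) nothing is changed; the planned
    (path, user, group) tuples are only returned for review.
    """
    planned = []
    for path, owner in expected.items():
        user, group = split_owner(owner)
        planned.append((path, user, group))
        if not dry_run:
            # Requires root. Not recursive: existing files keep their owners.
            shutil.chown(path, user=user, group=group)
    return planned
```

A cautious way to use it is to call `apply_owners({"/data/1/dfs/dn": "hdfs:hadoop"})` first, review the returned plan, and then repeat the call with `dry_run=False`. The path is a placeholder for the value of dfs.data.dir on your hosts.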