Appendix C - Hadoop Users in Cloudera Manager
A number of special users are created by default when you install and use CDH and Cloudera Manager. The table below lists these users and groups as of the latest Cloudera Manager 5.1.x release. A second table lists the corresponding Kerberos principals and keytab files that should be created when you configure Kerberos security on your cluster.
Project | Unix User ID | Group | Group Members | Notes |
---|---|---|---|---|
Cloudera Manager | cloudera-scm | cloudera-scm | | Cloudera Manager processes such as the CM Server and the monitoring daemons run as this user. It is not configurable. The Cloudera Manager keytab file must be named cmf.keytab since that name has been hard-coded in Cloudera Manager. |
Apache Avro | No special users. | | | |
Apache Flume | flume | flume | | The sink that writes to HDFS as this user must have write privileges. |
Apache HBase | hbase | hbase | | The Master and the RegionServer processes run as this user. |
HDFS | hdfs | hdfs | impala | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. The hdfs user is also part of the hadoop group. |
Apache Hive | hive | hive | impala | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (e.g. MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml. |
Apache HCatalog | hive | hive | | The WebHCat service (for REST access to Hive functionality) runs as the hive user. It is not configurable. |
HttpFS | httpfs | httpfs | | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file. |
Hue | hue | hue | | Hue runs as this user. It is not configurable. |
Cloudera Impala | impala | impala | | An interactive query tool. The impala user also belongs to the hive and hdfs groups. |
Llama | llama | llama | | Llama runs as this user. |
Apache Mahout | No special users. | | | |
MapReduce | mapred | mapred | | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos. It would be complicated to use a different user ID. |
Apache Oozie | oozie | oozie | | The Oozie service runs as this user. |
Parquet | No special users. | | | |
Apache Pig | No special users. | | | |
Cloudera Search | solr | solr | | The Solr process runs as this user. It is not configurable. |
Apache Spark | spark | spark | | The Spark process runs as this user. It is not configurable. |
Apache Sentry (incubating) | sentry | sentry | | The Sentry service runs as this user. |
Apache Sqoop | sqoop | sqoop | | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended. |
Apache Sqoop2 | sqoop2 | sqoop | | The Sqoop2 service runs as this user. |
Apache Whirr | No special users. | | | |
YARN | yarn | yarn | | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos. It would be complicated to use a different user ID. The yarn user also belongs to the hadoop group. |
Apache ZooKeeper | zookeeper | zookeeper | | The ZooKeeper process runs as this user. It is not configurable. |
Other | | hadoop | yarn, hdfs, mapred | This is a group with no associated Unix user ID or keytab. |
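
The group memberships listed in the table can be checked on a host with a short script. The following is a minimal sketch, not part of Cloudera Manager: the names `EXPECTED_MEMBERS`, `missing_members`, and `read_local_groups` are hypothetical, and the expected memberships are copied from the Group Members column above. It assumes a Unix host, since it relies on Python's standard `grp` module, and it only sees supplementary members (a user whose primary group is the one being checked will not appear in `gr_mem`).

```python
import grp

# Expected supplementary members per group, copied from the table above.
EXPECTED_MEMBERS = {
    "hdfs": {"impala"},
    "hive": {"impala"},
    "hadoop": {"yarn", "hdfs", "mapred"},
}


def missing_members(actual_members, expected=EXPECTED_MEMBERS):
    """Return {group: sorted list of expected members that are absent}."""
    report = {}
    for group, wanted in expected.items():
        present = set(actual_members.get(group, ()))
        absent = sorted(wanted - present)
        if absent:
            report[group] = absent
    return report


def read_local_groups(names):
    """Look up each group's member list in the local group database."""
    found = {}
    for name in names:
        try:
            found[name] = grp.getgrnam(name).gr_mem
        except KeyError:
            # The group does not exist on this host; treat it as empty.
            found[name] = []
    return found


if __name__ == "__main__":
    # An empty dict means every expected membership is present.
    print(missing_members(read_local_groups(EXPECTED_MEMBERS)))
```

Keeping the comparison in `missing_members` separate from the lookup in `read_local_groups` makes it possible to feed in membership data from another source, such as a directory service export.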
Kerberos principal names should have the format username/fully.qualified.domain.name@YOUR-REALM.COM, where username is the name of an existing UNIX account, such as hdfs or mapred. The table below lists the usernames to use in the Kerberos principal names. For example, the Kerberos principal for Apache Flume would be flume/fully.qualified.domain.name@YOUR-REALM.COM.
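
The principal format described above can be illustrated with a small helper. This is a sketch only: the function name `kerberos_principal` and its validation rules are assumptions made for illustration, not part of any Cloudera or Kerberos tool. The host name and realm are the placeholders used in the text.

```python
def kerberos_principal(username, fqdn, realm):
    """Build a principal of the form username/fqdn@REALM."""
    for label, value in (("username", username), ("fqdn", fqdn), ("realm", realm)):
        # Reject empty parts and parts containing the separator characters.
        if not value or "/" in value or "@" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{username}/{fqdn}@{realm}"


print(kerberos_principal("flume", "fully.qualified.domain.name", "YOUR-REALM.COM"))
# prints: flume/fully.qualified.domain.name@YOUR-REALM.COM
```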
Project (UNIX ID) | Service | Kerberos Principal Primary | Filename (.keytab) | Keytab File Owner | Keytab File Group | File Permission (octal) |
---|---|---|---|---|---|---|
Cloudera Manager (cloudera-scm) | NA | cloudera-scm | cmf | cloudera-scm | cloudera-scm | 600 |
Cloudera Management Service (cloudera-scm) | cloudera-mgmt-REPORTSMANAGER | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
Cloudera Management Service (cloudera-scm) | cloudera-mgmt-ACTIVITYMONITOR | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
Cloudera Management Service (cloudera-scm) | cloudera-mgmt-SERVICEMONITOR | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
Cloudera Management Service (cloudera-scm) | cloudera-mgmt-HOSTMONITOR | cloudera-scm | hdfs | cloudera-scm | cloudera-scm | 600 |
Flume (flume) | flume-AGENT | flume | flume | cloudera-scm | cloudera-scm | 600 |
HBase (hbase) | hbase-REGIONSERVER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
HBase (hbase) | hbase-HBASETHRIFTSERVER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
HBase (hbase) | hbase-HBASERESTSERVER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
HBase (hbase) | hbase-MASTER | hbase | hbase | cloudera-scm | cloudera-scm | 600 |
HDFS (hdfs) | hdfs-NAMENODE | hdfs | hdfs (Secondary: Merge hdfs and HTTP) | cloudera-scm | cloudera-scm | 600 |
HDFS (hdfs) | hdfs-DATANODE | hdfs | hdfs (Secondary: Merge hdfs and HTTP) | cloudera-scm | cloudera-scm | 600 |
HDFS (hdfs) | hdfs-SECONDARYNAMENODE | hdfs | hdfs (Secondary: Merge hdfs and HTTP) | cloudera-scm | cloudera-scm | 600 |
Hive (hive) | hive-HIVESERVER2 | hive | hive | cloudera-scm | cloudera-scm | 600 |
Hive (hive) | hive-WEBHCAT | HTTP | HTTP | cloudera-scm | cloudera-scm | 600 |
Hive (hive) | hive-HIVEMETASTORE | hive | hive | cloudera-scm | cloudera-scm | 600 |
HttpFS (httpfs) | hdfs-HTTPFS | httpfs | httpfs | cloudera-scm | cloudera-scm | 600 |
Hue (hue) | hue-KT_RENEWER | hue | hue | cloudera-scm | cloudera-scm | 600 |
Impala (impala) | impala-STATESTORE | impala | impala | cloudera-scm | cloudera-scm | 600 |
Impala (impala) | impala-CATALOGSERVER | impala | impala | cloudera-scm | cloudera-scm | 600 |
Impala (impala) | impala-IMPALAD | impala | impala | cloudera-scm | cloudera-scm | 600 |
Llama (llama) | impala-LLAMA | llama | llama (Secondary: Merge llama and HTTP) | cloudera-scm | cloudera-scm | 600 |
MapReduce (mapred) | mapreduce-JOBTRACKER | mapred | mapred (Secondary: Merge mapred and HTTP) | cloudera-scm | cloudera-scm | 600 |
MapReduce (mapred) | mapreduce-TASKTRACKER | mapred | mapred (Secondary: Merge mapred and HTTP) | cloudera-scm | cloudera-scm | 600 |
Oozie (oozie) | oozie-OOZIE_SERVER | oozie | oozie (Secondary: Merge oozie and HTTP) | cloudera-scm | cloudera-scm | 600 |
Search (solr) | solr-SOLR_SERVER | solr | solr (Secondary: Merge solr and HTTP) | cloudera-scm | cloudera-scm | 600 |
Sentry (sentry) | sentry-SENTRY_SERVER | sentry | sentry | cloudera-scm | cloudera-scm | 600 |
Spark (spark) | spark_on_yarn-SPARK_YARN_HISTORY_SERVER | spark | spark | cloudera-scm | cloudera-scm | 600 |
Sqoop (sqoop) | | | | | | |
Sqoop2 (sqoop2) | | | | | | |
YARN (yarn) | yarn-NODEMANAGER | yarn | yarn (Secondary: Merge yarn and HTTP) | cloudera-scm | cloudera-scm | 644 |
YARN (yarn) | yarn-RESOURCEMANAGER | yarn | yarn (Secondary: Merge yarn and HTTP) | cloudera-scm | cloudera-scm | 600 |
YARN (yarn) | yarn-JOBHISTORY | yarn | yarn (Secondary: Merge yarn and HTTP) | cloudera-scm | cloudera-scm | 600 |
ZooKeeper (zookeeper) | zookeeper-server | zookeeper | zookeeper | cloudera-scm | cloudera-scm | 600 |
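
The owner, group, and permission columns above can be verified for a given keytab file with a few standard-library calls. The sketch below is illustrative only: `describe_keytab` and `check_keytab` are hypothetical helper names, and the defaults (cloudera-scm owner and group, mode 600) are taken from the most common values in the table. It assumes a Unix host, because it uses the `pwd` and `grp` modules.

```python
import grp
import os
import pwd
import stat


def describe_keytab(path):
    """Return (owner, group, mode); mode holds only the permission bits."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid has no passwd entry on this host
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # gid has no group entry on this host
    return owner, group, stat.S_IMODE(info.st_mode)


def check_keytab(path, owner="cloudera-scm", group="cloudera-scm", mode=0o600):
    """List every difference between the file and the expected values."""
    actual_owner, actual_group, actual_mode = describe_keytab(path)
    problems = []
    if actual_owner != owner:
        problems.append(f"owner is {actual_owner}, expected {owner}")
    if actual_group != group:
        problems.append(f"group is {actual_group}, expected {group}")
    if actual_mode != mode:
        problems.append(f"mode is {actual_mode:o}, expected {mode:o}")
    return problems
```

An empty list from `check_keytab` means the file matches. For the yarn-NODEMANAGER row, which the table lists with permission 644, the call would pass `mode=0o644`. The location of the keytab files is not stated in this appendix, so the path argument has to come from your own deployment.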