This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Hadoop Users in CDH 5

A number of special users are created by default when you install and use CDH and Cloudera Manager. Table 1 lists these users and groups as of the latest CDH 5.1.x release. Table 2 lists the corresponding Kerberos principals and keytab files that you should create when you configure Kerberos security on your cluster.

Table 1. CDH 5 Users & Groups
Project	Unix User ID	Primary Group	Group Members	Notes
Apache Avro				No special users.
Apache Flume	`flume`	`flume`		The sink that writes to HDFS as this user must have write privileges.
Apache HBase	`hbase`	`hbase`		The Master and the RegionServer processes run as this user.
HDFS	`hdfs`	`hdfs`	`impala`	The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.
Apache Hive	`hive`	`hive`	`impala`	The HiveServer2 process and the Hive Metastore processes run as this user. A user must also be defined for Hive to access its Metastore DB (e.g. MySQL or Postgres), but it can be any identifier and does not correspond to a Unix uid. It is set with the `javax.jdo.option.ConnectionUserName` property in hive-site.xml.
Apache HCatalog	`hive`	`hive`		The WebHCat service (for REST access to Hive functionality) runs as the `hive` user. It is not configurable.
HttpFS	`httpfs`	`httpfs`		The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged `httpfs-http.keytab` file.
Hue	`hue`	`hue`		Hue runs as this user. It is not configurable.
Cloudera Impala	`impala`	`impala`		An interactive query tool.
Llama	`llama`	`llama`
Apache Mahout				No special users.
MapReduce	`mapred`	`mapred`		Without Kerberos, the JobTracker and tasks run as this user. With Kerberos, the LinuxTaskController binary is owned by this user. It would be complicated to use a different user ID.
Apache Oozie	`oozie`	`oozie`		The Oozie service runs as this user.
Parquet				No special users.
Apache Pig				No special users.
Cloudera Search	`solr`	`solr`		The Solr process runs as this user. It is not configurable.
Apache Spark	`spark`	`spark`		The Spark process runs as this user. It is not configurable.
Apache Sentry (incubating)	`sentry`	`sentry`		The Sentry service runs as this user.
Apache Sqoop	`sqoop`	`sqoop`		This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.
Apache Sqoop2	`sqoop2`	`sqoop`		The Sqoop2 service runs as this user.
Apache Whirr				No special users.
YARN	`yarn`	`yarn`		Without Kerberos, all YARN services and applications run as this user. With Kerberos, the LinuxContainerExecutor binary is owned by this user. It would be complicated to use a different user ID.
Apache ZooKeeper	`zookeeper`	`zookeeper`		The ZooKeeper process runs as this user. It is not configurable.
Other		`hadoop`	`yarn`, `hdfs`, `mapred`	This is a group with no associated Unix user ID or keytab.
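
To confirm that a host has the accounts from Table 1, a check along the following lines may help. This is a minimal sketch, not a Cloudera tool: the function name `check_accounts` and the dictionary `EXPECTED_PRIMARY_GROUP` are hypothetical, and the dictionary holds only a few rows copied from the table. The lookup functions are parameters so that the logic can be exercised without the accounts being present.

```python
import grp
import pwd

# A few rows from Table 1: Unix user -> primary group (extend as needed).
EXPECTED_PRIMARY_GROUP = {
    "hdfs": "hdfs",
    "mapred": "mapred",
    "yarn": "yarn",
    "sqoop2": "sqoop",
}


def check_accounts(expected, getpwnam=pwd.getpwnam, getgrgid=grp.getgrgid):
    """Return a list of problems found for the given user -> group mapping."""
    problems = []
    for user, group in sorted(expected.items()):
        try:
            gid = getpwnam(user).pw_gid
        except KeyError:
            # pwd.getpwnam raises KeyError when the account is unknown.
            problems.append(f"user {user} does not exist")
            continue
        try:
            actual = getgrgid(gid).gr_name
        except KeyError:
            # The primary GID has no group entry; report the number instead.
            actual = str(gid)
        if actual != group:
            problems.append(
                f"user {user} has primary group {actual}, expected {group}"
            )
    return problems
```

Calling `check_accounts(EXPECTED_PRIMARY_GROUP)` on a cluster host returns an empty list when every listed account exists with the expected primary group.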

Note:

The Kerberos principal names should have the format username/fully.qualified.domain.name@YOUR-REALM.COM, where username is the name of an existing UNIX account, such as hdfs or mapred. Table 2 lists the usernames to use in the Kerberos principal names. For example, the Kerberos principal for Apache Flume would be flume/fully.qualified.domain.name@YOUR-REALM.COM.
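
The principal format described in the note can be illustrated with a short helper. This is a sketch only: the function name `kerberos_principal` is hypothetical, and the host and realm values are the placeholders used in the note, not real settings.

```python
def kerberos_principal(username, fqdn, realm):
    """Build a principal of the form username/fqdn@REALM."""
    for part in (username, fqdn, realm):
        # Reject empty parts and the separator characters themselves.
        if not part or "/" in part or "@" in part:
            raise ValueError(f"invalid principal component: {part!r}")
    return f"{username}/{fqdn}@{realm}"


# Placeholder host and realm, as in the note above.
print(kerberos_principal("flume", "fully.qualified.domain.name", "YOUR-REALM.COM"))
# prints flume/fully.qualified.domain.name@YOUR-REALM.COM
```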

Table 2. CDH 5 Keytabs and Keytab File Permissions
Project (UNIX ID)	Service	Kerberos Principal Primary	Filename (.keytab)	Keytab File Owner	Keytab File Group	File Permission (octal)
Flume (`flume`)	flume-AGENT	flume	flume	flume	flume	600
HBase (`hbase`)	hbase-REGIONSERVER	hbase	hbase	hbase	hbase	600
	hbase-HBASETHRIFTSERVER
	hbase-HBASERESTSERVER
	hbase-MASTER
HDFS (`hdfs`)	hdfs-NAMENODE	hdfs	hdfs Secondary: Merge hdfs and HTTP	hdfs	hdfs	600
	hdfs-DATANODE
	hdfs-SECONDARYNAMENODE
Hive (`hive`)	hive-HIVESERVER2	hive	hive	hive	hive	600
	hive-WEBHCAT	HTTP	HTTP
	hive-HIVEMETASTORE	hive	hive
HttpFS (`httpfs`)	hdfs-HTTPFS	httpfs	httpfs	httpfs	httpfs	600
Hue (`hue`)	hue-KT_RENEWER	hue	hue	hue	hue	600
Impala (`impala`)	impala-STATESTORE	impala	impala	impala	impala	600
	impala-CATALOGSERVER
	impala-IMPALAD
Llama (`llama`)	impala-LLAMA	llama	llama Secondary: Merge llama and HTTP	llama	llama	600
MapReduce (`mapred`)	mapreduce-JOBTRACKER	mapred	mapred Secondary: Merge mapred and HTTP	mapred	hadoop	600
MapReduce (`mapred`)	mapreduce-TASKTRACKER	mapred	mapred Secondary: Merge mapred and HTTP	mapred	hadoop	600
Oozie (`oozie`)	oozie-OOZIE_SERVER	oozie	oozie Secondary: Merge oozie and HTTP	oozie	oozie	600
Search (`solr`)	solr-SOLR_SERVER	solr	solr Secondary: Merge solr and HTTP	solr	solr	600
Sentry (`sentry`)	sentry-SENTRY_SERVER	sentry	sentry	sentry	sentry	600
Spark (`spark`)	spark_on_yarn-SPARK_YARN_HISTORY_SERVER	spark	spark	spark	spark	600
Sqoop (`sqoop`)
Sqoop2 (`sqoop2`)
YARN (`yarn`)	yarn-NODEMANAGER	yarn	yarn Secondary: Merge yarn and HTTP	yarn	hadoop	644
	yarn-RESOURCEMANAGER					600
	yarn-JOBHISTORY					600
ZooKeeper (`zookeeper`)	zookeeper-server	zookeeper	zookeeper	zookeeper	zookeeper	600

Page generated September 3, 2015.