This is the documentation for CDH 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Hadoop Users in CDH 5

Installing and using CDH 5 creates a number of special users by default. Table 1 lists these users as of the latest CDH 5 release. Table 2 lists the corresponding Kerberos principals and keytab files that you should create when you configure Kerberos security on your cluster.

Table 1. CDH 5 Users & Groups

| Project | Unix User ID | Primary Group | Group Members | Notes |
|---|---|---|---|---|
| Apache Avro | | | | No special users. |
| Apache Flume | flume | flume | | The sink that writes to HDFS as this user must have write privileges. |
| Apache HBase | hbase | hbase | | The Master and the RegionServer processes run as this user. |
| HDFS | hdfs | hdfs | impala | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. |
| Apache Hive | hive | hive | impala | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (e.g. MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml. |
| Apache HCatalog | hive | hive | | The WebHCat service (for REST access to Hive functionality) runs as the hive user. It is not configurable. |
| HttpFS | httpfs | httpfs | | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file. |
| Hue | hue | hue | | Hue runs as this user. It is not configurable. |
| Cloudera Impala | impala | impala | | An interactive query tool. |
| Llama | llama | llama | | |
| Apache Mahout | | | | No special users. |
| MapReduce | mapred | mapred | | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos. It would be complicated to use a different user ID. |
| Apache Oozie | oozie | oozie | | The Oozie service runs as this user. |
| Parquet | | | | No special users. |
| Apache Pig | | | | No special users. |
| Cloudera Search | solr | solr | | The Solr process runs as this user. It is not configurable. |
| Apache Spark | spark | spark | | The Spark process runs as this user. It is not configurable. |
| Apache Sentry (incubating) | | | | No special users. |
| Apache Sqoop | sqoop | sqoop | | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended. |
| Apache Sqoop2 | sqoop2 | sqoop | | The Sqoop2 service runs as this user. |
| Apache Whirr | | | | No special users. |
| YARN | yarn | yarn | | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos. It would be complicated to use a different user ID. |
| Apache ZooKeeper | zookeeper | zookeeper | | The ZooKeeper process runs as this user. It is not configurable. |
| Other | | hadoop | yarn, hdfs, mapred | This is a group with no associated Unix user ID or keytab. |
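
To confirm that a host actually has the accounts from Table 1, something like the following Python sketch may help. It is not part of CDH: the function names `check_user` and `audit` are hypothetical, and `EXPECTED_USERS` holds only a subset of the table, so extend it to match the services you run. It relies on the standard `pwd` and `grp` modules, which are available only on Unix-like systems.

```python
import grp
import pwd

# Subset of Table 1: Unix user ID -> primary group (extend as needed).
EXPECTED_USERS = {
    "hdfs": "hdfs",
    "mapred": "mapred",
    "yarn": "yarn",
    "hbase": "hbase",
    "hive": "hive",
    "sqoop2": "sqoop",
}


def check_user(user, expected_group):
    """Return 'ok', 'missing', or 'wrong-group:<name>' for one account."""
    try:
        entry = pwd.getpwnam(user)
    except KeyError:
        return "missing"
    try:
        actual_group = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        # The primary GID has no entry in the group database.
        return "wrong-group:%d" % entry.pw_gid
    if actual_group != expected_group:
        return "wrong-group:" + actual_group
    return "ok"


def audit(expected=EXPECTED_USERS):
    """Check every user in `expected` and return {user: status}."""
    return {user: check_user(user, group) for user, group in expected.items()}


if __name__ == "__main__":
    for user, status in sorted(audit().items()):
        print(f"{user:10s} {status}")
```

The sketch checks only the primary group. Supplementary memberships, such as impala in the hdfs and hive groups, would need a separate lookup of each group's member list.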

  Note:

The Kerberos principal names should be of the format username/fully.qualified.domain.name@YOUR-REALM.COM, where username is the name of an existing UNIX account, such as hdfs or mapred. Table 2 lists the usernames to use in the Kerberos principal names. For example, the Kerberos principal for Apache Flume would be flume/fully.qualified.domain.name@YOUR-REALM.COM.
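
The principal format in the note can be illustrated with a small Python sketch. The helper names `kerberos_principal` and `split_principal` are hypothetical and exist only for this example; the host and realm are the placeholders used in the note.

```python
def kerberos_principal(username, fqdn, realm):
    """Build a principal of the form username/fqdn@REALM."""
    for part in (username, fqdn, realm):
        if not part:
            raise ValueError("username, fqdn and realm must all be non-empty")
    if "/" in username or "@" in username:
        raise ValueError("username must not contain '/' or '@'")
    return f"{username}/{fqdn}@{realm}"


def split_principal(principal):
    """Split username/fqdn@REALM back into (username, fqdn, realm)."""
    primary, slash, rest = principal.partition("/")
    instance, at, realm = rest.rpartition("@")
    if not (slash and at and primary and instance and realm):
        raise ValueError("expected username/fqdn@REALM, got %r" % principal)
    return primary, instance, realm


if __name__ == "__main__":
    # Apache Flume example from the note.
    print(kerberos_principal("flume", "fully.qualified.domain.name", "YOUR-REALM.COM"))
```

In practice the host part would be the fully qualified domain name of the machine that runs the service, and the realm would be your own Kerberos realm.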

Table 2. CDH 5 Keytabs and Keytab File Permissions
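| Project (UNIX ID) | Service | Kerberos Principal Primary | Filename (.keytab) | Keytab File Owner | Keytab File Group | File Permission (octal) |
|---|---|---|---|---|---|---|
| Flume (flume) | flume-AGENT | flume | flume | flume | flume | 600 |
| HBase (hbase) | hbase-REGIONSERVER | hbase | hbase | hbase | hbase | 600 |
| | hbase-HBASETHRIFTSERVER | hbase | hbase | hbase | hbase | 600 |
| | hbase-HBASERESTSERVER | hbase | hbase | hbase | hbase | 600 |
| | hbase-MASTER | hbase | hbase | hbase | hbase | 600 |
| HDFS (hdfs) | hdfs-NAMENODE | hdfs | hdfs; Secondary: Merge hdfs and HTTP | hdfs | hdfs | 600 |
| | hdfs-DATANODE | hdfs | hdfs; Secondary: Merge hdfs and HTTP | hdfs | hdfs | 600 |
| | hdfs-SECONDARYNAMENODE | hdfs | hdfs; Secondary: Merge hdfs and HTTP | hdfs | hdfs | 600 |
| Hive (hive) | hive-HIVESERVER2 | hive | hive | hive | hive | 600 |
| | hive-WEBHCAT | HTTP | HTTP | hive | hive | 600 |
| | hive-HIVEMETASTORE | hive | hive | hive | hive | 600 |
| HttpFS (httpfs) | hdfs-HTTPFS | httpfs | httpfs | httpfs | httpfs | 600 |
| Hue (hue) | hue-KT_RENEWER | hue | hue | hue | hue | 600 |
| Impala (impala) | impala-STATESTORE | impala | impala | impala | impala | 600 |
| | impala-CATALOGSERVER | impala | impala | impala | impala | 600 |
| | impala-IMPALAD | impala | impala | impala | impala | 600 |
| Llama (llama) | | | | | | |
| MapReduce (mapred) | mapreduce-JOBTRACKER | mapred | mapred; Secondary: Merge mapred and HTTP | mapred | hadoop | 600 |
| | mapreduce-TASKTRACKER | mapred | mapred; Secondary: Merge mapred and HTTP | mapred | hadoop | 600 |
| Oozie (oozie) | oozie-OOZIE_SERVER | oozie | oozie; Secondary: Merge oozie and HTTP | oozie | oozie | 600 |
| Search (solr) | solr-SOLR_SERVER | solr | solr; Secondary: Merge solr and HTTP | solr | solr | 600 |
| Sentry (sentry) | | | | | | |
| Spark (spark) | | | | | | |
| Sqoop (sqoop) | | | | | | |
| Sqoop2 (sqoop2) | | | | | | |
| YARN (yarn) | yarn-NODEMANAGER | yarn | yarn; Secondary: Merge yarn and HTTP | yarn | hadoop | 644 |
| | yarn-RESOURCEMANAGER | yarn | yarn; Secondary: Merge yarn and HTTP | yarn | hadoop | 600 |
| | yarn-JOBHISTORY | yarn | yarn; Secondary: Merge yarn and HTTP | yarn | hadoop | 600 |
| ZooKeeper (zookeeper) | zookeeper-server | zookeeper | zookeeper | zookeeper | zookeeper | 600 |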
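
One way to verify a keytab file against the owner, group, and octal permission listed in Table 2 is sketched below in Python. The helpers `keytab_mode` and `check_keytab` are hypothetical names, and the path in the usage example is an assumption made for illustration; substitute the location where your keytabs are actually deployed.

```python
import grp
import os
import pwd
import stat


def keytab_mode(path):
    """Return the permission bits of `path` as an octal string such as '600'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


def check_keytab(path, owner, group, mode):
    """Return a list of mismatches between `path` and the expected values."""
    st = os.stat(path)
    problems = []

    actual_mode = format(stat.S_IMODE(st.st_mode), "o")
    if actual_mode != mode:
        problems.append(f"mode is {actual_mode}, expected {mode}")

    # Fall back to the numeric ID when it has no name on this host.
    try:
        actual_owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        actual_owner = str(st.st_uid)
    if actual_owner != owner:
        problems.append(f"owner is {actual_owner}, expected {owner}")

    try:
        actual_group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        actual_group = str(st.st_gid)
    if actual_group != group:
        problems.append(f"group is {actual_group}, expected {group}")

    return problems
```

For the HDFS row, a call might look like `check_keytab("/etc/hadoop/conf/hdfs.keytab", "hdfs", "hdfs", "600")`, where the directory is assumed. An empty list means the file matches the table.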
Page generated September 3, 2015.