To enable security on HDP, you must add security-related information to various configuration files.
Before you begin, set JSVC_HOME in hadoop-env.sh.
For RHEL/CentOS/Oracle Linux:
export JSVC_HOME=/usr/libexec/bigtop-utils
For SLES:
export JSVC_HOME=/usr/lib/bigtop-utils
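To sanity-check the setting before continuing, list the directory; it should contain the jsvc binary that secure DataNodes require (the exact layout depends on your bigtop packages, so treat this as a quick sketch of a check):
ls $JSVC_HOME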
To the core-site.xml file on every host in your cluster, you must add the following information:
Table 23.3. core-site.xml
Property Name | Property Value | Description |
---|---|---|
hadoop.security.authentication | kerberos | Set the authentication type for the cluster. Valid values are: simple or kerberos. |
hadoop.rpc.protection | authentication; integrity; privacy | This is an [OPTIONAL] setting. If not set, defaults to authentication. |
hadoop.security.authorization | true | Enable authorization for different protocols. |
hadoop.security.auth_to_local | The mapping rules. For example: RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/ RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/ RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/ RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/ DEFAULT | The mapping from Kerberos principal names to local OS user names. See Creating Mappings Between Principals and UNIX Usernames for more information. |
The XML for these entries:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
  <description>
    Set the authentication for the cluster. Valid values are: simple or kerberos.
  </description>
</property>

<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
  <description>
    Enable authorization for different protocols.
  </description>
</property>

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
    DEFAULT
  </value>
  <description>The mapping from kerberos principal names to local OS user names.</description>
</property>
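You can sanity-check the auth_to_local rules before restarting services: Hadoop ships a small utility class that applies the configured mapping rules to a principal name (the principal below is just an example):
hadoop org.apache.hadoop.security.HadoopKerberosName nn/host1.example.com@EXAMPLE.COM
If the rules are correct, this prints the local short name the principal maps to (here, hdfs).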
When using the Knox Gateway, add the following to the core-site.xml file on the master node hosts in your cluster:
Table 23.4. core-site.xml
Property Name | Property Value | Description |
---|---|---|
hadoop.proxyuser.knox.groups | users | Grants proxy privileges to the knox user. |
hadoop.proxyuser.knox.hosts | $knox_host_FQDN | Identifies the Knox Gateway host. |
The XML for these entries:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
  <description>
    Set the authentication for the cluster. Valid values are: simple or kerberos.
  </description>
</property>

<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
  <description>
    Enable authorization for different protocols.
  </description>
</property>

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
    DEFAULT
  </value>
  <description>The mapping from kerberos principal names to local OS user names.</description>
</property>

<property>
  <name>hadoop.proxyuser.knox.groups</name>
  <value>users</value>
</property>

<property>
  <name>hadoop.proxyuser.knox.hosts</name>
  <value>Knox.EXAMPLE.COM</value>
</property>
To the hdfs-site.xml file on every host in your cluster, you must add the following information:
Table 23.5. hdfs-site.xml
Property Name | Property Value | Description |
---|---|---|
dfs.permissions.enabled | true | If true, permission checking in HDFS is enabled. If false, permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories. |
dfs.permissions.supergroup | hdfs | The name of the group of super-users. |
dfs.block.access.token.enable | true | If true, access tokens are used as capabilities for accessing DataNodes. If false, no access tokens are checked on accessing DataNodes. |
dfs.namenode.kerberos.principal | nn/_HOST@EXAMPLE.COM | Kerberos principal name for the NameNode. |
dfs.secondary.namenode.kerberos.principal | nn/_HOST@EXAMPLE.COM | Kerberos principal name for the secondary NameNode. |
dfs.web.authentication.kerberos.principal | HTTP/_HOST@EXAMPLE.COM | The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification. |
dfs.web.authentication.kerberos.keytab | /etc/security/keytabs/spnego.service.keytab | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. |
dfs.datanode.kerberos.principal | dn/_HOST@EXAMPLE.COM | The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name. |
dfs.namenode.keytab.file | /etc/security/keytabs/nn.service.keytab | Combined keytab file containing the NameNode service and host principals. |
dfs.secondary.namenode.keytab.file | /etc/security/keytabs/nn.service.keytab | Combined keytab file containing the NameNode service and host principals. |
dfs.datanode.keytab.file | /etc/security/keytabs/dn.service.keytab | The filename of the keytab file for the DataNode. |
dfs.https.port | 50470 | The HTTPS port to which the NameNode binds. |
dfs.namenode.https-address | Example: ip-10-111-59-170.ec2.internal:50470 | The HTTPS address to which the NameNode binds. |
dfs.datanode.data.dir.perm | 750 | The permissions that must be set on the dfs.data.dir directories. The DataNode will not come up if all existing dfs.data.dir directories do not have this setting. If the directories do not exist, they will be created with this permission. |
dfs.cluster.administrators | hdfs | ACL specifying who can view the default servlets in HDFS. |
dfs.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
dfs.secondary.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
The XML for these entries:
<property>
  <name>dfs.permissions</name>
  <value>true</value>
  <description>
    If "true", enable permission checking in HDFS. If "false", permission
    checking is turned off, but all other behavior is unchanged. Switching
    from one parameter value to the other does not change the mode, owner
    or group of files or directories.
  </description>
</property>

<property>
  <name>dfs.permissions.supergroup</name>
  <value>hdfs</value>
  <description>The name of the group of super-users.</description>
</property>

<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
  <description>Added to grow queue size so that more client connections are allowed.</description>
</property>

<property>
  <name>ipc.server.max.response.size</name>
  <value>5242880</value>
</property>

<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
  <description>
    If "true", access tokens are used as capabilities for accessing
    datanodes. If "false", no access tokens are checked on accessing
    datanodes.
  </description>
</property>

<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal name for the NameNode.</description>
</property>

<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>nn/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal name for the secondary NameNode.</description>
</property>

<property>
  <!--cluster variant -->
  <name>dfs.secondary.http.address</name>
  <value>ip-10-72-235-178.ec2.internal:50090</value>
  <description>Address of secondary namenode web server.</description>
</property>

<property>
  <name>dfs.secondary.https.port</name>
  <value>50490</value>
  <description>The https port where secondary-namenode binds.</description>
</property>

<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
  <description>
    The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
    The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP
    SPNEGO specification.
  </description>
</property>

<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
  <description>
    The Kerberos keytab file with the credentials for the HTTP Kerberos
    principal used by Hadoop-Auth in the HTTP endpoint.
  </description>
</property>

<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@EXAMPLE.COM</value>
  <description>
    The Kerberos principal that the DataNode runs as. "_HOST" is replaced
    by the real host name.
  </description>
</property>

<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
  <description>
    Combined keytab file containing the namenode service and host principals.
  </description>
</property>

<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
  <description>
    Combined keytab file containing the namenode service and host principals.
  </description>
</property>

<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/security/keytabs/dn.service.keytab</value>
  <description>The filename of the keytab file for the DataNode.</description>
</property>

<property>
  <name>dfs.https.port</name>
  <value>50470</value>
  <description>The https port where namenode binds.</description>
</property>

<property>
  <name>dfs.https.address</name>
  <value>ip-10-111-59-170.ec2.internal:50470</value>
  <description>The https address where namenode binds.</description>
</property>

<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>750</value>
  <description>
    The permissions that should be there on dfs.data.dir directories. The
    datanode will not come up if the permissions are different on existing
    dfs.data.dir directories. If the directories don't exist, they will be
    created with this permission.
  </description>
</property>

<property>
  <name>dfs.access.time.precision</name>
  <value>0</value>
  <description>
    The access time for an HDFS file is precise up to this value. The
    default value is 1 hour. Setting a value of 0 disables access times
    for HDFS.
  </description>
</property>

<property>
  <name>dfs.cluster.administrators</name>
  <value> hdfs</value>
  <description>ACL specifying who can view the default servlets in HDFS.</description>
</property>

<property>
  <name>ipc.server.read.threadpool.size</name>
  <value>5</value>
</property>

<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>

<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>
In addition, you must set the user on all secure DataNodes (these exports typically go in hadoop-env.sh):
export HADOOP_SECURE_DN_USER=hdfs
export HADOOP_SECURE_DN_PID_DIR=/grid/0/var/run/hadoop/$HADOOP_SECURE_DN_USER
To the mapred-site.xml file on every host in your cluster, you must add the following information:
Table 23.6. mapred-site.xml
Property Name | Property Value | Description | Final |
---|---|---|---|
mapreduce.jobtracker.kerberos.principal | jt/_HOST@EXAMPLE.COM | Kerberos principal name for the JobTracker. | |
mapreduce.tasktracker.kerberos.principal | tt/_HOST@EXAMPLE.COM | Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the task tracker. | |
hadoop.job.history.user.location | none | | true |
mapreduce.jobtracker.keytab.file | /etc/security/keytabs/jt.service.keytab | The keytab for the JobTracker principal. | |
mapreduce.tasktracker.keytab.file | /etc/security/keytabs/tt.service.keytab | The keytab for the TaskTracker principal. | |
mapreduce.jobtracker.staging.root.dir | /user | The path prefix for the location of the staging directories. The next level is always the user's name. It is a path in the default file system. | |
mapreduce.tasktracker.group | hadoop | The group that the task controller uses for accessing the task controller. The mapred user must be a member and users should not be members. | |
mapreduce.jobtracker.split.metainfo.maxsize | 50000000 | If the size of the split metainfo file is larger than this value, the JobTracker will fail the job during initialization. | true |
mapreduce.history.server.embedded | false | Should the Job History server be embedded within the JobTracker process. | true |
mapreduce.history.server.http.address (Note: cluster variant) | Example: ip-10-111-59-170.ec2.internal:51111 | HTTP address of the history server. | true |
mapreduce.jobhistory.kerberos.principal (Note: cluster variant) | jt/_HOST@EXAMPLE.COM | Kerberos principal name for JobHistory. This must map to the same user as the JT user. | true |
mapreduce.jobhistory.keytab.file (Note: cluster variant) | /etc/security/keytabs/jt.service.keytab | The keytab for the JobHistory principal. | |
mapred.jobtracker.blacklist.fault-timeout-window | Example: 180 | 3-hour sliding window - the value is specified in minutes. | |
mapred.jobtracker.blacklist.fault-bucket-width | Example: 15 | 15-minute bucket size - the value is specified in minutes. | |
mapred.queue.names | default | Comma separated list of queues configured for this JobTracker. | |
The XML for these entries:
<property>
  <name>mapreduce.jobtracker.kerberos.principal</name>
  <value>jt/_HOST@EXAMPLE.COM</value>
  <description>JT user name key.</description>
</property>

<property>
  <name>mapreduce.tasktracker.kerberos.principal</name>
  <value>tt/_HOST@EXAMPLE.COM</value>
  <description>
    tt user name key. "_HOST" is replaced by the host name of the task tracker.
  </description>
</property>

<property>
  <name>hadoop.job.history.user.location</name>
  <value>none</value>
  <final>true</final>
</property>

<property>
  <name>mapreduce.jobtracker.keytab.file</name>
  <value>/etc/security/keytabs/jt.service.keytab</value>
  <description>The keytab for the jobtracker principal.</description>
</property>

<property>
  <name>mapreduce.tasktracker.keytab.file</name>
  <value>/etc/security/keytabs/tt.service.keytab</value>
  <description>The filename of the keytab for the task tracker.</description>
</property>

<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/user</value>
  <description>
    The path prefix for where the staging directories should be placed.
    The next level is always the user's name. It is a path in the default
    file system.
  </description>
</property>

<property>
  <name>mapreduce.tasktracker.group</name>
  <value>hadoop</value>
  <description>
    The group that the task controller uses for accessing the task
    controller. The mapred user must be a member and users should *not*
    be members.
  </description>
</property>

<property>
  <name>mapreduce.jobtracker.split.metainfo.maxsize</name>
  <value>50000000</value>
  <final>true</final>
  <description>
    If the size of the split metainfo file is larger than this, the
    JobTracker will fail the job during initialization.
  </description>
</property>

<property>
  <name>mapreduce.history.server.embedded</name>
  <value>false</value>
  <description>Should the job history server be embedded within the JobTracker process.</description>
  <final>true</final>
</property>

<property>
  <name>mapreduce.history.server.http.address</name>
  <!--cluster variant -->
  <value>ip-10-111-59-170.ec2.internal:51111</value>
  <description>Http address of the history server.</description>
  <final>true</final>
</property>

<property>
  <name>mapreduce.jobhistory.kerberos.principal</name>
  <!--cluster variant -->
  <value>jt/_HOST@EXAMPLE.COM</value>
  <description>Job history user name key (must map to the same user as the JT user).</description>
</property>

<property>
  <name>mapreduce.jobhistory.keytab.file</name>
  <!--cluster variant -->
  <value>/etc/security/keytabs/jt.service.keytab</value>
  <description>The keytab for the job history server principal.</description>
</property>

<property>
  <name>mapred.jobtracker.blacklist.fault-timeout-window</name>
  <value>180</value>
  <description>3-hour sliding window (value is in minutes).</description>
</property>

<property>
  <name>mapred.jobtracker.blacklist.fault-bucket-width</name>
  <value>15</value>
  <description>15-minute bucket size (value is in minutes).</description>
</property>

<property>
  <name>mapred.queue.names</name>
  <value>default</value>
  <description>Comma separated list of queues configured for this jobtracker.</description>
</property>
For HBase to run on a secured cluster, HBase must be able to authenticate itself to HDFS. To the hbase-site.xml file on your HBase server, you must add the following information. There are no default values; the following are all only examples:
Table 23.7. hbase-site.xml
Property Name | Property Value | Description |
---|---|---|
hbase.master.keytab.file | /etc/security/keytabs/hm.service.keytab | The keytab for the HMaster service principal. |
hbase.master.kerberos.principal | hm/_HOST@EXAMPLE.COM | The Kerberos principal name that should be used to run the HMaster process. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance. |
hbase.regionserver.keytab.file | /etc/security/keytabs/rs.service.keytab | The keytab for the HRegionServer service principal. |
hbase.regionserver.kerberos.principal | rs/_HOST@EXAMPLE.COM | The Kerberos principal name that should be used to run the HRegionServer process. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance. |
hbase.superuser | hbase | Comma-separated list of users or groups that are allowed full privileges, regardless of stored ACLs, across the cluster. Only used when HBase security is enabled. |
hbase.coprocessor.region.classes | | Comma-separated list of Coprocessors that are loaded by default on all tables. For any override coprocessor method, these classes will be called in order. After implementing your own Coprocessor, just put it in HBase's classpath and add the fully qualified class name here. A coprocessor can also be loaded on demand by setting HTableDescriptor. |
hbase.coprocessor.master.classes | | Comma-separated list of org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors that are loaded by default on the active HMaster process. For any implemented coprocessor methods, the listed classes will be called in order. After implementing your own MasterObserver, just put it in HBase's classpath and add the fully qualified class name here. |
The XML for these entries:
<property>
  <name>hbase.master.keytab.file</name>
  <value>/etc/security/keytabs/hm.service.keytab</value>
  <description>
    Full path to the kerberos keytab file to use for logging in the
    configured HMaster server principal.
  </description>
</property>

<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hm/_HOST@EXAMPLE.COM</value>
  <description>
    Ex. "hbase/_HOST@EXAMPLE.COM". The kerberos principal name that should
    be used to run the HMaster process. The principal name should be in
    the form: user/hostname@DOMAIN. If "_HOST" is used as the hostname
    portion, it will be replaced with the actual hostname of the running
    instance.
  </description>
</property>

<property>
  <name>hbase.regionserver.keytab.file</name>
  <value>/etc/security/keytabs/rs.service.keytab</value>
  <description>
    Full path to the kerberos keytab file to use for logging in the
    configured HRegionServer server principal.
  </description>
</property>

<property>
  <name>hbase.regionserver.kerberos.principal</name>
  <value>rs/_HOST@EXAMPLE.COM</value>
  <description>
    Ex. "hbase/_HOST@EXAMPLE.COM". The kerberos principal name that should
    be used to run the HRegionServer process. The principal name should be
    in the form: user/hostname@DOMAIN. If _HOST is used as the hostname
    portion, it will be replaced with the actual hostname of the running
    instance. An entry for this principal must exist in the file specified
    in hbase.regionserver.keytab.file.
  </description>
</property>

<!--Additional configuration specific to HBase security -->

<property>
  <name>hbase.superuser</name>
  <value>hbase</value>
  <description>
    List of users or groups (comma-separated), who are allowed full
    privileges, regardless of stored ACLs, across the cluster. Only used
    when HBase security is enabled.
  </description>
</property>

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value></value>
  <description>
    A comma-separated list of Coprocessors that are loaded by default on
    all tables. For any override coprocessor method, these classes will
    be called in order. After implementing your own Coprocessor, just put
    it in HBase's classpath and add the fully qualified class name here.
    A coprocessor can also be loaded on demand by setting HTableDescriptor.
  </description>
</property>

<property>
  <name>hbase.coprocessor.master.classes</name>
  <value></value>
  <description>
    A comma-separated list of org.apache.hadoop.hbase.coprocessor.MasterObserver
    coprocessors that are loaded by default on the active HMaster process.
    For any implemented coprocessor methods, the listed classes will be
    called in order. After implementing your own MasterObserver, just put
    it in HBase's classpath and add the fully qualified class name here.
  </description>
</property>
Hive Metastore supports Kerberos authentication for Thrift clients only. HiveServer does not support Kerberos authentication for any clients:
Table 23.8. hive-site.xml
Property Name | Property Value | Description |
---|---|---|
hive.metastore.sasl.enabled | true | If true, the Metastore Thrift interface will be secured with SASL and clients must authenticate with Kerberos. |
hive.metastore.kerberos.keytab.file | /etc/security/keytabs/hive.service.keytab | The keytab for the Metastore Thrift service principal. |
hive.metastore.kerberos.principal | hive/_HOST@EXAMPLE.COM | The service principal for the Metastore Thrift server. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance. |
hive.metastore.cache.pinobjtypes | Table,Database,Type,FieldSchema,Order | Comma-separated Metastore object types that should be pinned in the cache. |
The XML for these entries:
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
  <description>
    If true, the metastore thrift interface will be secured with SASL.
    Clients must authenticate with Kerberos.
  </description>
</property>

<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
  <description>
    The path to the Kerberos Keytab file containing the metastore thrift
    server's service principal.
  </description>
</property>

<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@EXAMPLE.COM</value>
  <description>
    The service principal for the metastore thrift server. The special
    string _HOST will be replaced automatically with the correct hostname.
  </description>
</property>

<property>
  <name>hive.metastore.cache.pinobjtypes</name>
  <value>Table,Database,Type,FieldSchema,Order</value>
  <description>
    List of comma separated metastore object types that should be pinned
    in the cache.
  </description>
</property>
To the oozie-site.xml file, you must add the following information:
Table 23.9. oozie-site.xml
Property Name | Property Value | Description |
---|---|---|
oozie.service.AuthorizationService.security.enabled | true | Specifies whether security (user name/admin role) is enabled or not. If it is disabled, any user can manage the Oozie system and manage any job. |
oozie.service.HadoopAccessorService.kerberos.enabled | true | Indicates if Oozie is configured to use Kerberos. |
local.realm | EXAMPLE.COM | Kerberos Realm used by Oozie and Hadoop. Using local.realm to be aligned with Hadoop configuration. |
oozie.service.HadoopAccessorService.keytab.file | /etc/security/keytabs/oozie.service.keytab | The keytab for the Oozie service principal. |
oozie.service.HadoopAccessorService.kerberos.principal | oozie/_HOST@EXAMPLE.COM | Kerberos principal for the Oozie service. |
oozie.authentication.type | kerberos | |
oozie.authentication.kerberos.principal | HTTP/_HOST@EXAMPLE.COM | The HTTP Kerberos principal used for SPNEGO authentication of the Oozie HTTP endpoint. |
oozie.authentication.kerberos.keytab | /etc/security/keytabs/spnego.service.keytab | Location of the keytab file for the HTTP Kerberos principal. |
oozie.service.HadoopAccessorService.nameNode.whitelist | | |
oozie.authentication.kerberos.name.rules | RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/ RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/ RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/ RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/ DEFAULT | The mapping from Kerberos principal names to local OS user names. See Creating Mappings Between Principals and UNIX Usernames for more information. |
oozie.service.ProxyUserService.proxyuser.knox.groups | users | Grants proxy privileges to the knox user. Note: only required when using a Knox Gateway. |
oozie.service.ProxyUserService.proxyuser.knox.hosts | $knox_host_FQDN | Identifies the Knox Gateway host. Note: only required when using a Knox Gateway. |
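The XML for these entries (a sketch assembled from the table values above; the realm, principals, and the Knox host are example placeholders to replace with your own, and the proxyuser properties apply only when using a Knox Gateway):
<property>
  <name>oozie.service.AuthorizationService.security.enabled</name>
  <value>true</value>
  <description>
    Specifies whether security (user name/admin role) is enabled or not.
    If disabled, any user can manage the Oozie system and manage any job.
  </description>
</property>

<property>
  <name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
  <value>true</value>
  <description>Indicates if Oozie is configured to use Kerberos.</description>
</property>

<property>
  <name>local.realm</name>
  <value>EXAMPLE.COM</value>
  <description>Kerberos Realm used by Oozie and Hadoop.</description>
</property>

<property>
  <name>oozie.service.HadoopAccessorService.keytab.file</name>
  <value>/etc/security/keytabs/oozie.service.keytab</value>
  <description>The keytab for the Oozie service principal.</description>
</property>

<property>
  <name>oozie.service.HadoopAccessorService.kerberos.principal</name>
  <value>oozie/_HOST@EXAMPLE.COM</value>
  <description>Kerberos principal for the Oozie service.</description>
</property>

<property>
  <name>oozie.authentication.type</name>
  <value>kerberos</value>
</property>

<property>
  <name>oozie.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
  <description>The HTTP Kerberos principal used for SPNEGO authentication.</description>
</property>

<property>
  <name>oozie.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
  <description>Location of the keytab file for the HTTP Kerberos principal.</description>
</property>

<property>
  <name>oozie.authentication.kerberos.name.rules</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
    DEFAULT
  </value>
  <description>The mapping from Kerberos principal names to local OS user names.</description>
</property>

<!-- Only required when using a Knox Gateway -->
<property>
  <name>oozie.service.ProxyUserService.proxyuser.knox.groups</name>
  <value>users</value>
</property>

<property>
  <name>oozie.service.ProxyUserService.proxyuser.knox.hosts</name>
  <value>Knox.EXAMPLE.COM</value>
</property>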
To the webhcat-site.xml file, you must add the following information:
Table 23.10. webhcat-site.xml
Property Name | Property Value | Description |
---|---|---|
templeton.kerberos.principal | HTTP/_HOST@EXAMPLE.COM | |
templeton.kerberos.keytab | /etc/security/keytabs/spnego.service.keytab | |
templeton.kerberos.secret | secret | |
hadoop.proxyuser.knox.groups | users | Grants proxy privileges to the knox user. Note: only required when using a Knox Gateway. |
hadoop.proxyuser.knox.hosts | $knox_host_FQDN | Identifies the Knox Gateway host. Note: only required when using a Knox Gateway. |
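The XML for these entries (again a sketch derived from the table values above; the realm and Knox host are placeholders, and the proxyuser properties apply only when using a Knox Gateway):
<property>
  <name>templeton.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>

<property>
  <name>templeton.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>templeton.kerberos.secret</name>
  <value>secret</value>
</property>

<!-- Only required when using a Knox Gateway -->
<property>
  <name>hadoop.proxyuser.knox.groups</name>
  <value>users</value>
</property>

<property>
  <name>hadoop.proxyuser.knox.hosts</name>
  <value>Knox.EXAMPLE.COM</value>
</property>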