Configuring the Sentry Service

This topic describes how to enable the Sentry service for Hive and Impala, and how to configure the Hive metastore to communicate with the service.

Enabling the Sentry Service Using Cloudera Manager
Enabling the Sentry Service Using the Command Line
Configuring Pig and HCatalog for the Sentry Service
Securing the Hive Metastore
Using User-Defined Functions with HiveServer2

Enabling the Sentry Service Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Before Enabling the Sentry Service
Enabling the Sentry Service for Hive
Enabling the Sentry Service for Impala
Enabling the Sentry Service for Hue

Before Enabling the Sentry Service

Ensure you satisfy all the Prerequisites for the Sentry service.
The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
- Permissions on the warehouse directory must be set as follows (see following Note for caveats):
  - 771 on the directory itself (for example, /user/hive/warehouse)
  - 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
  - All files and subdirectories should be owned by hive:hive
  For example:
```
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
```
  Note:
  - If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
  - If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled.
  Important: These instructions override the recommendations in the Hive section of the CDH 5 Installation Guide.
Disable impersonation for HiveServer2 in the Cloudera Manager Admin Console:
1. Go to the Hive service.
2. Click the Configuration tab.
3. Under the HiveServer2 role group, uncheck the HiveServer2 Enable Impersonation property, and click Save Changes.
If you are using MapReduce, enable the Hive user to submit MapReduce jobs.
1. Open the Cloudera Manager Admin Console and go to the MapReduce service.
2. Click the Configuration tab.
3. Under a TaskTracker role group go to the Security category.
4. Set the Minimum User ID for Job Submission property to zero (the default is 1000) and click Save Changes.
5. Repeat steps 1-4 for every TaskTracker role group for the MapReduce service that is associated with Hive, if more than one exists.
6. Restart the MapReduce service.
If you are using YARN, enable the Hive user to submit YARN jobs.
1. Open the Cloudera Manager Admin Console and go to the YARN service.
2. Click the Configuration tab.
3. Under a NodeManager role group go to the Security category.
4. Ensure the Allowed System Users property includes the hive user. If not, add hive and click Save Changes.
5. Repeat steps 1-4 for every NodeManager role group for the YARN service that is associated with Hive, if more than one exists.
6. Restart the YARN service.
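The warehouse directory requirements listed above (mode 771, owner hive:hive) can be checked from a shell before you enable the service. The following is a minimal sketch, not part of Hadoop or Sentry: check_warehouse_line is a hypothetical helper that inspects one line of hdfs dfs -ls -d output, where mode 771 on a directory appears as drwxrwx--x.

```shell
#!/bin/sh
# check_warehouse_line LINE
# Hypothetical helper: LINE is one line of `hdfs dfs -ls -d <dir>` output.
# Prints "ok" when the entry is a directory with mode 771 owned by hive:hive.
check_warehouse_line() {
    # Split the line into fields: permissions, replication, owner, group, ...
    set -- $1
    perms=$1 owner=$3 group=$4
    # A trailing "+" only marks that HDFS ACLs are present; ignore it.
    perms=${perms%+}
    if [ "$perms" = "drwxrwx--x" ] && [ "$owner" = "hive" ] && [ "$group" = "hive" ]; then
        echo "ok"
    else
        echo "mismatch: $perms $owner:$group"
    fi
}

# Usage on a cluster node (requires the hdfs client):
#   check_warehouse_line "$(sudo -u hdfs hdfs dfs -ls -d /user/hive/warehouse)"
```

The helper checks a single entry only; to cover subdirectories, run it on each line of a recursive listing.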

Enabling the Sentry Service for Hive

Go to the Hive service.
Click the Configuration tab.
In the Service-Wide category, set the Sentry Service property to Sentry.
Restart the Hive service.

Enabling the Sentry Service for Impala

Enable the Sentry service for Hive (as instructed above).
Go to the Impala service.
Click the Configuration tab.
In the Service-Wide category, set the Sentry Service property to Sentry.
Restart Impala.

Enabling the Sentry Service for Hue

To interact with Sentry using Hue, enable the Sentry service as follows:

Enable the Sentry service for Hive and Impala (as instructed above).
Go to the Hue service.
Click the Configuration tab.
In the Service-Wide category, set the Sentry Service property to Sentry.
Restart Hue.

Enabling the Sentry Service Using the Command Line

Before Enabling the Sentry Service
Configuring HiveServer2 for the Sentry Service
Configuring the Hive Metastore for the Sentry Service
Configuring Impala as a Client for the Sentry Service

Before Enabling the Sentry Service

The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
- Permissions on the warehouse directory must be set as follows (see following Note for caveats):
  - 771 on the directory itself (for example, /user/hive/warehouse)
  - 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
  - All files and subdirectories should be owned by hive:hive
  For example:
```
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
```
  Note:
  - If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
  - If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled.
  Important: These instructions override the recommendations in the Hive section of the CDH 5 Installation Guide.
HiveServer2 impersonation must be turned off.
If you are using MapReduce, you must enable the Hive user to submit MapReduce jobs by setting the minimum user ID for job submission to 0. Edit the taskcontroller.cfg file and set min.user.id=0.
If you are using YARN, you must enable the Hive user to submit YARN jobs. Edit the container-executor.cfg file and add hive to the allowed.system.users property. For example:
```
allowed.system.users = nobody,impala,hive
```
Important: You must restart the cluster and HiveServer2 after changing these values.
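The edits to taskcontroller.cfg and container-executor.cfg described above can also be scripted. The following is a minimal sketch under stated assumptions: set_cfg_key is a hypothetical helper, and the /etc/hadoop/conf paths in the usage comments are assumed locations that may differ on your hosts.

```shell
#!/bin/sh
# set_cfg_key FILE KEY VALUE
# Hypothetical helper: replace an existing "KEY=..." line in a key=value
# file, or append the line when the key is absent.
# Limitations: dots in KEY match any character, VALUE must not contain
# "|" or "&", and FILE is expected to end with a newline.
set_cfg_key() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$file"; then
        sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key=$value|" "$file" > "$tmp"
    else
        cat "$file" > "$tmp"
        printf '%s=%s\n' "$key" "$value" >> "$tmp"
    fi
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (assumed paths; run as a user allowed to edit the files):
#   set_cfg_key /etc/hadoop/conf/taskcontroller.cfg min.user.id 0
#   set_cfg_key /etc/hadoop/conf/container-executor.cfg allowed.system.users nobody,impala,hive
```

Note that the second call replaces the whole allowed.system.users value, so pass the complete list of users rather than only hive.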

Configuring HiveServer2 for the Sentry Service

Add the following properties to hive-site.xml to allow the Hive service to communicate with the Sentry service.

<property>
   <name>hive.security.authorization.task.factory</name>
   <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
   <name>hive.server2.session.hook</name>
   <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
   <name>hive.sentry.conf.url</name>
   <value>file:///{{PATH/TO/DIR}}/sentry-site.xml</value>
</property>

Configuring the Hive Metastore for the Sentry Service

Add the following properties to hive-site.xml to allow the Hive metastore to communicate with the Sentry service.

<property>
    <name>hive.metastore.client.impl</name>
    <value>org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient</value>
    <description>Sets custom Hive metastore client which Sentry uses to filter out metadata.</description>
</property>

<property>  
    <name>hive.metastore.pre.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>  
    <description>list of comma separated listeners for metastore events.</description>
</property>

<property>
    <name>hive.metastore.event.listeners</name>  
    <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>  
    <description>list of comma separated listeners for metastore, post events.</description>
</property>
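
After editing hive-site.xml, it can help to confirm that none of the property names was mistyped or left out. The following is a minimal sketch: missing_properties is a hypothetical helper that does a plain text match on the name elements (it does not parse XML or validate values), and the /etc/hive/conf path in the usage comment is an assumed location.

```shell
#!/bin/sh
# missing_properties FILE NAME...
# Hypothetical helper: print every NAME that has no <name>NAME</name>
# element in FILE. Prints nothing when all names are present.
missing_properties() {
    file=$1; shift
    for name in "$@"; do
        grep -qF "<name>$name</name>" "$file" || echo "$name"
    done
}

# Usage (assumed path):
#   missing_properties /etc/hive/conf/hive-site.xml \
#       hive.metastore.client.impl \
#       hive.metastore.pre.event.listeners \
#       hive.metastore.event.listeners
```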

Configuring Impala as a Client for the Sentry Service

Set the following configuration properties in sentry-site.xml.

<property>
   <name>sentry.service.client.server.rpc-port</name>
   <value>3893</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-address</name>
   <value>hostname</value>
</property>
<property>
   <name>sentry.service.client.server.rpc-connection-timeout</name>
   <value>200000</value>
</property>
<property>
   <name>sentry.service.security.mode</name>
   <value>none</value>
</property>

You must also add the following configuration properties to Impala's /etc/default/impala file. For more information, see Configuring Impala Startup Options through the Command Line.

On both catalogd and impalad:

--sentry_config=<absolute path to sentry service configuration file>

On impalad only:
```
--server_name=<server name>
```
If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the database-backed approach will be used to implement authorization.
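
As an illustration of where these flags go, the fragment below appends them to the startup argument variables in /etc/default/impala. Treat it as a sketch under assumptions: IMPALA_CATALOG_ARGS and IMPALA_SERVER_ARGS are the variable names assumed for the packaged file, SENTRY_CONF is a hypothetical convenience variable, and the path and the value server1 are placeholders to replace with your own.

```shell
# Fragment for /etc/default/impala. Place it after the existing
# definitions so the Sentry flags are appended to them.
SENTRY_CONF=/etc/sentry/conf/sentry-site.xml   # assumed path

# catalogd: needs only the Sentry configuration file.
IMPALA_CATALOG_ARGS="${IMPALA_CATALOG_ARGS:-} --sentry_config=${SENTRY_CONF}"

# impalad: needs the configuration file and the server name.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS:-} --sentry_config=${SENTRY_CONF} --server_name=server1"
```

Leaving --authorization_policy_file out of both variables keeps Impala on the database-backed approach described above.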

Configuring Pig and HCatalog for the Sentry Service

Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.

Because the Hive warehouse directory is owned by hive:hive with its permissions set to 771, requests from other users, such as commands issued through Pig jobs, WebHCat queries, and MapReduce jobs, may fail. To give these users access, make the following configuration changes:

Use HDFS ACLs to define permissions on a specific directory or file of HDFS. This directory/file is generally mapped to a database, table, partition, or a data file.
Users running these jobs should have the required permissions in Sentry to add new metadata or read metadata from the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax for Use with Sentry. You can use HiveServer2's command line interface, Beeline, to update the Sentry database with the user privileges.

Examples:

A user who is using Pig HCatLoader will require read permissions on a specific table or partition. In such a case, you can GRANT read access to the user in Sentry and set the ACL to read and execute, on the file being accessed.
A user who is using Pig HCatStorer will require ALL permissions on a specific table. In this case, you GRANT ALL access to the user in Sentry and set the ACL to write and execute, on the table being used.
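
The HCatLoader example can be sketched as a pair of helpers that print the Sentry statements and the HDFS command rather than running them. Everything named here is a placeholder: the helpers hcatloader_sql and hcatloader_acl_cmd are hypothetical, as are the database sales, table orders, role pig_readers, group analysts, user alice, and the table path. The sketch also assumes that Sentry privileges are granted to a role and the role to the user's group, and that HDFS ACLs are enabled on the NameNode (dfs.namenode.acls.enabled).

```shell
#!/bin/sh
# hcatloader_sql DB TABLE ROLE GROUP
# Print the Sentry statements that give GROUP read access to DB.TABLE.
hcatloader_sql() {
    cat <<EOF
USE $1;
CREATE ROLE $3;
GRANT SELECT ON TABLE $2 TO ROLE $3;
GRANT ROLE $3 TO GROUP $4;
EOF
}

# hcatloader_acl_cmd USER PATH
# Print the HDFS command that gives USER read and execute on PATH.
hcatloader_acl_cmd() {
    echo "hdfs dfs -setfacl -R -m user:$1:r-x $2"
}

# Print both parts for review before running anything.
hcatloader_sql sales orders pig_readers analysts
hcatloader_acl_cmd alice /user/hive/warehouse/sales.db/orders
```

To apply the result, save the printed statements to a file and run it with Beeline's -f option as a user with Sentry admin privileges, then run the printed hdfs command as the HDFS superuser. For the HCatStorer case, the same pattern applies with GRANT ALL and an ACL that includes write access.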

Securing the Hive Metastore

It's important that the Hive metastore be secured. If you want to override the Kerberos prerequisite for the Hive metastore, set the sentry.hive.testing.mode property to true to allow Sentry to work with weaker authentication mechanisms. Add the following property to the HiveServer2 and Hive metastore's sentry-site.xml:

<property>
  <name>sentry.hive.testing.mode</name>
  <value>true</value>
</property>

Impala does not require this flag to be set.

You can also set the property in Cloudera Manager. Go to the Hive service and open the Configuration tab. Search for the Hive Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml. Click the plus sign (+) to add a new property with the following values:

Name: sentry.hive.testing.mode
Value: true

You can turn on Hive metastore security using the instructions in Cloudera Security. To secure the Hive metastore, see Hive Metastore Server Security Configuration.

Using User-Defined Functions with HiveServer2

The ADD JAR command does not work with HiveServer2 and the Beeline client when Beeline runs on a different host. As an alternative to ADD JAR, Hive's auxiliary paths functionality should be used. There are some differences in the procedures for creating permanent functions and temporary functions when Sentry is enabled. For detailed instructions, see:

Migrating from Sentry Policy Files to the Sentry Service

Sentry Debugging and Failure Scenarios