Configuring the Sentry Service
This topic describes how to enable the Sentry service for Hive and Impala, and how to configure the Hive metastore to communicate with the service.
Enabling the Sentry Service for Hive
Prerequisites
- Ensure all the action items under Prerequisites are complete.
- The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your
hive-site.xml) must be owned by the Hive user and group.
- Permissions on the warehouse directory must be set as follows (see following Note for caveats):
- 771 on the directory itself (for example, /user/hive/warehouse)
- 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
- All files and subdirectories should be owned by hive:hive
For example:
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
- Disable impersonation for HiveServer2 in the Cloudera Manager Admin Console:
- Go to the Hive service.
- Click the Configuration tab.
- Under the HiveServer2 role group, uncheck the HiveServer2 Enable Impersonation property, and click Save Changes.
- Enable the Hive user to submit MapReduce jobs.
- Go to the MapReduce service.
- Click the Configuration tab.
- Under a TaskTracker role group go to the Security category.
- Set the Minimum User ID for Job Submission property to zero (the default is 1000) and click Save Changes.
- Repeat steps 1-4 for every TaskTracker role group for the MapReduce service that is associated with Hive, if more than one exists.
- Restart the MapReduce service.
- Enable the Hive user to submit YARN jobs.
- Go to the YARN service.
- Click the Configuration tab.
- Under a NodeManager role group go to the Security category.
- Ensure the Allowed System Users property includes the hive user. If not, add hive and click Save Changes.
- Repeat steps 1-4 for every NodeManager role group for the YARN service that is associated with Hive, if more than one exists.
- Restart the YARN service.
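The warehouse directory prerequisite above can be checked before moving on. The following is a minimal sketch, not part of the product: the function name check_listing and the sample listing line are illustrative assumptions. It inspects one line of `hdfs dfs -ls -d` output and succeeds only if the mode is 771 (shown as drwxrwx--x, possibly with a trailing "+" when ACLs are present) and the owner and group are both hive.

```shell
#!/bin/sh
# Check one line of `hdfs dfs -ls -d <dir>` output against the prerequisites:
# mode 771 (drwxrwx--x, optionally followed by "+" when ACLs are set)
# and owner:group hive:hive. check_listing is a hypothetical helper name.
check_listing() {
    # Listing fields: mode, replication, owner, group, size, date, time, path
    printf '%s\n' "$1" | awk '
        { mode = $1; sub(/\+$/, "", mode) }
        mode == "drwxrwx--x" && $3 == "hive" && $4 == "hive" { ok = 1 }
        END { exit (ok ? 0 : 1) }'
}

# Sample line for illustration only; on a cluster, substitute the output of:
#   sudo -u hdfs hdfs dfs -ls -d /user/hive/warehouse
sample='drwxrwx--x   - hive hive          0 2014-01-01 00:00 /user/hive/warehouse'
if check_listing "$sample"; then
    echo "warehouse directory OK"
else
    echo "warehouse directory needs chmod/chown"
fi
```

The check reads a captured listing line rather than calling hdfs itself, so it can be tried without cluster access.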
Configuring HiveServer2 for the Sentry Service
<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
  <name>hive.sentry.conf.url</name>
  <value>file:///{{CMF_CONF_DIR}}/sentry-site.xml</value>
</property>
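One way to confirm that the three HiveServer2 properties above are in place is a simple text search of hive-site.xml. This is a rough sketch under stated assumptions: the helper name missing_sentry_props, the HIVE_SITE variable, and the default path /etc/hive/conf/hive-site.xml are all hypothetical, and the search assumes each name is written as <name>...</name> with no extra whitespace inside the tags.

```shell
#!/bin/sh
# Print each required HiveServer2 property name that is absent from the
# given hive-site.xml; print nothing when all three are present.
# missing_sentry_props is a hypothetical helper, not a Hive or Sentry tool.
missing_sentry_props() {
    for name in \
        hive.security.authorization.task.factory \
        hive.server2.session.hook \
        hive.sentry.conf.url
    do
        # -F: match the string literally; -q: only report via exit status
        grep -qF "<name>$name</name>" "$1" || echo "$name"
    done
}

# Assumed location; adjust to wherever your hive-site.xml actually lives.
conf=${HIVE_SITE:-/etc/hive/conf/hive-site.xml}
if [ -f "$conf" ]; then
    missing_sentry_props "$conf"
fi
```

An empty result means every property name was found; it does not validate the values.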
Configuring the Hive Metastore for the Sentry Service
Configuring Pig and HCatalog for the Sentry Service
Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.
With HDFS extended ACLs enabled, Cloudera recommends that you set the permissions for the Hive warehouse directory, /user/hive/warehouse, to 771 so that users other than the owner and group have only execute permission. Because the /user/hive/warehouse directory is owned by hive:hive by default, this also restricts requests from any other users at the HDFS level.
- Use HDFS ACLs to define permissions on a specific directory or file of HDFS. This directory/file is generally mapped to a database, table, partition, or a data file.
- Users running these jobs must have the required permissions in Sentry to add new metadata to, or read metadata from, the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax for Use with Sentry. You can use Beeline, the HiveServer2 command-line interface, to update the Sentry database with the user privileges.
- A user who is using Pig HCatLoader requires read permissions on a specific table or partition. In this case, GRANT read access to the user in Sentry and set the ACL to read and execute on the file being accessed.
- A user who is using Pig HCatStorer requires ALL permissions on a specific table. In this case, GRANT ALL access to the user in Sentry and set the ACL to write and execute on the table being used.
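The two cases above pair a Sentry grant with an HDFS ACL. The sketch below only prints the statements and commands for review; it does not run them. The table name, role, user, and directory are hypothetical placeholders, as are the function names. It also assumes that the role already exists and has been granted to the user's group, because Sentry assigns privileges to roles rather than directly to users. The GRANT lines are intended to be run in Beeline, and the setfacl lines by a user allowed to change ACLs on the path.

```shell
#!/bin/sh
# Hypothetical values for illustration only.
TABLE=orders                                      # assumed table name
ROLE=analyst                                      # assumed Sentry role
OS_USER=alice                                     # assumed HDFS user
TABLE_DIR=/user/hive/warehouse/sales.db/orders    # assumed table location

# Pig HCatLoader: read access in Sentry plus a read/execute ACL.
# Arguments: table, role, user, path. Prints the commands; runs nothing.
hcatloader_commands() {
    printf 'GRANT SELECT ON TABLE %s TO ROLE %s;\n' "$1" "$2"
    printf 'hdfs dfs -setfacl -m user:%s:r-x %s\n' "$3" "$4"
}

# Pig HCatStorer: ALL in Sentry plus a write/execute ACL.
hcatstorer_commands() {
    printf 'GRANT ALL ON TABLE %s TO ROLE %s;\n' "$1" "$2"
    printf 'hdfs dfs -setfacl -m user:%s:-wx %s\n' "$3" "$4"
}

hcatloader_commands "$TABLE" "$ROLE" "$OS_USER" "$TABLE_DIR"
hcatstorer_commands "$TABLE" "$ROLE" "$OS_USER" "$TABLE_DIR"
```

Printing first keeps the privilege change and the ACL change side by side, which makes it easier to confirm that they match before applying either one.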
Configuring the Hive Metastore to Communicate with Sentry
<property>
  <name>hive.metastore.client.impl</name>
  <value>org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient</value>
  <description>Sets custom Hive Metastore client which Sentry uses to filter out metadata.</description>
</property>
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>
  <description>list of comma separated listeners for metastore events.</description>
</property>
<property>
  <name>hive.metastore.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>
  <description>list of comma separated listeners for metastore, post events.</description>
</property>
Securing the Hive Metastore
<property>
  <name>sentry.hive.testing.mode</name>
  <value>true</value>
</property>
Impala does not require this flag to be set.
- To secure the Hive metastore, see Hive Metastore Server Security Configuration.
- In addition, allow access to the metastore only from the HiveServer2 server (see "Securing the Hive Metastore" under HiveServer2 Security Configuration) and then disable local access to the HiveServer2 server.
Configuring Impala for the Sentry Service
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
Enabling the Sentry Service for Impala
To use the Sentry service:
- Enable the Sentry service for Hive. For details on how to do this, see Enabling the Sentry Service for Hive.
- Go to the Impala service.
- Click the Configuration tab.
- In the Service-Wide category, set the Sentry Service property to Sentry.
- Restart Impala.
Configuring Impala as a Client for the Sentry Service
<property>
  <name>sentry.service.client.server.rpc-port</name>
  <value>3893</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-address</name>
  <value>hostname</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-connection-timeout</name>
  <value>200000</value>
</property>
<property>
  <name>sentry.service.security.mode</name>
  <value>none</value>
</property>
Other configuration changes required include:
- To enable the Sentry policy service, the following flag should be set on the catalogd and the impalad.
--sentry_config=<absolute path to sentry service configuration file>
- To enable authorization based on policy server metadata, set the following flag on the impalad.
--server_name=<server name>
- To enable authorization based on a file-based policy, set the following flags on the impalad.
--server_name=<server name> --authorization_policy_file=<path to policy file>
If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the policy server metadata approach will be used to implement authorization.
- The impala user also needs to be added to the list of administrative users of the Sentry Policy Server. For more details, see SENTRY-191.
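The choice between the two authorization approaches comes down to whether --authorization_policy_file is present. The following sketch assembles the impalad flags described above for each case. The function name impalad_authz_flags, the server name server1, and both file paths are hypothetical examples; only the flag names come from this topic.

```shell
#!/bin/sh
# Build the impalad authorization flags described in this section.
# $1 = server name, $2 = absolute path to the Sentry service configuration
# file, $3 = optional policy file. impalad_authz_flags is a hypothetical helper.
impalad_authz_flags() {
    if [ -n "${3:-}" ]; then
        # A policy file is given: Impala uses the file-based approach.
        printf '%s\n' "--server_name=$1 --authorization_policy_file=$3"
    else
        # No policy file: Impala uses policy server metadata.
        printf '%s\n' "--sentry_config=$2 --server_name=$1"
    fi
}

# Hypothetical values for illustration only.
impalad_authz_flags server1 /etc/impala/conf/sentry-site.xml
impalad_authz_flags server1 /etc/impala/conf/sentry-site.xml /user/hive/warehouse/auth-policy.ini
```

Remember that catalogd also needs the --sentry_config flag when the Sentry policy service is used; this helper covers only the impalad side.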