Sentry Service Configuration
The Sentry service is an RPC server that stores the authorization metadata in an underlying relational database and provides RPC interfaces to retrieve and manipulate privileges. It supports secure access to services using Kerberos. The Hive and Impala services are clients of this service. The service provides authorization metadata from the database-backed storage; it does not handle actual privilege validation.
The Sentry service was introduced to make user privileges easier to manage than with the existing policy file approach. Storing privileges in a database lets you use the more traditional GRANT/REVOKE statements to modify them.
Prerequisites
Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:
- CDH 5.1.x
- HiveServer2 with strong authentication (Kerberos or LDAP)
- A secure Hadoop cluster
This is to prevent a user bypassing the authorization and gaining direct access to the underlying data.
- The Hive warehouse directory (/user/hive/warehouse or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
- Permissions on the warehouse directory must be set as follows (see following Note for caveats):
- 771 on the directory itself (for example, /user/hive/warehouse)
- 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
- All files and subdirectories should be owned by hive:hive
For example:
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
Note: - If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the permissions on the subdirectories will be set when you set permissions on the warehouse directory itself.
- If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled. Note that you can protect objects in the default database (or any other database) by means of a policy file.
Important: These instructions override the recommendations in the Hive section of the CDH 5 Installation Guide.
- HiveServer2 impersonation must be turned off.
- The Hive user must be able to submit MapReduce jobs. You can ensure that this is true by setting the minimum user ID for job submission to 0.
Edit the taskcontroller.cfg file and set min.user.id=0. To enable the Hive user to submit YARN jobs, add the user hive to the allowed.system.users configuration property. Edit the container-executor.cfg file and add hive to the allowed.system.users property. For example,
allowed.system.users=nobody,impala,hive
Important: - You must restart the cluster and HiveServer2 after changing this value, whether you use Cloudera Manager or not.
- These instructions override the instructions under Configuring MRv1 Security
- These instructions override the instructions under Configuring YARN Security
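The warehouse directory and job submission prerequisites above can be spot-checked with a small script. The following is a minimal sketch, not part of Sentry: the function names check_warehouse_line and hive_allowed_in_cfg are hypothetical, and the first function assumes the usual `hdfs dfs -ls -d` listing layout (mode, replication, owner, group, ...).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Sentry or Hadoop commands.

# Succeeds if a single `hdfs dfs -ls -d <dir>` listing line shows
# mode 771 (drwxrwx--x) and ownership hive:hive.
check_warehouse_line() {
    # $1: one listing line for the warehouse directory
    echo "$1" | awk '
        {
            mode = $1
            sub(/\+$/, "", mode)   # drop the "+" shown when extended ACLs are set
            if (mode == "drwxrwx--x" && $3 == "hive" && $4 == "hive") ok = 1
        }
        END { exit (ok ? 0 : 1) }'
}

# Succeeds if "hive" is listed in allowed.system.users in the given
# container-executor.cfg file.
hive_allowed_in_cfg() {
    # $1: path to container-executor.cfg
    awk -F= '
        $1 == "allowed.system.users" {
            n = split($2, users, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", users[i])
                if (users[i] == "hive") found = 1
            }
        }
        END { exit (found ? 0 : 1) }' "$1"
}
```

On a cluster node, the first check could be fed with `check_warehouse_line "$(sudo -u hdfs hdfs dfs -ls -d /user/hive/warehouse)"`. The location of container-executor.cfg depends on your installation, so pass its path explicitly to the second function.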
Privilege Model
With CDH 5.1, the privilege model has changed to accommodate the new GRANT/REVOKE syntax used with the Sentry service. These changes apply to both the new database-backed Sentry service and the previous policy file approach. The updated privilege model:
- Allows any user to execute show function, desc function, and show locks.
- Allows the user to see only those tables and databases for which this user has privileges.
- Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a location. Examples of such operations include LOAD, IMPORT, and EXPORT.
For more information, see Appendix: Authorization Privilege Model for Hive and Impala.
Users and Groups
- A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system supported by HiveServer2.
- A group connects the authentication system with the authorization system. It is a collection of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
- A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups.
User to Group Mapping
You can configure Sentry to use either Hadoop groups or groups defined in the policy file. By default, Sentry looks up groups locally, but it can be configured to look up Hadoop groups using LDAP (for Active Directory). Local groups will be looked up on the host Sentry runs on. For Hive, this will be the host running HiveServer2.
Group mappings in Sentry can be summarized as in the figure below:
Configuring Hadoop Groups
<property>
  <name>hive.sentry.provider</name>
  <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
Configuring Local Groups
- Define local groups in a [users] section of the policy file. For example:
[users]
user1 = group1, group2, group3
user2 = group2, group3
- In sentry-site.xml, set hive.sentry.provider as follows:
<property>
  <name>hive.sentry.provider</name>
  <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
</property>
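To see which local groups a user would resolve to, you can read the [users] section of the policy file directly. The sketch below is illustrative only: groups_for_user is a hypothetical helper, and it assumes one `user = group, group` entry per line as in the example above.

```shell
#!/bin/sh
# Sketch only: groups_for_user is a hypothetical helper, not a Sentry tool.

# Print, one per line, the groups mapped to a user in the [users]
# section of a policy file.
groups_for_user() {
    # $1: path to the policy file, $2: user name
    awk -v user="$2" '
        /^[ \t]*#/ { next }                                # skip comments
        /^\[/ { in_users = ($0 ~ /^\[users\]/); next }     # track current section
        in_users && /=/ {
            split($0, kv, "=")
            name = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == user) {
                n = split(kv[2], groups, ",")
                for (i = 1; i <= n; i++) {
                    gsub(/^[ \t]+|[ \t]+$/, "", groups[i])
                    if (groups[i] != "") print groups[i]
                }
            }
        }' "$1"
}
```

For Hadoop groups, the standard `hdfs groups <user>` command reports the mapping that the Hadoop group provider resolves.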
Setup and Configuration
Installing and Upgrading Sentry
Upgrading Sentry from CDH 4 to CDH 5
To upgrade Sentry from CDH 4 to CDH 5, you must uninstall the old version and install the new version. If you have already performed the steps to uninstall CDH 4 and all components, as described under Upgrading from CDH 4 to CDH 5, you can skip Step 1 below and proceed with installing the new CDH 5 version of Sentry.
- Remove the CDH 4 Version of Sentry
Remove Sentry as follows, depending on your operating system:
OS | Command |
---|---|
RHEL | $ sudo yum remove sentry |
SLES | $ sudo zypper remove sentry |
Ubuntu or Debian | $ sudo apt-get remove sentry |
- Install the New Version of Sentry
Follow instructions in the next section to install the CDH 5 version of Sentry.
Important: Configuration files
- If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
- If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version; for details, see Automatic handling of configuration files by dpkg.
The upgrade is now complete.
Installing Sentry
Install Sentry as follows, depending on your operating system:
OS | Command |
---|---|
RHEL | $ sudo yum install sentry |
SLES | $ sudo zypper install sentry |
Ubuntu or Debian | $ sudo apt-get update; sudo apt-get install sentry |
Starting the Sentry Service
- Set the SENTRY_HOME and HADOOP_HOME parameters.
- Create the Sentry database schema using the Sentry schematool.
Sentry, by default, does not initialize the schema. The schematool is a built-in way
for you to deploy the backend schema required by the Sentry service. For example, the
following command uses the schematool to initialize the schema for a MySQL database.
bin/sentry --command schema-tool --conffile <sentry-site.xml> --dbType mysql --initSchema
Alternatively, you can set the sentry.verify.schema.version configuration property to false. However, this is not recommended.
- Start the Sentry service.
bin/sentry --command service --conffile <sentry-site.xml>
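The startup steps above can be collected into a small wrapper script. This is a sketch under stated assumptions: the default paths for SENTRY_HOME and SENTRY_CONF are placeholders, the DRY_RUN switch and the function names are hypothetical, and only the bin/sentry invocations shown above are used.

```shell
#!/bin/sh
# Sketch only: default paths below are placeholders, adjust to your installation.
SENTRY_HOME="${SENTRY_HOME:-/usr/lib/sentry}"
SENTRY_CONF="${SENTRY_CONF:-/etc/sentry/conf/sentry-site.xml}"
DB_TYPE="${DB_TYPE:-mysql}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Initialize the Sentry schema in the backing database.
init_schema() {
    run "$SENTRY_HOME/bin/sentry" --command schema-tool \
        --conffile "$SENTRY_CONF" --dbType "$DB_TYPE" --initSchema
}

# Start the Sentry service in the foreground.
start_service() {
    run "$SENTRY_HOME/bin/sentry" --command service --conffile "$SENTRY_CONF"
}
```

Calling init_schema and then start_service with DRY_RUN=1 prints the two commands so you can review them before running with DRY_RUN=0.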
Hive SQL Syntax
Permissions stored in the Sentry service are configured through Grant and Revoke statements issued either interactively or programmatically through the HiveServer2 SQL command line interface, Beeline. The syntax described below is very similar to the GRANT/REVOKE commands available in well-established relational database systems.
CREATE ROLE Statement
The CREATE ROLE statement creates a role. Privileges are granted to roles, which can then be assigned to users. A user that has been assigned a role will only be able to exercise the privileges of that role. Only users that have administrative privileges can create or drop roles. By default, the hive, impala, and hue users have admin privileges in Sentry.
CREATE ROLE [role_name];
DROP ROLE Statement
The DROP ROLE statement can be used to remove a role from the database. Once dropped, the role will be revoked for all users to whom it was previously assigned. Queries that are already executing will not be affected. However, since Hive checks user privileges before executing each query, active user sessions in which the role has already been enabled will be affected.
DROP ROLE [role_name];
GRANT ROLE Statement
The GRANT ROLE statement can be used to grant roles to groups. Only Sentry admin users can grant the role to a group.
GRANT ROLE role_name [, role_name] TO GROUP <groupName> [,GROUP <groupName>]
REVOKE ROLE Statement
The REVOKE ROLE statement can be used to revoke roles from groups. Only Sentry admin users can revoke the role from a group.
REVOKE ROLE role_name [, role_name] FROM GROUP <groupName> [,GROUP <groupName>]
GRANT <PRIVILEGE> Statement
To grant privileges on an object to a role, the user must be a Sentry admin user.
GRANT <PRIVILEGE> [, <PRIVILEGE> ] ON <OBJECT> <object_name> TO ROLE <roleName> [,ROLE <roleName>]
REVOKE <PRIVILEGE> Statement
Because only authorized admin users can create roles, only Sentry admin users can revoke privileges from a group.
REVOKE <PRIVILEGE> [, <PRIVILEGE> ] ON <OBJECT> <object_name> FROM ROLE <roleName> [,ROLE <roleName>]
SET ROLE Statement
The SET ROLE statement can be used to specify a role to be enabled for the current session. A user can only enable a role that has been granted to them. Any roles not listed and not already enabled are disabled for the current session. If no roles are enabled, the user will have the privileges granted by any of the roles that (s)he belongs to.
SET ROLE <roleName>;
SET ROLE ALL;
SET ROLE NONE;
SHOW Statement
SHOW ROLES;
SHOW CURRENT ROLES;
SHOW ROLE GRANT GROUP <groupName>;
The SHOW statement can also be used to list the privileges that have been granted to a role or all the grants given to a role for a particular object.
SHOW GRANT ROLE <roleName>;
SHOW GRANT ROLE <roleName> on OBJECT <objectName>;
Example: Using Grant/Revoke Statements to Match an Existing Policy File
[groups]
# Assigns each Hadoop group to its set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role
[roles]
# The URIs below define a landing skid which
# the user can use to import or export data from the system.
# Since the server runs as the user "hive", files in that directory
# must either have the group hive and read/write set or
# be world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1
# Implies everything on server1.
admin_role = server=server1
The following sections show how you can use the new GRANT statements to assign privileges to roles (and assign roles to groups) to match the sample policy file above.
CREATE ROLE analyst_role;
GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role;
GRANT SELECT ON DATABASE jranalyst1 TO ROLE analyst_role;
GRANT ALL ON URI 'hdfs://ha-nn-uri/landing/analyst1' \
    TO ROLE analyst_role;
CREATE ROLE junior_analyst_role;
GRANT ALL ON DATABASE jranalyst1 TO ROLE junior_analyst_role;
GRANT ALL ON URI 'hdfs://ha-nn-uri/landing/jranalyst1' \
    TO ROLE junior_analyst_role;
CREATE ROLE admin_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
GRANT ROLE admin_role TO GROUP admin;
GRANT ROLE analyst_role TO GROUP analyst;
GRANT ROLE junior_analyst_role TO GROUP jranalyst;
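When migrating a larger policy file, the group-to-role assignments can be generated rather than typed by hand. The following sketch is one possible approach, not a Sentry utility: groups_to_grants is a hypothetical helper that reads only the [groups] section and emits one GRANT ROLE statement per role, assuming each group entry sits on a single line.

```shell
#!/bin/sh
# Sketch only: groups_to_grants is a hypothetical helper, not a Sentry tool.

# Emit "GRANT ROLE <role> TO GROUP <group>;" for every assignment in the
# [groups] section of a policy file. Other sections are ignored.
groups_to_grants() {
    # $1: path to the policy file
    awk '
        /^[ \t]*#/ { next }                                  # skip comments
        /^\[/ { in_groups = ($0 ~ /^\[groups\]/); next }     # track current section
        in_groups && /=/ {
            split($0, kv, "=")
            group = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", group)
            n = split(kv[2], roles, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", roles[i])
                if (roles[i] != "")
                    printf "GRANT ROLE %s TO GROUP %s;\n", roles[i], group
            }
        }' "$1"
}
```

The output can be redirected to a file and reviewed before it is run by a Sentry admin user, for example with Beeline's -f option. The roles themselves must already exist (CREATE ROLE) before these statements succeed.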
Configuring HiveServer2 for the Sentry Service
<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
  <name>hive.sentry.conf.url</name>
  <value>file:///{{CMF_CONF_DIR}}/sentry-site.xml</value>
</property>
Configuring the Hive Metastore for the Sentry Service
Configuring Pig and HCatalog for the Sentry Service
Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.
With HDFS extended ACLs enabled, Cloudera recommends you set the permissions for the Hive warehouse directory, /user/hive/warehouse, to 771 so users other than the owner and group only have execute permissions. Since by default, the /user/hive/warehouse directory is owned by hive:hive, this also restricts requests from any other users at the HDFS level.
- Use HDFS ACLs to define permissions on a specific directory or file of HDFS. This directory/file is generally mapped to a database, table, partition, or a data file.
- Users running these jobs should have the required permissions in Sentry to add new metadata or read metadata from the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax. You can use HiveServer2's command line interface, Beeline to update the Sentry database with the user privileges.
- A user who is using Pig HCatLoader will require read permissions on a specific table or partition. In such a case, you can GRANT read access to the user in Sentry and set the ACL to read and execute, on the file being accessed.
- A user who is using Pig HCatStorer will require ALL permissions on a specific table. In this case, you GRANT ALL access to the user in Sentry and set the ACL to write and execute, on the table being used.
Configuring the Hive Metastore to Communicate with Sentry
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>
  <description>list of comma separated listeners for metastore events.</description>
</property>
<property>
  <name>hive.metastore.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>
  <description>list of comma separated listeners for metastore, post events.</description>
</property>
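A quick way to confirm that both listeners made it into the metastore configuration is to search the file for the two class names. This is a rough sketch: metastore_sentry_configured is a hypothetical helper, and it only checks that the class names appear somewhere in the file, not that the XML is well-formed or that each class sits under the right property.

```shell
#!/bin/sh
# Sketch only: a presence check, not an XML validator.

# Succeeds if both Sentry metastore listener classes appear in the file.
metastore_sentry_configured() {
    # $1: path to the metastore's hive-site.xml
    grep -F -q 'org.apache.sentry.binding.metastore.MetastoreAuthzBinding' "$1" &&
    grep -F -q 'org.apache.sentry.binding.metastore.SentryMetastorePostEventListener' "$1"
}
```

For example, `metastore_sentry_configured /path/to/hive-site.xml && echo configured`, where the path is whatever your metastore actually reads.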
Configuring Impala for the Sentry Service
<property>
  <name>sentry.service.client.server.rpc-port</name>
  <value>3893</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-address</name>
  <value>hostname</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-connection-timeout</name>
  <value>200000</value>
</property>
<property>
  <name>sentry.service.security.mode</name>
  <value>none</value>
</property>
- To enable the Sentry policy service, the following flag should be
set on the catalogd and the impalad.
--sentry_config=<absolute path to sentry service configuration file>
- To enable authorization based on policy server metadata set the
following flag on the impalad.
--server_name=<server name>
- To enable authorization based on a file-based policy set the
following flags on the impalad.
--server_name=<server name> --authorization_policy_file=<path to policy file>
If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the policy server metadata approach will be used to implement authorization.
- The impala user also needs to be added to the list of administrative users of the Sentry Policy Server. For more details, see SENTRY-191.
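The flag precedence described above (policy file wins when --authorization_policy_file is present) can be expressed as a short decision function. This is only an illustration of the rule as stated in this section: impala_authz_mode and the mode labels it prints are hypothetical, and treating the absence of --server_name as "none" is an assumption.

```shell
#!/bin/sh
# Sketch only: mirrors the precedence rule described above; the function
# name and the printed labels are hypothetical.

# Print which authorization source a set of impalad flags would select.
impala_authz_mode() {
    has_server=0
    has_file=0
    for arg in "$@"; do
        case "$arg" in
            --server_name=*) has_server=1 ;;
            --authorization_policy_file=*) has_file=1 ;;
        esac
    done
    if [ "$has_file" = "1" ]; then
        echo "policy-file"        # file-based policy takes precedence
    elif [ "$has_server" = "1" ]; then
        echo "sentry-service"     # policy server metadata
    else
        echo "none"               # assumption: no authorization flags given
    fi
}
```

For instance, passing only --server_name=server1 prints sentry-service, while adding an --authorization_policy_file flag switches the result to policy-file.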
Appendix: Authorization Privilege Model for Hive and Impala
Privileges can be granted on different objects in the Hive warehouse. Any privilege that can be granted is associated with a level in the object hierarchy. If a privilege is granted on a container object in the hierarchy, the base object automatically inherits it. For instance, if a user has ALL privileges on the database scope, then (s)he has ALL privileges on all of the base objects contained within that scope.
Object Hierarchy in Hive
Server
    Database
        Table
            Partition
            Columns
            View
            Index
    Function/Routine
    Lock
Privilege | Object |
---|---|
INSERT | DB, TABLE |
SELECT | DB, TABLE |
ALL | SERVER, TABLE, DB, URI |
Base Object | Granular privileges on object | Container object that contains the base object | Privileges on container object that implies privileges on the base object |
---|---|---|---|
DATABASE | ALL | SERVER | ALL |
TABLE | INSERT | DATABASE | ALL |
TABLE | SELECT | DATABASE | ALL |
VIEW | SELECT | DATABASE | ALL |
Operation | Scope | Privileges | URI | Others |
---|---|---|---|---|
CREATE DATABASE | SERVER | ALL | ||
DROP DATABASE | DATABASE | ALL | ||
CREATE TABLE | DATABASE | ALL | ||
DROP TABLE | TABLE | ALL | ||
CREATE VIEW | DATABASE; SELECT on TABLE | ALL | SELECT on TABLE | |
DROP VIEW | VIEW/TABLE | ALL | ||
CREATE INDEX | TABLE | ALL | ||
DROP INDEX | TABLE | ALL | ||
ALTER TABLE .. ADD COLUMNS | TABLE | ALL | ||
ALTER TABLE .. REPLACE COLUMNS | TABLE | ALL | ||
ALTER TABLE .. CHANGE column | TABLE | ALL | ||
ALTER TABLE .. RENAME | TABLE | ALL | ||
ALTER TABLE .. SET TBLPROPERTIES | TABLE | ALL | ||
ALTER TABLE .. SET FILEFORMAT | TABLE | ALL | ||
ALTER TABLE .. SET LOCATION | TABLE | ALL | URI | |
ALTER TABLE .. ADD PARTITION | TABLE | ALL | ||
ALTER TABLE .. ADD PARTITION location | TABLE | ALL | URI | |
ALTER TABLE .. DROP PARTITION | TABLE | ALL | ||
ALTER TABLE .. PARTITION SET FILEFORMAT | TABLE | ALL | ||
SHOW TBLPROPERTIES | TABLE | SELECT/INSERT | ||
SHOW CREATE TABLE | TABLE | SELECT/INSERT | ||
SHOW PARTITIONS | TABLE | SELECT/INSERT | ||
DESCRIBE TABLE | TABLE | SELECT/INSERT | ||
DESCRIBE TABLE .. PARTITION | TABLE | SELECT/INSERT | ||
LOAD DATA | TABLE | INSERT | URI | |
SELECT | TABLE | SELECT | ||
INSERT OVERWRITE TABLE | TABLE | INSERT | ||
CREATE TABLE .. AS SELECT | DATABASE; SELECT on TABLE | ALL | SELECT on TABLE | |
USE <dbName> | Any | |||
ALTER TABLE .. SET SERDEPROPERTIES | TABLE | ALL | ||
ALTER TABLE .. PARTITION SET SERDEPROPERTIES | TABLE | ALL | ||
Hive-Only Operations | ||||
INSERT OVERWRITE DIRECTORY | TABLE | INSERT | URI | |
ANALYZE TABLE | TABLE | SELECT + INSERT | ||
IMPORT TABLE | DATABASE | ALL | URI | |
EXPORT TABLE | TABLE | SELECT | URI | |
ALTER TABLE TOUCH | TABLE | ALL | ||
ALTER TABLE TOUCH PARTITION | TABLE | ALL | ||
ALTER TABLE .. CLUSTERED BY SORTED BY | TABLE | ALL | ||
ALTER TABLE .. ENABLE/DISABLE | TABLE | ALL | ||
ALTER TABLE .. PARTITION ENABLE/DISABLE | TABLE | ALL | ||
ALTER TABLE .. PARTITION.. RENAME TO PARTITION | TABLE | ALL | ||
ALTER DATABASE | DATABASE | ALL | ||
DESCRIBE DATABASE | DATABASE | SELECT/INSERT | ||
SHOW COLUMNS | TABLE | SELECT/INSERT | ||
SHOW INDEXES | TABLE | SELECT/INSERT | ||
GRANT PRIVILEGE | Allowed only for Sentry admin users | |||
REVOKE PRIVILEGE | Allowed only for Sentry admin users | |||
SHOW GRANTS | Allowed only for Sentry admin users | |||
ADD JAR | Not Allowed | |||
ADD FILE | Not Allowed | |||
DFS | Not Allowed | |||
Impala-Only Operations | ||||
EXPLAIN | TABLE | SELECT | ||
INVALIDATE METADATA | SERVER | ALL | ||
INVALIDATE METADATA <table name> | TABLE | SELECT/INSERT | ||
REFRESH <table name> | TABLE | SELECT/INSERT | ||
CREATE FUNCTION | SERVER | ALL | ||
DROP FUNCTION | SERVER | ALL | ||
COMPUTE STATS | TABLE | ALL |