Enabling Sentry Authorization for Impala
Authorization determines which users are allowed to access which resources, and what operations they are allowed to perform. In Impala 1.1 and higher, you use Apache Sentry for authorization. Sentry adds a fine-grained authorization framework for Hadoop. By default (when authorization is not enabled), Impala does all read and write operations with the privileges of the impala user, which is suitable for a development/test environment but not for a secure production environment. When authorization is enabled, Impala uses the OS user ID of the user who runs impala-shell or other client program, and associates various privileges with each user.
See the following sections for details about using the Impala authorization features:
The Sentry Privilege Model
Privileges can be granted on different objects in the schema and are associated with a level in the object hierarchy. If a privilege is granted on a parent object in the hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other database systems.
The objects in the Impala schema hierarchy are:
Server URI Database Table Column
The table-level privileges apply to views as well. Anywhere you specify a table name, you can specify a view name instead.
In CDH 5.5 / Impala 2.3 and higher, you can specify privileges for individual columns.
- ANY denotes the SELECT, INSERT, CREATE, or REFRESH privilege.
- ALL privilege denotes the SELECT, INSERT, CREATE, and REFRESH privileges.
- The owner of an object effectively has the ALL privilege on the object.
- The parent levels of the specified scope are implicitly supported. For example, if a privilege is listed with the TABLE scope, the same privilege granted on DATABASE and SERVER will allow the user to execute that specific SQL statement on TABLE.
|CREATE DATABASE LOCATION||CREATE||SERVER|
|CREATE TABLE LIKE||CREATE||DATABASE|
|SELECT, INSERT, or REFRESH||TABLE|
|CREATE TABLE AS SELECT||CREATE||DATABASE|
|EXPLAIN CREATE TABLE AS SELECT||CREATE||DATABASE|
|CREATE TABLE LOCATION||CREATE||TABLE|
|ALTER DATABASE SET OWNER||ALL WITH GRANT||DATABASE|
|ALTER TABLE SET LOCATION||ALL||TABLE|
|ALTER TABLE RENAME||CREATE||DATABASE|
|ALTER TABLE SET OWNER||ALL WITH GRANT||TABLE|
|ALTER VIEW RENAME||CREATE||DATABASE|
|ALTER VIEW SET OWNER||ALL WITH GRANT||VIEW|
|INVALIDATE METADATA <table>||REFRESH||TABLE|
|COMMENT ON DATABASE||ALL||DATABASE|
|COMMENT ON TABLE||ALL||TABLE|
|COMMENT ON VIEW||ALL||TABLE|
|COMMENT ON COLUMN||ALL||TABLE|
|DESCRIBE DATABASE||SELECT, INSERT, or REFRESH||DATABASE|
|DESCRIBE <table/view>||SELECT, INSERT, or REFRESH||TABLE|
|If the user has the SELECT privilege at the COLUMN level, only the columns the user has access will show.||SELECT||COLUMN|
|SHOW FUNCTIONS||SELECT, INSERT, or REFRESH||DATABASE|
|SHOW PARTITIONS||SELECT, INSERT, or REFRESH||TABLE|
|SHOW TABLE STATS||SELECT, INSERT, or REFRESH||TABLE|
|SHOW COLUMN STATS||SELECT, INSERT, or REFRESH||TABLE|
|SHOW FILES||SELECT, INSERT, or REFRESH||TABLE|
|SHOW CREATE TABLE||SELECT, INSERT, or REFRESH||TABLE|
|SHOW CREATE VIEW||SELECT, INSERT, or REFRESH||TABLE|
|SHOW CREATE FUNCTION||SELECT, INSERT, or REFRESH||DATABASE|
|SHOW RANGE PARTITIONS (Kudu only)||SELECT, INSERT, or REFRESH||TABLE|
|UPDATE (Kudu only)||ALL||TABLE|
|EXPLAIN UPDATE (Kudu only)||ALL||TABLE|
|UPSERT (Kudu only)||ALL||TABLE|
|WITH UPSERT (Kudu only)||ALL||TABLE|
|EXPLAIN UPSERT (Kudu only)||ALL||TABLE|
|DELETE (Kudu only)||ALL||TABLE|
|EXPLAIN DELETE (Kudu only)||ALL||TABLE|
Privileges are managed via the GRANT and REVOKE SQL statements that requires the Sentry service enabled. The Sentry service stores, retrieves, and manipulates privilege information stored inside the Sentry database.
If you change privileges outside of Impala, e.g. adding a user, removing a user, modifying privileges, you must clear the Impala Catalog server cache by running the REFRESH AUTHORIZATION statement. REFRESH AUTHORIZATION is not required if you make the changes to privileges within Impala.
Object Ownership in Sentry
Starting in CDH 5.16 and CDH 6.1, Impala supports the ownership on databases, tables, and views. The CREATE statements implicitly make the user running the statement the owner of the object. An owner has the OWNER privilege if enabled in Sentry. For example, if User A creates a database, foo, via the CREATE DATABASE statement, User A now owns the foo database and is authorized to perform any operation on the foo database.
The OWNER privilege is not a grantable or revokable privilege whereas the ALL privilege is explicitly granted via the GRANT statement.
The object ownership feature is controlled by a Sentry configuration. The OWNER privilege is only granted when the feature is enabled in Sentry. When enabled they get the owner privilege, with or without the GRANT OPTION, which is also controlled by the Sentry configuration.
An ownership can be transferred to another user or role via the ALTER DATABASE, ALTER TABLE, or ALTER VIEW with the SET OWNER clause.
Starting the impalad Daemon with Sentry Authorization Enabled
To run the impalad daemon with authorization enabled, you add the following options to the IMPALA_SERVER_ARGS declaration in the /etc/default/impala configuration file:
- -server_name: Turns on Sentry authorization for Impala. The authorization rules refer to a symbolic server name, and you specify the name to use as the argument to the -server_name option.
- --sentry_config: Specifies the local path to the sentry-site.xml configuration file. This setting is required to enable authorization.
For example, you might adapt your /etc/default/impala configuration to contain lines like the following. To use the Sentry service:
IMPALA_SERVER_ARGS=" \ -server_name=server1 \ ...
The preceding examples set up a symbolic name of server1 to refer to the current instance of Impala. This symbolic name is used in the following ways:
In an environment managed by Cloudera Manager, see Enabling Sentry for Impala in Cloudera Manager for setting up Sentry for Impala in Cloud Manager. The values must be the same for both, so that Impala and Hive can share the privilege rules. Restart the Impala and Hive services after setting or changing this value.
In an environment not managed by Cloudera Manager, you specify this value for the sentry.hive.server property in the sentry-site.xml configuration file for Hive, as well as in the -server_name option for impalad.
Now restart the impalad daemons on all the nodes.
Enabling Sentry for Impala in Cloudera Manager
- Navigate to the Hive cluster.
- In the Configuration tab, select Hive (Service-Wide) under SCOPE and Advanced under CATEGORY.
- In the Sentry Service field, type the Sentry service you specified in the Impala configuration. This is the server name to use when granting server level privileges
- When using Sentry with the Hive Metastore, you can specify the list of users that are allowed to bypass Sentry Authorization in Hive Metastore. Select Security for CATEGORY in the Configuration tab, and specify the users in the Bypass Sentry Authorization Users field. These are usually service users that already ensure all activity has been authorized.
- If in CDH 5, navigate to the Impala cluster, and perform the next two steps to disable the policy file-based authorization.
- In the Configuration tab, select Impala (Service-Wide) under SCOPE and Policy File Based Sentry under CATEGORY.
- Deselect the Enable Sentry Authorization using Policy Files parameter when using the Sentry service. Cloudera Manager throws a validation error in CDH 5 if you attempt to configure the Sentry service and policy file at the same time.
- Restart Impala and Hive.
Using Impala with the Sentry Service (CDH 5.1 or higher only)
When you use the Sentry service, set up privileges through the GRANT and REVOKE statements in either Impala or Hive. Sentry privileges are automatically propagated to both services. Impala added the GRANT and REVOKE statements in CDH 5.2 / Impala 2.0.
If you make a change to privileges in Sentry from outside of Impala, e.g. adding a user, removing a user, modifying privileges, there are two options to propagate the change:
- Use the catalogd flag, --sentry_catalog_polling_frequency_s to specify how often to do a Sentry refresh. The flag is set to 60 seconds by default.
- Run the INVALIDATE METADATA statement to force a Sentry refresh. INVALIDATE METADATA forces a Sentry refresh regardless of the --sentry_catalog_polling_fequency_s flag.
If you make a change to privileges within Impala, INVALIDATE METADATA is not required.
Granting Privileges on URI
URIs represent the file paths you specify as part of statements such as CREATE EXTERNAL TABLE and LOAD DATA. Typically, you specify what look like UNIX paths, but these locations can also be prefixed with hdfs:// to make clear that they are really URIs. To set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in that directory and any directories underneath it.
GRANT ALL ON URI '/tmp';The above statement effectively becomes the following where the default filesystem is HDFS.
GRANT ALL ON URI 'hdfs://localhost:20500/tmp';
GRANT ALL ON URI file:///path/to/dir TO <role> GRANT ALL ON URI hdfs://namenode:port/path/to/dir TO <role>
Examples of Setting up Authorization for Security Scenarios
The following examples show how to set up authorization to deal with various scenarios.
A User with No Privileges
If a user has no privileges at all, that user cannot access any schema objects in the system. The error messages do not disclose the names or existence of objects that the user is not authorized to read.
This is the experience you want a user to have if they somehow log into a system where they are not an authorized Impala user. Or in a real deployment, a user might have no privileges because they are not a member of any of the authorized groups.
Examples of Privileges for Administrative Users
In this example, the SQL statements grant the entire_server role all privileges on both the databases and URIs within the server.
CREATE ROLE entire_server; GRANT ROLE entire_server TO GROUP admin_group; GRANT ALL ON SERVER server1 TO ROLE entire_server;
A User with Privileges for Specific Databases and Tables
If a user has privileges for specific tables in specific databases, the user can access those things but nothing else. They can see the tables and their parent databases in the output of SHOW TABLES and SHOW DATABASES, USE the appropriate databases, and perform the relevant actions (SELECT and/or INSERT) based on the table privileges. To actually create a table requires the ALL privilege at the database level, so you might define separate roles for the user that sets up a schema and other users or applications that perform day-to-day operations on the tables.
CREATE ROLE one_database; GRANT ROLE one_database TO GROUP admin_group; GRANT ALL ON DATABASE db1 TO ROLE one_database; CREATE ROLE instructor; GRANT ROLE instructor TO GROUP trainers; GRANT ALL ON TABLE db1.lesson TO ROLE instructor; # This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP. CREATE ROLE student; GRANT ROLE student TO GROUP visitors; GRANT SELECT ON TABLE db1.training TO ROLE student;
Privileges for Working with External Data Files
When data is being inserted through the LOAD DATA statement, or is referenced from an HDFS location outside the normal Impala database directories, the user also needs appropriate permissions on the URIs corresponding to those HDFS locations.
In this example:
- The external_table role can insert into and query the Impala table, external_table.sample.
- The staging_dir role can specify the HDFS path /user/cloudera/external_data with the LOAD DATA statement. When Impala queries or loads data files, it operates on all the files in that directory, not just a single file, so any Impala LOCATION parameters refer to a directory rather than an individual file.
CREATE ROLE external_table; GRANT ROLE external_table TO GROUP cloudera; GRANT ALL ON TABLE external_table.sample TO ROLE external_table; CREATE ROLE staging_dir; GRANT ROLE staging TO GROUP cloudera; GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/cloudera/external_data' TO ROLE staging_dir;
Separating Administrator Responsibility from Read and Write Privileges
To create a database, you need the full privilege on that database while day-to-day operations on tables within that database can be performed with lower levels of privilege on specific table. Thus, you might set up separate roles for each database or application: an administrative one that could create or drop the database, and a user-level one that can access only the relevant tables.
In this example, the responsibilities are divided between users in 3 different groups:
- Members of the supergroup group have the training_sysadmin role and so can set up a database named training.
- Members of the cloudera group have the instructor role and so can create, insert into, and query any tables in the training database, but cannot create or drop the database itself.
- Members of the visitor group have the student role and so can query those tables in the training database.
CREATE ROLE training_sysadmin; GRANT ROLE training_sysadmin TO GROUP supergroup; GRANT ALL ON DATABASE training1 TO ROLE training_sysadmin; CREATE ROLE instructor; GRANT ROLE instructor TO GROUP cloudera; GRANT ALL ON TABLE training1.course1 TO ROLE instructor; CREATE ROLE visitor; GRANT ROLE student TO GROUP visitor; GRANT SELECT ON TABLE training1.course1 TO ROLE student;
Setting Up Schema Objects for a Secure Impala Deployment
In your role definitions, you must specify privileges at the level of individual databases and tables, or all databases or all tables within a database. To simplify the structure of these rules, plan ahead of time how to name your schema objects so that data with different authorization requirements is divided into separate databases.
If you are adding security on top of an existing Impala deployment, you can rename tables or even move them between databases using the ALTER TABLE statement.
The DEFAULT Database in a Secure Deployment
Because of the extra emphasis on granular access controls in a secure deployment, you should move any important or sensitive information out of the DEFAULT database into a named database. Sometimes you might need to give privileges on the DEFAULT database for administrative reasons, for example, as a place you can reliably specify with a USE statement when preparing to drop a database.