Apache Ranger permissions for Impala metadata extraction
The Cloudera Octopai Impala extractor requires specific Apache Ranger permissions to extract database and table metadata.
Overview
The Cloudera Octopai Impala extractor requires specific Apache Ranger permissions to extract metadata from Impala databases and tables, including the minimum required permissions based on the extractor’s operational scope.
To install the Impala ODBC driver and configure the DSN, see Impala.
Recommended permissions
Grant the extractor principal the Impala REFRESH privilege on the required databases and tables in Apache Ranger. The REFRESH privilege is the least-privilege option that allows the extractor to run SHOW DATABASES, SHOW TABLES, DESCRIBE FORMATTED, SHOW CREATE TABLE, and SHOW CREATE VIEW without granting data-read access.
Extractor Scope
The Cloudera Octopai Impala extractor performs metadata-only SQL
operations against Impala and Hive objects. It does not issue SELECT
statements against table data.
The extractor runs the following SQL statements:
SHOW DATABASESlists all databases.SHOW TABLES IN [***DATABASE NAME***]lists tables within a database.DESCRIBE FORMATTED [***DATABASE NAME***].[***TABLE NAME***]retrieves table metadata.SHOW CREATE TABLE [***DATABASE NAME***].[***TABLE NAME***]retrieves table DDL and location.SHOW CREATE VIEW [***DATABASE NAME***].[***VIEW NAME***]retrieves view definitions.
Permission Mapping
The following table maps extractor actions to the minimum required Impala privileges and recommended Apache Ranger grants.
| Extractor action | Impala statement | Minimum privilege | Recommended grant |
|---|---|---|---|
| List databases | SHOW DATABASES |
Any privilege on visible objects. The SHOW statement only returns objects the user can view. |
REFRESH |
| List tables in a database | SHOW TABLES IN [***DATABASE NAME***] |
Any privilege on visible objects. Output is filtered by authorization. | REFRESH |
| Retrieve table metadata | DESCRIBE FORMATTED [***DATABASE NAME***].[***TABLE NAME***] |
SELECT, INSERT, or REFRESH on the table or view |
REFRESH |
| Retrieve DDL and location | SHOW CREATE TABLE [***DATABASE NAME***].[***TABLE NAME***] |
SELECT, INSERT, or REFRESH on the table or view |
REFRESH |
| Retrieve view definition | SHOW CREATE VIEW [***DATABASE NAME***].[***VIEW NAME***] |
SELECT, INSERT, or REFRESH on the view |
REFRESH |
Least-privilege Best Practices
Use the following guidance when you configure Ranger permissions for the Impala extractor:
- Use REFRESH as the primary privilege
- Based on the Impala authorization matrix,
REFRESHis sufficient forDESCRIBE FORMATTED,SHOW CREATE TABLE, andSHOW CREATE VIEWstatements, and it also satisfies the broader visibility requirement forSHOWstatements. - Grant at database scope
- When the extractor must inventory all tables in a database, grant the privilege at the database scope. This approach is simpler operationally and avoids partial visibility caused by table-by-table omissions.
- Avoid granting SELECT
- Do not grant
SELECTunlessREFRESHdoes not work in your specific cluster policy model or product version. TheSELECTprivilege exposes data-read capability that the extractor does not require.
Ranger Policy Configuration Examples
The following examples show how to configure Apache Ranger policies for different metadata extraction scenarios.
Use this configuration when the extractor should access only specific databases.
| Ranger resource | Value | Permission |
|---|---|---|
| Database | finance / sales / hr (example) |
REFRESH |
| Table | * |
REFRESH |
| Column | * or not applicable |
None required |
Use this configuration when the extractor should access all databases in the cluster.
| Ranger resource | Value | Permission |
|---|---|---|
| Database | * |
REFRESH |
| Table | * |
REFRESH |
| Column | * or not applicable |
None required |
Important Considerations
- Security-filtered SHOW statements
- If a database or table does not appear in the extraction results, the extractor principal likely does not have sufficient privileges on that object. The
SHOWstatements automatically filter output based on the user's authorization level. - Column-level grants
- Avoid using column-level grants for the extractor principal. Column-level grants can lead to partial metadata visibility, while the extractor expects full table definitions.
- LOCATION extraction
- The extractor retrieves the
LOCATIONproperty by parsingSHOW CREATE TABLEoutput. Because it only reads DDL text, no HDFS read permission is required or implied. - System databases
- The system databases
_impala_builtins,sys, andinformation_schemaare automatically excluded by the extractor and do not require explicit permissions.
Validating the Ranger Policy
After applying the Apache Ranger policy, validate that the extractor principal has proper access by running the following test commands.
| Test command | Expected result |
|---|---|
SHOW DATABASES; |
Returns all authorized databases |
SHOW TABLES IN `[***DATABASE NAME***]`; |
Returns all visible tables in the specified database |
DESCRIBE FORMATTED `[***DATABASE NAME***]`.`[***TABLE NAME***]`; |
Returns complete table metadata including schema, partitions, and properties |
SHOW CREATE TABLE `[***DATABASE NAME***]`.`[***TABLE NAME***]`; |
Returns DDL statement including LOCATION when applicable |
SHOW CREATE VIEW `[***DATABASE NAME***]`.`[***VIEW NAME***]`; |
Returns complete view definition |
Summary
Configure an Apache Ranger policy that grants the Impala extractor principal the REFRESH privilege on the required database and table scope. This represents the lowest practical permission set for metadata extraction operations.
Grant the SELECT privilege only if cluster-specific behavior or policy requirements show that REFRESH alone is insufficient.
