Apache Ranger permissions for Impala metadata extraction

The Cloudera Octopai Impala extractor requires specific Apache Ranger permissions to extract database and table metadata.

Overview

The Cloudera Octopai Impala extractor requires specific Apache Ranger permissions to extract metadata from Impala databases and tables, including the minimum required permissions based on the extractor’s operational scope.

To install the Impala ODBC driver and configure the DSN, see Impala.

Recommended permissions

Grant the extractor principal the Impala REFRESH privilege on the required databases and tables in Apache Ranger. The REFRESH privilege is the least-privilege option that allows the extractor to run SHOW DATABASES, SHOW TABLES, DESCRIBE FORMATTED, SHOW CREATE TABLE, and SHOW CREATE VIEW without granting data-read access.

Extractor Scope

The Cloudera Octopai Impala extractor performs metadata-only SQL operations against Impala and Hive objects. It does not issue SELECT statements against table data.

The extractor runs the following SQL statements:

  • SHOW DATABASES lists all databases.

  • SHOW TABLES IN [***DATABASE NAME***] lists tables within a database.

  • DESCRIBE FORMATTED [***DATABASE NAME***].[***TABLE NAME***] retrieves table metadata.

  • SHOW CREATE TABLE [***DATABASE NAME***].[***TABLE NAME***] retrieves table DDL and location.

  • SHOW CREATE VIEW [***DATABASE NAME***].[***VIEW NAME***] retrieves view definitions.

Permission Mapping

The following table maps extractor actions to the minimum required Impala privileges and recommended Apache Ranger grants.

Table 1. Extractor actions, statements, and grants
Extractor action Impala statement Minimum privilege Recommended grant
List databases SHOW DATABASES Any privilege on visible objects. The SHOW statement only returns objects the user can view. REFRESH
List tables in a database SHOW TABLES IN [***DATABASE NAME***] Any privilege on visible objects. Output is filtered by authorization. REFRESH
Retrieve table metadata DESCRIBE FORMATTED [***DATABASE NAME***].[***TABLE NAME***] SELECT, INSERT, or REFRESH on the table or view REFRESH
Retrieve DDL and location SHOW CREATE TABLE [***DATABASE NAME***].[***TABLE NAME***] SELECT, INSERT, or REFRESH on the table or view REFRESH
Retrieve view definition SHOW CREATE VIEW [***DATABASE NAME***].[***VIEW NAME***] SELECT, INSERT, or REFRESH on the view REFRESH

Least-privilege Best Practices

Use the following guidance when you configure Ranger permissions for the Impala extractor:

Use REFRESH as the primary privilege
Based on the Impala authorization matrix, REFRESH is sufficient for DESCRIBE FORMATTED, SHOW CREATE TABLE, and SHOW CREATE VIEW statements, and it also satisfies the broader visibility requirement for SHOW statements.
Grant at database scope
When the extractor must inventory all tables in a database, grant the privilege at the database scope. This approach is simpler operationally and avoids partial visibility caused by table-by-table omissions.
Avoid granting SELECT
Do not grant SELECT unless REFRESH does not work in your specific cluster policy model or product version. The SELECT privilege exposes data-read capability that the extractor does not require.

Ranger Policy Configuration Examples

The following examples show how to configure Apache Ranger policies for different metadata extraction scenarios.

Use this configuration when the extractor should access only specific databases.

Table 2. Example A: Scan selected databases
Ranger resource Value Permission
Database finance / sales / hr (example) REFRESH
Table * REFRESH
Column * or not applicable None required

Use this configuration when the extractor should access all databases in the cluster.

Table 3. Example B: Scan all databases
Ranger resource Value Permission
Database * REFRESH
Table * REFRESH
Column * or not applicable None required

Important Considerations

Security-filtered SHOW statements
If a database or table does not appear in the extraction results, the extractor principal likely does not have sufficient privileges on that object. The SHOW statements automatically filter output based on the user's authorization level.
Column-level grants
Avoid using column-level grants for the extractor principal. Column-level grants can lead to partial metadata visibility, while the extractor expects full table definitions.
LOCATION extraction
The extractor retrieves the LOCATION property by parsing SHOW CREATE TABLE output. Because it only reads DDL text, no HDFS read permission is required or implied.
System databases
The system databases _impala_builtins, sys, and information_schema are automatically excluded by the extractor and do not require explicit permissions.

Validating the Ranger Policy

After applying the Apache Ranger policy, validate that the extractor principal has proper access by running the following test commands.

Table 4. Commands to validate Ranger access
Test command Expected result
SHOW DATABASES; Returns all authorized databases
SHOW TABLES IN `[***DATABASE NAME***]`; Returns all visible tables in the specified database
DESCRIBE FORMATTED `[***DATABASE NAME***]`.`[***TABLE NAME***]`; Returns complete table metadata including schema, partitions, and properties
SHOW CREATE TABLE `[***DATABASE NAME***]`.`[***TABLE NAME***]`; Returns DDL statement including LOCATION when applicable
SHOW CREATE VIEW `[***DATABASE NAME***]`.`[***VIEW NAME***]`; Returns complete view definition

Summary

Configure an Apache Ranger policy that grants the Impala extractor principal the REFRESH privilege on the required database and table scope. This represents the lowest practical permission set for metadata extraction operations.

Grant the SELECT privilege only if cluster-specific behavior or policy requirements show that REFRESH alone is insufficient.