Using SECURE_ACCESS mode for fine-grained access control in Hive Warehouse Connector
The SECURE_ACCESS mode in Hive writes temporary external tables to a secure staging location. Spark's native readers can then read the staging location securely, and Fine-Grained Access Control (FGAC) and Ranger policies are enforced during these operations.
The SECURE_ACCESS mode in Hive uses the Create Table As Select (CTAS) operation to write temporary external tables to a secure staging location. CTAS creates a new table and populates it with query results in a single step.
This mode offers better performance for shuffle-intensive queries compared to Hive Warehouse Connector (HWC) Low Latency Analytical Processing (LLAP) mode, as it avoids the additional overhead of serialization and deserialization in LLAP. However, intermediate data is generated every time the Spark job runs, even if the underlying data remains unchanged.
Additional storage requirements depend on the number of Spark jobs running concurrently. Importantly, no code refactoring is needed; you can enable the SECURE_ACCESS read mode by updating the configuration.
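As a rough illustration of enabling the mode through configuration alone, the sketch below assembles the Spark --conf flags and prints the resulting command instead of launching Spark. The two property names (spark.datasource.hive.warehouse.read.mode and spark.datasource.hive.warehouse.load.staging.dir), the helper function hwc_secure_access_conf, and the HWC_STAGING_DIR variable are assumptions that do not come from this page; confirm the property names against the documentation for your HWC release before using them.

```shell
#!/usr/bin/env bash
# Sketch only: build the --conf flags that would switch HWC reads to SECURE_ACCESS.
# ASSUMPTION: both property names below are not taken from this page; verify them
# against the documentation for your HWC release.
hwc_secure_access_conf() {
  local staging_dir="$1"
  # printf reuses the format string once per argument, giving one --conf per property.
  printf -- '--conf %s ' \
    "spark.datasource.hive.warehouse.read.mode=secure_access" \
    "spark.datasource.hive.warehouse.load.staging.dir=${staging_dir}"
}

# Dry run: print the command rather than starting Spark.
# HWC_STAGING_DIR is a hypothetical variable; the default mirrors the path used on this page.
echo "spark-shell $(hwc_secure_access_conf "${HWC_STAGING_DIR:-/tmp/staging/hwc}")"
```

Printing the command first makes it easy to review the flags before adding the other options a real job needs, such as the HWC jar and the HiveServer2 JDBC URL.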
Set up a staging directory:
- Choose a staging directory (for example, hdfs://.../tmp/staging/hwc or s3a://[***S3-BUCKET-NAME***]/tmp/staging/hwc).
- Grant -wt access on the parent directory to allow users to traverse it and create subdirectories with 1700 permissions.
- Enable the sticky bit to prevent other users from deleting or renaming directories.
sudo -u hdfs hdfs dfs -chmod 1703 /tmp/staging
sudo -u hdfs hdfs dfs -chmod 1703 /tmp/staging/hwc
sudo -u hdfs hdfs dfs -chmod 1703 /tmp/staging/hwc/*
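To see what mode 1703 means in practice, the following sketch applies it to a throwaway local directory and prints both the octal and the symbolic form. It uses the local filesystem as a stand-in for HDFS and assumes GNU coreutils stat (the -c option); demo_dir is a hypothetical path created only for this demonstration.

```shell
#!/usr/bin/env bash
# Sketch only: a local directory stands in for the HDFS staging path.
demo_dir="$(mktemp -d)/hwc"
mkdir -p "$demo_dir"

# 1 = sticky bit, 7 = rwx for the owner, 0 = nothing for the group, 3 = -wx for others.
chmod 1703 "$demo_dir"

# %a prints the octal mode, %A the symbolic mode (GNU coreutils stat).
stat -c '%a %A' "$demo_dir"
# prints: 1703 drwx----wt
```

The trailing -wt in the symbolic mode is the write and traverse permission for other users combined with the sticky bit, which lets users create their own subdirectories without being able to delete or rename those of others.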
For more information, see Setting up secure access mode in Cloudera Data Hub.
