Secure access mode introduction

If you want fine-grained access control (FGAC), in the form of column masking and row filtering, to secure the managed (ACID) or external Hive table data that you read from Spark, you need to learn about Hive Warehouse Connector (HWC) secure access mode. If you have large workloads with low-latency requirements and also require FGAC, secure access mode is recommended over Direct Reader mode.

As an administrator, you set up Ranger FGAC, consisting of column masking and row filtering, to secure the data, and you select an HDFS location for staging. When users launch Spark with secure access mode configured, HWC uses CREATE TABLE AS SELECT (CTAS) to create external tables in your designated staging location. Ranger policies are applied and FGAC is enforced during the CTAS operation. Users then read the external tables from the secure staging location. No code refactoring is necessary.
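
The launch step can be sketched as a small helper that assembles a spark-submit command line. This is a minimal illustration, not a definitive setup: the property names and values (spark.datasource.hive.warehouse.read.mode=secure_access, spark.datasource.hive.warehouse.load.staging.dir, spark.sql.hive.hiveserver2.jdbc.url, and the spark.sql.extensions class) are assumptions that you should verify against the HWC documentation for your release. The helper name, staging path, JDBC URL, JAR name, and application file are hypothetical placeholders.

```python
import shlex
from typing import List


def secure_access_submit_args(
    staging_dir: str, hs2_jdbc_url: str, hwc_jar: str, app: str
) -> List[str]:
    """Build a spark-submit argument list for HWC secure access mode.

    All property names below are assumptions; check them against the
    HWC documentation for your release before relying on them.
    """
    conf = {
        # Assumed: lets unchanged Spark SQL code go through HWC.
        "spark.sql.extensions": "com.hortonworks.spark.sql.rule.Extensions",
        # Assumed: selects secure access mode for reads.
        "spark.datasource.hive.warehouse.read.mode": "secure_access",
        # Assumed: staging location chosen by the administrator.
        "spark.datasource.hive.warehouse.load.staging.dir": staging_dir,
        # Assumed: HiveServer2 endpoint that runs the CTAS under Ranger.
        "spark.sql.hive.hiveserver2.jdbc.url": hs2_jdbc_url,
    }
    args = ["spark-submit", "--jars", hwc_jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only.
    cmd = secure_access_submit_args(
        staging_dir="hdfs://nameservice1/tmp/staging/hwc",
        hs2_jdbc_url="jdbc:hive2://hs2.example.com:10000/default",
        hwc_jar="hive-warehouse-connector-assembly.jar",
        app="my_job.py",
    )
    print(shlex.join(cmd))
```

Keeping the settings in one dictionary makes it easy to see which values the administrator supplies (the staging location) and which select the mode itself.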

Requirements

  • As an administrator, you set up Ranger policies.
  • As an administrator, you grant users access to the staging location on the file system.
  • As a user, you launch Spark and query Hive tables in secure access mode.

Considerations

  • Intermediate data is generated every time the Spark job runs, even if it runs on the same underlying data.
  • Intermediate data is automatically cleared when the Spark session ends.
  • Additional storage requirements depend on the Spark jobs running concurrently.
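
The storage consideration can be illustrated with a rough estimate. The sketch below assumes that each read stages a full copy of its result, that staged data persists until its session ends, and that sizes are logical bytes with any file-system replication ignored. The function name and the example sizes are hypothetical and are not measurements.

```python
from typing import Iterable

GIB = 1024 ** 3


def peak_staging_bytes(sessions: Iterable[Iterable[int]]) -> int:
    """Estimate staging usage for Spark sessions that are open at the same time.

    Each inner iterable holds the staged size, in bytes, of every read
    performed in one session. Repeated reads of the same table count
    again, because intermediate data is regenerated on every run.
    """
    return sum(sum(reads) for reads in sessions)


if __name__ == "__main__":
    # Hypothetical workload: three concurrent sessions.
    concurrent = [
        [2 * GIB, 2 * GIB],  # same table read twice in one session
        [5 * GIB],
        [1 * GIB],
    ]
    print(peak_staging_bytes(concurrent) // GIB, "GiB")
```

Under these assumptions, closing sessions promptly is the main lever for reducing staging usage, because staged data is released only when its session ends.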