Using secure access mode

Learn how to use the Hive Warehouse Connector (HWC) secure access mode, which provides fine-grained access control (FGAC) through column masking and row filtering to secure managed (ACID) or external Hive table data that you read from Spark. Secure access mode is available only in Cloudera Runtime 7.1.7 SP1 and later.

In this task, you enter the following configuration options when you launch the Spark shell.
  • spark.datasource.hive.warehouse.load.staging.dir=<path of the HDFS staging location>, for example, hdfs://my-cluster:8020/tmp/staging/hwc. Output is staged in this location for the duration of the session, in temporary subdirectories that are created under this directory per user and per session. Provide the absolute path or the fully qualified URI of the staging directory, not a relative path.
  • spark.datasource.hive.warehouse.read.mode=secure_access enables the staging output mode with fine-grained access control (FGAC). No code refactoring is required. You can continue to use hive.sql() in your queries.
Before you begin, make sure that the following prerequisites are met.
  • Your administrator has set up Ranger permissions for using secure access mode on Hive tables.
  • Your administrator has granted you permission to the staging location on your file system.
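The staging directory requirement described above can be checked before you launch the Spark shell. The following is a minimal sketch, not part of HWC: the function name is_absolute_staging_dir and the variable STAGING_DIR are hypothetical, and the check is purely syntactic. It does not confirm that the directory exists or that you have permission to it.

```shell
# Hypothetical helper (not provided by HWC): succeeds when the staging
# directory is an absolute path or a fully qualified URI, and fails when
# it looks like a relative path.
is_absolute_staging_dir() {
  case "$1" in
    /*)      return 0 ;;  # absolute path, for example /tmp/staging/hwc
    *://*/*) return 0 ;;  # fully qualified URI, for example hdfs://host:8020/tmp/staging/hwc
    *)       return 1 ;;  # anything else is treated as a relative path
  esac
}

# Example value taken from the option description above.
STAGING_DIR="hdfs://my-cluster:8020/tmp/staging/hwc"

if is_absolute_staging_dir "$STAGING_DIR"; then
  echo "staging dir accepted: $STAGING_DIR"
else
  echo "staging dir must be an absolute path or fully qualified URI" >&2
fi
```

A check like this only catches a relative path early. Whether the location is writable still depends on the permission your administrator granted.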
  1. Launch the Spark shell, configuring secure access mode.
    For example, use the Spark session user name livy.
    [root@hwc-secure-1 ~]# sudo -u livy spark-shell --master yarn  \
    --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly.jar  \
    --conf spark.datasource.hive.warehouse.read.mode=secure_access  \
    --conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://hwc-secure-3.hwc-secure.root.hwx.site:8020/tmp/staging/hwc \
    --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hwc-secure-3.hwc-secure.root.hwx.site:2181/default;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
    --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE                

    Output is:

    Setting default log level to "WARN".
    
    Spark context Web UI available at http://<domain name>:<port>
    Spark context available as 'sc' (master = yarn, app id = application_1637657444149_0024).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\ 
            /_/
    
    Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
    Type in expressions to have them evaluated.
    Type :help for more information.
  2. Read a Hive table.
    scala> val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()
    hive: com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl = com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl@159b4611
    
    scala> hive.sql("select id, name from test.acidtable limit 3").show
    
    +---+--------------------+
    | id|                name|
    +---+--------------------+
    |  1|   namex x xxxx xxxx|
    |  4|namex xxxx xxxxxx...|
    |  1|   namex x xxxx xxxx|
    +---+--------------------+
    
To illustrate how FGAC column masking and row filtering work, consider the following schema of a managed Hive table, test.acidtable:
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|      int|       |
|    name|   string|       |
|    col3|      int|       |
|    col4|   string|       |
|    col5|      int|       |
|    col6|   string|       |
+--------+---------+-------+
The Spark session user, livy, has access to only the id and name columns, and a masking policy applies to the name column. A query from this user that selects any other column fails.
scala> val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()
hive: com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl = com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl@6e12d8be

scala> hive.sql("select col4, col5 from test.acidtable").show
Output is:
22/01/01 08:31:48 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
java.lang.RuntimeException: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [livy] does not have [SELECT] privilege on [test/acidtable/col4,col5]

The exception indicates that the user livy does not have the SELECT privilege on the col4 and col5 columns.
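When such a failure appears in a captured log, the denied resource can be pulled out of the message for a quick check against the Ranger policy. The following is a sketch only: the function name denied_resource is hypothetical, and the pattern assumes the message wording shown in the example above, which may differ in other releases.

```shell
# Hypothetical helper: reads log text on standard input and prints the
# resource (database/table/columns) named in a "does not have [...]
# privilege on [...]" message. Prints nothing if no such message is found.
denied_resource() {
  sed -n 's/.*does not have \[[A-Z]*] privilege on \[\([^]]*\)].*/\1/p'
}

# Message copied from the example output above.
msg='java.lang.RuntimeException: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [livy] does not have [SELECT] privilege on [test/acidtable/col4,col5]'

printf '%s\n' "$msg" | denied_resource
```

For the message above, the extracted value names the database, table, and columns in the same slash-separated form that the exception uses.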