Spark and Hive permissions on Azure cloud cluster
This topic describes example scenarios of how Ranger authorizes queries that come from Spark and access Hive tables.
- Environment details
  - Cloudera Data Platform: Azure cloud cluster with CDP 7.2.18.500
Connecting to Spark and running Hive queries
- Connecting to Spark shell
  To connect to the Spark shell, run the following command:
  spark3-shell
- Running Hive queries in Spark shell
  To run Hive queries, instantiate the HiveContext class:
  val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
- Executing a query
  sqlContext.sql("select * from customer2").show()
Prerequisites for Ranger Hive security
Ensure that the abfs: scheme is included in the following Ranger Hive plugin property:
ranger.plugin.hive.urlauth.filesystem.schemes = hdfs:,file:,wasb:,adl:,abfs:

Initial setup with Hive user
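This property lists the filesystem schemes for which the Hive plugin performs URL authorization; abfs: must be present for the Azure scenarios below. As a rough illustration of how the scheme list relates to a table location (the helper function is hypothetical, not part of Ranger):

```python
from urllib.parse import urlparse

# Hypothetical helper: checks whether a table location's filesystem scheme
# appears in ranger.plugin.hive.urlauth.filesystem.schemes.
def scheme_is_covered(location: str, schemes_property: str) -> bool:
    covered = {s.strip().rstrip(":") for s in schemes_property.split(",")}
    return urlparse(location).scheme in covered

schemes = "hdfs:,file:,wasb:,adl:,abfs:"
print(scheme_is_covered(
    "abfs://vb-18df-data@sdxqeweekly.dfs.core.windows.net/warehouse/tablespace/external/hive",
    schemes))  # abfs: is listed, so this prints True
```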
CREATE EXTERNAL TABLE customer2 (c1 int, c2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
insert into customer2 values(1,'hivetest1');
select * from customer2;
create database vehicle;
create external table vehicle.cars(car_id int, car_name string, car_color string, car_price int);
insert into vehicle.cars(car_id, car_name, car_color, car_price) VALUES (1,'car1','color1',100000), (2,'car2','color2',200000), (3,'car3','color3',300000), (4,'car4','color4',400000);
select * from vehicle.cars;

Use cases for custom user permissions
- Use case 1: Running a SELECT query with a custom user
  - Objective: To run a SELECT query with the custom user hrt_qa, which has a policy on the table.
  - Workflow:
    - Created the initial Hive policy:
      - Resource:
        - Hive Database = default
        - Hive Table = customer2
        - Hive Column = *
      - User: hrt_qa = select
    - Attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("select * from customer2").show()
    - Observed error: Denied exception for the abfs path. For example,
      SASTokenProviderException: Failed to acquire a SAS token for get-status on /warehouse/tablespace/external/hive/customer2 due to org.apache.hadoop.security.AccessControlException: Permission denied
      This indicates that an ADLS policy is needed.
    - Created the ADLS policy:
      - Storage account: sdxqeweekly
      - Storage account container: vb-18df-data
      - Relative path: /warehouse/tablespace/external/hive
      - User: hrt_qa = all
    - Re-attempted the command (as hrt_qa after kinit):
      sqlContext.sql("select * from customer2").show()
  - Result: The command succeeds and the table records are visible.
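Policies like the ones in this use case are typically defined in the Ranger Admin UI, but the same definitions can be expressed as JSON payloads for Ranger's public REST API (POST /service/public/v2/api/policy). The sketch below only builds the payloads; the service names (cm_hive, cm_adls) and the ADLS resource keys are assumptions for illustration, not values verified against a live cluster:

```python
# Hedged sketch: Ranger policy payloads matching Use case 1.
# Service names "cm_hive"/"cm_adls" and the ADLS resource keys are
# assumptions; check the service definitions in your Ranger instance.
def hive_policy(database, table, users, accesses):
    return {
        "service": "cm_hive",
        "name": f"{database}-{table}-policy",
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},
        },
        "policyItems": [{
            "users": users,
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
        }],
    }

def adls_policy(account, container, relative_path, users):
    return {
        "service": "cm_adls",
        "name": f"{container}-hive-warehouse-policy",
        "resources": {
            "storageaccount": {"values": [account]},
            "storageaccount-container": {"values": [container]},
            "relativepath": {"values": [relative_path], "isRecursive": True},
        },
        "policyItems": [{
            "users": users,
            "accesses": [{"type": "all", "isAllowed": True}],
        }],
    }

hive = hive_policy("default", "customer2", ["hrt_qa"], ["select"])
adls = adls_policy("sdxqeweekly", "vb-18df-data",
                   "/warehouse/tablespace/external/hive", ["hrt_qa"])
```

Either payload could then be posted to the Ranger Admin endpoint with any HTTP client; the UI steps in this topic achieve the same result.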
- Use case 2: Creating a table in a database with a custom user
  - Objective: To allow the custom user hrt_qa to create a table in the vehicle database.
  - Workflow:
    - Created the initial Hive policy:
      - Resource:
        - Hive Database = vehicle
        - Hive Table = *
        - Hive Column = *
      - User: hrt_qa = create
    - Attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("create table vehicle.cars1(car_id int, car_name string, car_color string, car_price int)").show()
    - Observed error: Denied exception for the Azure path. For example,
      SASTokenProviderException: Failed to acquire a SAS token for get-status on /warehouse/tablespace/external/hive/vehicle.db/cars1 due to org.apache.hadoop.security.AccessControlException: Permission denied
      This indicates that an ADLS policy is needed.
    - Created the ADLS policy:
      - Storage account: sdxqeweekly
      - Storage account container: vb-18df-data
      - Relative path: /warehouse/tablespace/external/hive
      - User: hrt_qa = all
    - Re-attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("create table vehicle.cars1(car_id int, car_name string, car_color string, car_price int)").show()
  - Result: The command succeeds and the table is created.
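Use cases 1 and 2 fail with the same SASTokenProviderException pattern before the ADLS policy exists. When triaging such failures, the denied operation and storage path can be pulled out of the message to decide which relative path the ADLS policy must grant. A small sketch (the regex is tailored to the messages shown above):

```python
import re

# Extract the denied operation and storage path from a
# SASTokenProviderException message like the ones in Use cases 1 and 2.
PATTERN = re.compile(
    r"Failed to acquire a SAS token for (?P<op>[\w-]+) on (?P<path>\S+)"
)

def parse_sas_denial(message: str):
    m = PATTERN.search(message)
    return (m.group("op"), m.group("path")) if m else None

msg = ("SASTokenProviderException: Failed to acquire a SAS token for "
       "get-status on /warehouse/tablespace/external/hive/customer2 due to "
       "org.apache.hadoop.security.AccessControlException: Permission denied")
print(parse_sas_denial(msg))
# ('get-status', '/warehouse/tablespace/external/hive/customer2')
```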
- Use case 3: Altering a table in a database with a custom user
  - Objective: To allow the custom user hrt_qa to alter the cars table in the vehicle database.
  - Workflow:
    - Created the initial Hive policy:
      - Resource:
        - Hive Database = vehicle
        - Hive Table = cars
        - Hive Column = *
      - User: hrt_qa = alter
    - Attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("alter table vehicle.cars rename to vehicle.cars_a").show()
    - Observed error: Permission denied: user [hrt_qa] does not have [SELECT] privilege on [vehicle/cars].
      This indicates that the user also needs SELECT permission.
    - Updated the Hive policy (adding select):
      - Resource:
        - Hive Database = vehicle
        - Hive Table = cars
        - Hive Column = *
      - User: hrt_qa = alter, select
    - Re-attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("alter table vehicle.cars rename to vehicle.cars_a").show()
    - Observed error: Unable to alter table. Permission denied: user [hrt_qa] does not have [ALTER] privilege on [vehicle/cars_a].
      This indicates that the new table name also needs ALTER permission.
    - Updated the Hive policy again (adding the new table name):
      - Resource:
        - Hive Database = vehicle
        - Hive Table = cars, cars_a
        - Hive Column = *
      - User: hrt_qa = alter, select
    - Re-attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("alter table vehicle.cars rename to vehicle.cars_a").show()
    - Deny log generated: The command succeeds and the table is renamed. However, the Ranger UI audit access logs show a deny log for the Azure directory.
      This indicates that an ADLS policy is needed so that the alter operation runs without the deny log.
    - Addressed the Azure directory deny log (with an ADLS policy):
      - Hive policy:
        - Resource:
          - Hive Database = vehicle
          - Hive Table = cars_a, cars_b
          - Hive Column = *
        - User: hrt_qa = alter, select
      - ADLS policy:
        - Storage account: sdxqeweekly
        - Storage account container: vb-18df-data
        - Relative path: /warehouse/tablespace/external/hive
        - User: hrt_qa = all
    - Final command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("alter table vehicle.cars_a rename to vehicle.cars_b").show()
  - Result: The command succeeds, the table is renamed, and no deny logs for the Azure directory are seen.
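The rename walkthrough shows that ALTER and SELECT are checked against both the old and the new table names, so the policy's Hive Table resource must list both. A hypothetical helper capturing that rule (not a Ranger API, just a summary of the observed behavior):

```python
# Hypothetical helper: table-level Hive grants needed for
# "ALTER TABLE old RENAME TO new", per the walkthrough above.
def grants_for_rename(old_table: str, new_table: str) -> dict:
    # Both table names must appear in the policy's Hive Table resource,
    # each with alter and select access.
    return {old_table: {"alter", "select"}, new_table: {"alter", "select"}}

grants = grants_for_rename("cars", "cars_a")
print(sorted(grants))  # ['cars', 'cars_a']
```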
- Use case 4: Dropping a table with a custom user
  - Objective: To allow the custom user hrt_qa to run a DROP TABLE query.
  - Workflow:
    - Initial setup (as the Hive user):
      - Create a database: create database sports;
      - Create an external table: create external table sports.cricket(id int, name string);
      - Insert data: insert into sports.cricket values (11, 'name1');
      - Verify the data: select * from sports.cricket;
    - Created the initial Hive policy:
      - Resource:
        - Hive Database = sports
        - Hive Table = cricket
        - Hive Column = *
      - User: hrt_qa = drop
    - Attempted to drop the table (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("drop table sports.cricket").show()
    - Observed error: The command failed because of a missing SELECT permission:
      org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table cricket. Permission denied: user [hrt_qa] does not have [SELECT] privilege on [sports/cricket]
      This indicates that the user also needs SELECT permission.
    - Updated the Hive policy:
      - Resource:
        - Hive Database = sports
        - Hive Table = cricket
        - Hive Column = *
      - User: hrt_qa = drop, select
    - Re-attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("drop table sports.cricket").show()
  - Result: The command executes successfully and the table is dropped.
- Use case 5: Inserting values into a table with a custom user
  - Objective: To insert values into a table with the custom user hrt_qa.
  - Workflow:
    - Initial setup (as the Hive user):
      - Create a database: create database sports;
      - Create an external table: create external table sports.football(id int, name string);
      - Verify the data: select * from sports.football;
    - Created the initial Hive policy:
      - Resource:
        - Hive Database = sports
        - Hive Table = football
        - Hive Column = *
      - User: hrt_qa = update
    - Attempted to insert data (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("insert into sports.football values (11, 'name1')").show()
    - Observed error (after the first attempt): The command failed because of a missing SELECT permission:
      Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Permission denied: user [hrt_qa] does not have [SELECT] privilege on [sports/football]
      This indicates that the custom user also needs SELECT permission for insertion.
    - Updated the Hive policy:
      - Resource:
        - Hive Database = sports
        - Hive Table = football
        - Hive Column = *
      - User: hrt_qa = update, select
    - Re-attempted the command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("insert into sports.football values (11, 'name1')").show()
    - Observed error (after the second attempt): The command failed because of an Azure directory permission issue:
      org.apache.spark.SparkRuntimeException: Cannot create staging directory: 'Some(abfs://vb-18df-data@sdxqeweekly.dfs.core.windows.net/warehouse/tablespace/external/hive/sports.db/football/.hive-staging_hive_2024-12-18_10-12-47_030_6416593550262289528-1)': Failed to acquire a SAS token for create-directory on /warehouse/tablespace/external/hive/sports.db/football/.hive-staging_hive_2024-12-18_10-12-47_030_6416593550262289528-1 due to org.apache.hadoop.security.AccessControlException: Permission denied
      This indicates that an ADLS (Azure Data Lake Storage) policy is required for the custom user to manage the staging directory.
    - Added an ADLS policy:
      - Existing Hive policy:
        - Resource:
          - Hive Database = sports
          - Hive Table = football
          - Hive Column = *
        - User: hrt_qa = update, select
      - ADLS policy:
        - Storage account: sdxqeweekly
        - Storage account container: vb-18df-data
        - Relative path: /warehouse/tablespace/external/hive
        - User: hrt_qa = all
    - Re-attempted the command (as hrt_qa after kinit in the Spark shell, after adding the ADLS policy):
      sqlContext.sql("insert into sports.football values (11, 'name1')").show()
    - Observed error (after the third attempt): The command failed because of a missing ALTER permission, even though the data was inserted:
      org.apache.spark.sql.AnalysisException: Unable to alter table. Permission denied: user [hrt_qa] does not have [ALTER] privilege on [sports/football]
      This indicates that the custom user also requires ALTER permission.
    - Final updated Hive policy:
      - Hive policy:
        - Resource:
          - Hive Database = sports
          - Hive Table = football
          - Hive Column = *
        - User: hrt_qa = update, select, alter
      - ADLS policy:
        - Storage account: sdxqeweekly
        - Storage account container: vb-18df-data
        - Relative path: /warehouse/tablespace/external/hive
        - User: hrt_qa = all
    - Final re-attempted command (as hrt_qa after kinit in the Spark shell):
      sqlContext.sql("insert into sports.football values (11, 'name1')").show()
  - Result: The command executes successfully and two entries are observed in the table.
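Across the five use cases, the permissions that Ranger ended up requiring can be collected into one lookup. This is a summary of the observations in this walkthrough only, not an exhaustive Ranger permission reference:

```python
# Hive permissions observed per operation in the use cases above, paired
# with whether an ADLS policy on the warehouse path was also needed.
REQUIRED = {
    "select":       ({"select"}, True),                      # Use case 1
    "create table": ({"create"}, True),                      # Use case 2
    "rename table": ({"alter", "select"}, True),             # Use case 3
    "drop table":   ({"drop", "select"}, False),             # Use case 4
    "insert":       ({"update", "select", "alter"}, True),   # Use case 5
}

perms, needs_adls = REQUIRED["insert"]
print(sorted(perms), needs_adls)  # ['alter', 'select', 'update'] True
```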
