Chapter 3. Hive Authorization
Authorization determines whether a user has the required permissions to perform specific operations, such as creating, reading, and writing data, as well as editing table metadata. Apache Ranger provides centralized authorization for all HDP components, and Hive also provides three authorization models of its own. Administrators should consider the specific use case when choosing an authorization model.
There are two primary use cases for Hive:
Table storage layer
Many HDP components and underlying technologies, such as Apache HBase, Apache Pig, Apache MapReduce, and Apache Tez, rely on Hive as a table storage layer.
SQL query engine
Hadoop administrators, business analysts, and data scientists use Hive to run SQL queries, both locally from the Hive CLI and remotely from clients that connect through HiveServer2. These users often configure a data analysis tool, such as Tableau, to connect to Hive through HiveServer2.
When clients connect through a JDBC or ODBC driver, the value of the hive.server2.enable.doAs configuration property in hive-site.xml determines the user account that runs a Hive query. The value assigned to this property depends on the desired Hive authorization model and, in the case of storage-based authorization, on the desired use case.
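The effect of hive.server2.enable.doAs can be illustrated with a small Python sketch. It assumes a hive-site.xml fragment in the usual Hadoop layout (a <configuration> element containing <property> elements with <name> and <value>) and a hypothetical service account named hive. When the property is true, HiveServer2 runs the query as the connecting user; when it is false, the query runs as the account that runs HiveServer2. The helpers get_property and effective_query_user are illustrative names, not part of any Hive API.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical hive-site.xml fragment containing only the doAs property.
HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>"""


def get_property(xml_text: str, name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the <value> of the named <property>, or default if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return default


def effective_query_user(xml_text: str, connecting_user: str, service_user: str = "hive") -> str:
    """Return the account that would run a query submitted through HiveServer2.

    service_user is an assumption: the account that runs HiveServer2.
    """
    # Hive's built-in default for hive.server2.enable.doAs is true.
    do_as = get_property(xml_text, "hive.server2.enable.doAs", "true") or "true"
    if do_as.lower() == "true":
        return connecting_user  # HiveServer2 impersonates the client user
    return service_user  # query runs as the HiveServer2 service account


print(effective_query_user(HIVE_SITE, "alice"))  # prints: hive
```

Flipping the value to true in the fragment makes the same call return alice, because HiveServer2 then impersonates the client.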
In addition to the centralized authorization provided by Apache Ranger, Hive provides three authorization models:
Authorization model | Secure? | Fine-grained authorization (column, row level) | Privilege management using GRANT/REVOKE statements | Centralized management GUI
Apache Ranger | Secure | Yes | Yes | Yes
SQL standard-based | Secure | Yes, through privileges on table views | Yes | No
Storage-based | Secure | No. Authorization at the level of databases, tables, and partitions | No. Table privilege based on HDFS permission | No
Hive default | Not secure. No restriction on which users can run GRANT statements | Yes | Yes | No
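One way to apply the comparison table when choosing a model is to encode each row as data and filter on the capabilities a deployment needs. The Python sketch below does exactly that and nothing more; the AuthModel type and the candidate_models helper are hypothetical. The booleans are transcribed from the table, with SQL standard-based counted as fine-grained because the table credits it with fine-grained control through privileges on views.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AuthModel:
    name: str
    secure: bool
    fine_grained: bool  # column- or row-level control
    grant_revoke: bool  # privileges managed with GRANT/REVOKE
    central_gui: bool   # centralized management GUI


# Transcribed from the comparison table; SQL standard-based counts as
# fine-grained because the table credits it via privileges on views.
MODELS = (
    AuthModel("Apache Ranger", True, True, True, True),
    AuthModel("SQL standard-based", True, True, True, False),
    AuthModel("Storage-based", True, False, False, False),
    AuthModel("Hive default", False, True, True, False),
)


def candidate_models(require_secure: bool = True, fine_grained: bool = False,
                     grant_revoke: bool = False, central_gui: bool = False) -> List[str]:
    """Return the names of models that offer every requested capability."""
    return [
        m.name
        for m in MODELS
        if (m.secure or not require_secure)
        and (m.fine_grained or not fine_grained)
        and (m.grant_revoke or not grant_revoke)
        and (m.central_gui or not central_gui)
    ]


print(candidate_models(fine_grained=True, grant_revoke=True))
# prints: ['Apache Ranger', 'SQL standard-based']
```

The sketch covers only the table's four columns; the use case and the hive.server2.enable.doAs setting still have to be weighed separately.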
Note: Administrators can secure the Hive CLI with Kerberos and by setting permissions on the HDFS directories where tables reside. The exception to this is storage-based authorization, which does not require managing HDFS permissions and is the most secure authorization model for the Hive CLI.