Securing Apache Hive
Authorization determines whether a user has the required permissions to perform specific operations, such as creating, reading, and writing data, as well as editing table metadata. Apache Ranger provides centralized authorization for all HDP components, and Hive also provides three authorization models. Administrators should consider the specific use case when choosing an authorization model.
There are two primary use cases for Hive:
Table storage layer
Many HDP components and underlying technologies, such as Apache Hive, Apache HBase, Apache Pig, Apache MapReduce, and Apache Tez, rely on Hive as a table storage layer.
SQL query engine
Hadoop administrators, business analysts, and data scientists use Hive to run SQL queries, both from the Hive CLI and remotely through a client connecting to Hive through HiveServer2. These users often configure a data analysis tool, such as Tableau, to connect to Hive through HiveServer2.
When using a JDBC or ODBC driver, the value of the hive.server2.enable.doAs configuration property in the hive-site.xml file determines the user account that runs a Hive query. The value assigned to this property depends on the desired Hive authorization model and, in the case of storage-based authorization, on the desired use case.
Hive LLAP and the doAs Flag
The architecture of Hive LLAP shares and caches data across many users in much the same way as other MPP or database technologies do. As a result, older file-based security controls do not work with this architecture and doAs is not supported with Hive LLAP. You must use Apache Ranger security policies with a doAs=false setting to achieve secure access via Hive LLAP, while restricting underlying file access so that Hive and other privileged users can access it but unprivileged users cannot.
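The doAs=false requirement can be checked with a short script. The following is a minimal sketch, not an official tool: the embedded hive-site.xml fragment, the helper names read_doas and query_user, and the service account name "hive" are all assumptions made for illustration. It parses the standard Hadoop property layout and reports which account would run a query.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml fragment; a real file holds many more properties.
HIVE_SITE_XML = """
<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>
"""

DOAS_PROPERTY = "hive.server2.enable.doAs"


def read_doas(xml_text):
    """Return the doAs setting as a bool; raise KeyError if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == DOAS_PROPERTY:
            value = prop.findtext("value", default="").strip().lower()
            return value == "true"
    raise KeyError(DOAS_PROPERTY)


def query_user(doas, connected_user, service_user="hive"):
    """Account that runs the query: the connected end user when doAs is
    enabled, otherwise the HiveServer2 service account (assumed "hive")."""
    return connected_user if doas else service_user


doas = read_doas(HIVE_SITE_XML)
print(doas)                       # False, the setting Hive LLAP requires
print(query_user(doas, "alice"))  # hive
```

With doAs set to false, every query runs as the service account, which is why access must be controlled through Ranger policies rather than through each end user's file permissions.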
Apache Ranger and Other Authorization Models
In addition to the centralized authorization provided by Apache Ranger, Hive can use three other authorization models:
| Authorization model | Secure? | Fine-grained authorization (column, row level) | Privilege management using GRANT/REVOKE statements | Centralized management GUI |
| Apache Ranger | Secure | Yes | Yes | Yes |
| SQL standard-based | Secure | Yes, through privileges on table views | Yes | No |
| Storage-based | Secure | No. Authorization at the level of databases, tables, and partitions | No. Table privilege based on HDFS permission | No |
| Hive default | Not secure. No restriction on which users can run GRANT statements | Yes | Yes | No |
Note: Administrators can secure the Hive CLI with Kerberos and by setting permissions on the HDFS directories where tables reside. The exception to this is storage-based authorization, which does not require managing HDFS permissions and is the most secure authorization model for the Hive CLI.