Data Access

Securing Apache Hive

Authorization determines whether a user has the required permissions to perform certain operations, such as creating, reading, and writing data, as well as editing table metadata. Apache Ranger provides centralized authorization for all HDP components, and Hive itself provides three additional authorization models. Administrators should consider the specific use case when choosing an authorization model.

There are two primary use cases for Hive:

  • Table storage layer

    Many HDP components and underlying technologies, such as Apache Hive, Apache HBase, Apache Pig, Apache MapReduce, and Apache Tez, rely on Hive as a table storage layer.

  • SQL query engine

    Hadoop administrators, business analysts, and data scientists use Hive to run SQL queries, both from the Hive CLI and remotely through a client connecting to Hive through HiveServer2. These users often configure a data analysis tool, such as Tableau, to connect to Hive through HiveServer2.

    When using a JDBC or ODBC driver, the value of the hive.server2.enable.doAs configuration property in the hive-site.xml file determines the user account that runs a Hive query. The value assigned to this property depends on the desired Hive authorization model and, in the case of storage-based authorization, on the desired use case.
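As a rough illustration of how the property can be inspected, the following Python sketch parses a hive-site.xml document and reports whether queries would run as the connecting end user (doAs true) or as the HiveServer2 service account (doAs false). The inline XML content and the helper names `get_hive_property` and `runs_as_end_user` are hypothetical, and the fallback of "true" assumes the stock Apache Hive default; a cluster managed by Ambari may set a different value.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml content, inlined for illustration only.
HIVE_SITE_XML = """
<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>
"""


def get_hive_property(xml_text, name, default=None):
    """Return the value of a named property from hive-site.xml text."""
    root = ET.fromstring(xml_text.strip())
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return default


def runs_as_end_user(xml_text):
    """True if HiveServer2 would impersonate the connecting user.

    Assumption: when the property is absent, fall back to "true",
    which is believed to be the Apache Hive default.
    """
    value = get_hive_property(xml_text, "hive.server2.enable.doAs", "true")
    return value.lower() == "true"


if __name__ == "__main__":
    print("doAs enabled:", runs_as_end_user(HIVE_SITE_XML))
```

In practice the XML text would be read from the hive-site.xml file in the Hive configuration directory rather than from an inline string.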

    Hive LLAP and the doAs Flag

    The architecture of Hive LLAP shares and caches data across many users in much the same way as other MPP or database technologies do. As a result, older file-based security controls do not work with this architecture and doAs is not supported with Hive LLAP. You must use Apache Ranger security policies with a doAs=false setting to achieve secure access via Hive LLAP, while restricting underlying file access so that Hive can access it but unprivileged users cannot.
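The constraint described above can be expressed as a small configuration check. This is a minimal sketch, not an official validation tool: the function name `check_llap_doas` is hypothetical, and it assumes LLAP is indicated by the property hive.execution.mode being set to llap, with Apache Hive defaults of "container" and "true" used when a property is missing.

```python
def check_llap_doas(props):
    """Return a list of problems found in a dict of Hive properties.

    Assumptions (not taken from the source text): LLAP is detected via
    hive.execution.mode=llap, and missing properties fall back to the
    Apache Hive defaults "container" and "true".
    """
    problems = []
    mode = props.get("hive.execution.mode", "container").strip().lower()
    do_as = props.get("hive.server2.enable.doAs", "true").strip().lower()
    if mode == "llap" and do_as == "true":
        problems.append(
            "doAs is not supported with Hive LLAP: set "
            "hive.server2.enable.doAs=false and control access "
            "with Apache Ranger policies"
        )
    return problems


if __name__ == "__main__":
    example = {
        "hive.execution.mode": "llap",
        "hive.server2.enable.doAs": "true",
    }
    for problem in check_llap_doas(example):
        print("WARNING:", problem)
```

The check covers only the doAs flag; restricting the underlying HDFS files so that the hive user can read them but unprivileged users cannot still has to be done separately.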

    Apache Ranger and Other Authorization Models

    In addition to the centralized authorization provided by Apache Ranger, Hive can use three other authorization models:

    | Authorization model | Secure? | Fine-grained authorization (column, row level) | Privilege management using GRANT/REVOKE statements | Centralized management GUI |
    |---------------------|---------|------------------------------------------------|----------------------------------------------------|----------------------------|
    | Apache Ranger | Secure | Yes | Yes | Yes |
    | SQL standard-based | Secure | Yes, through privileges on table views | Yes | No |
    | Storage-based | Secure | No. Authorization at the level of databases, tables, and partitions | No. Table privilege based on HDFS permission | No |
    | Hive default | Not secure. No restriction on which users can run GRANT statements | Yes | Yes | No |
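To make the comparison easier to query, the table can be encoded as data. The sketch below is illustrative only: the `AuthModel` class, its field names, and the `candidates` helper are hypothetical, and each cell is reduced to a boolean, which loses nuances such as SQL standard-based authorization achieving fine-grained control through privileges on table views.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthModel:
    name: str
    secure: bool
    fine_grained: bool   # column- or row-level authorization
    grant_revoke: bool   # privileges managed with GRANT/REVOKE
    central_gui: bool    # centralized management GUI


# Rows transcribed from the comparison table, simplified to booleans.
MODELS = [
    AuthModel("Apache Ranger", True, True, True, True),
    # Fine-grained control here is via privileges on table views.
    AuthModel("SQL standard-based", True, True, True, False),
    AuthModel("Storage-based", True, False, False, False),
    AuthModel("Hive default", False, True, True, False),
]


def candidates(**required):
    """Return names of models whose fields match every keyword given."""
    return [
        m.name
        for m in MODELS
        if all(getattr(m, field) == want for field, want in required.items())
    ]


if __name__ == "__main__":
    print(candidates(secure=True, fine_grained=True))
    print(candidates(secure=True, central_gui=True))
```

A lookup like this is only a starting point for choosing a model; the use case (table storage layer or SQL query engine) and whether Hive LLAP is in use still determine the final choice.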

    Note

    Administrators can secure the Hive CLI with Kerberos and by setting permissions on the HDFS directories where tables reside. The exception to this is storage-based authorization, which does not require managing HDFS permissions and is the most secure authorization model for the Hive CLI.