Chapter 3. Hive Authorization

Authorization determines whether a user has the permissions required to perform particular operations, such as creating, reading, and writing data, or editing table metadata. Apache Ranger provides centralized authorization for all HDP components, and Hive itself also provides three authorization models. Administrators should consider their specific use case when choosing an authorization model.

There are two primary use cases for Hive:

  • Table storage layer

    Many HDP components and underlying technologies, such as Apache Hive, Apache HBase, Apache Pig, Apache MapReduce, and Apache Tez, rely on Hive as a table storage layer.

  • SQL query engine

    Hadoop administrators, business analysts, and data scientists use Hive to run SQL queries, both from the Hive CLI and remotely from a client that connects through HiveServer2. These users often configure a data analysis tool, such as Tableau, to connect to Hive through HiveServer2.

    When a client connects through a JDBC or ODBC driver, the value of the hive.server2.enable.doAs configuration property in hive-site.xml determines the user account that runs a Hive query. The value to assign depends on the desired Hive authorization model and, for storage-based authorization, on the desired use case.
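
    As a minimal sketch, the property might appear in hive-site.xml as shown here. The value false is an assumption chosen for illustration (it is the setting commonly paired with SQL standard-based authorization); the appropriate value depends on the authorization model and use case at your site.

    ```xml
    <!-- hive-site.xml (fragment). The value below is illustrative only. -->
    <property>
      <name>hive.server2.enable.doAs</name>
      <!-- true:  HiveServer2 runs the query as the connected end user.
           false: the query runs as the user that owns the HiveServer2 process. -->
      <value>false</value>
    </property>
    ```

    With the value set to true, HDFS permissions are evaluated against the end user's account. With false, they are evaluated against the HiveServer2 service account.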

    In addition to the centralized authorization provided by Apache Ranger, Hive provides three authorization models:

    | Authorization model | Secure? | Fine-grained authorization (column, row level) | Privilege management using GRANT/REVOKE statements | Centralized management GUI |
    |---|---|---|---|---|
    | Apache Ranger | Secure | Yes | Yes | Yes |
    | SQL standard-based | Secure | Yes, through privileges on table views | Yes | No |
    | Storage-based | Secure | No. Authorization at the level of databases, tables and partitions | No. Table privilege based on HDFS permission | No |
    | Hive default | Not secure. No restriction on which users can run GRANT statements | Yes | Yes | No |

    Note

    Administrators can secure the Hive CLI with Kerberos and by setting permissions on the HDFS directories where tables reside. The exception to this is storage-based authorization, which does not require managing HDFS permissions and is the most secure authorization model for the Hive CLI.