Overview of Apache Hive Security in CDH

Securing Hive involves configuring or enabling:

  • Authentication for the Hive metastore, HiveServer2, and all Hive clients, using the LDAP and Kerberos deployment for your cluster.

    See Hive Authentication, HiveServer2 Security Configuration, and Using Hive to Run Queries on a Secure HBase Server for details.

  • Authorization for HiveServer2 using role-based, fine-grained authorization that is implemented with Apache Sentry policies. You must configure HiveServer2 authentication before you configure authorization because Apache Sentry depends on an underlying authentication framework to reliably identify the requesting user.

    See Authorization With Apache Sentry, User to Group Mapping, and Authorization Privilege Model for Hive and Impala for details. Configure Sentry permissions with GRANT and REVOKE statements issued through the HiveServer2 client, the Beeline CLI. See Hive SQL Syntax for Use with Sentry for details.

  • Encryption to secure the network connection between HiveServer2 and Hive clients.

    In CDH 5.5 and later, encryption between HiveServer2 and its clients is decoupled from Kerberos authentication. (Before CDH 5.5, SASL QOP encryption for JDBC client drivers required connections authenticated by Kerberos.) Decoupling authentication from transport-layer encryption means that HiveServer2 can support two different approaches to encryption between the service and its clients (Beeline, JDBC/ODBC), regardless of whether Kerberos is used for authentication: TLS/SSL and SASL QOP.

    Unlike TLS/SSL, SASL QOP encryption does not require certificates and is aimed at protecting core Hadoop RPC communications. However, SASL QOP may have performance issues when handling large amounts of data, so depending on your usage patterns, TLS/SSL may be a better choice.

    See Configuring Encrypted Communication Between HiveServer2 and Client Drivers for details about configuring HiveServer2 services and clients for TLS/SSL and SASL QOP encryption.
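
As a rough illustration of the two encryption approaches described above, the following hive-site.xml fragment sketches how each might be enabled on the HiveServer2 side. The property names are the standard HiveServer2 settings; the keystore path and password are placeholders, not values from this document. Treat this as a sketch to check against the documentation for your CDH release, not as a complete configuration.

```xml
<configuration>
  <!-- Option 1: TLS/SSL. Requires a keystore that holds the server certificate. -->
  <property>
    <name>hive.server2.use.SSL</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.keystore.path</name>
    <!-- Placeholder path; replace with the location of your keystore. -->
    <value>/path/to/keystore.jks</value>
  </property>
  <property>
    <name>hive.server2.keystore.password</name>
    <!-- Placeholder password. -->
    <value>keystore_password</value>
  </property>

  <!-- Option 2: SASL QOP. No certificates needed; auth-conf adds confidentiality (encryption). -->
  <!-- Typically only one of the two options is configured. -->
  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth-conf</value>
  </property>
</configuration>
```

Clients must be configured to match the option chosen on the server; the topic referenced above covers the client-side settings.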

Securing the default database

Hive includes a database named default. Everyone can access this database if you set sentry.hive.restrict.defaultDB=false in sentry-site.xml. If this property is set to true, you cannot use the default database or perform basic operations, such as listing database names.
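
For illustration, the property described above could appear in sentry-site.xml roughly as follows. The value true is only an example of the restrictive setting; choose the value that matches your access requirements.

```xml
<property>
  <name>sentry.hive.restrict.defaultDB</name>
  <!-- true restricts the default database; false lets everyone access it. -->
  <value>true</value>
</property>
```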

Accessing the information_schema

To query the information_schema, sentry.hive.restrict.defaultDB must be set to false in sentry-site.xml.