What's New in Apache Hive

Learn about the new features of Hive in Cloudera Runtime 7.3.1.

Trusting HTTP headers for authentication

When a request has already been authenticated by Knox, HiveServer2 can trust its HTTP headers and establish a session without re-authenticating the user. If a trusted header is present in the HTTP request, password-based authentication is skipped and the client user name is extracted directly from the Authorization header.

This simplifies authentication. Because the trusted header confirms that Knox has already authenticated the user, HiveServer2 does not need to authenticate the user a second time.
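To make the flow concrete, here is a minimal Python sketch of the check described above. It is not HiveServer2's implementation. The trusted header name `X-Knox-Trusted` is hypothetical, and the sketch assumes the Authorization header uses the HTTP Basic scheme, so the user name is the part of the decoded credentials before the first colon.

```python
import base64

# Hypothetical header name; these notes do not say which header Knox sets.
TRUSTED_HEADER = "X-Knox-Trusted"


def user_from_trusted_request(headers):
    """Return the client user name if the request carries the trusted header.

    Returns None when the trusted header is absent, meaning the server would
    fall back to normal password-based authentication.
    """
    if TRUSTED_HEADER not in headers:
        return None
    scheme, _, credentials = headers.get("Authorization", "").partition(" ")
    # Assumption: the Authorization header uses the Basic scheme.
    if scheme.lower() != "basic" or not credentials:
        raise ValueError("expected a Basic Authorization header")
    decoded = base64.b64decode(credentials).decode("utf-8")
    # Basic credentials are "user:password". Only the user part is used; the
    # password is not checked because Knox has already authenticated the user.
    user, _, _ = decoded.partition(":")
    return user
```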

Apache Jira: HIVE-25349

Multi-authentication support for SAML and LDAP in Hive

You can now connect to Hive using both SAML and LDAP authentication when the transport mode is set to HTTP. Because HiveServer2 accepts both mechanisms at the same time, clients can connect with either one without changing the server's authentication settings for each use case. The hive.server2.authentication configuration now accepts a comma-separated list of modes, such as SAML and LDAP.
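The following Python sketch shows how a comma-separated hive.server2.authentication value might be split into individual modes and sanity-checked against the transport mode. The function names and the validation rule (SAML only over HTTP) are assumptions for illustration, not HiveServer2 code.

```python
def parse_auth_modes(value):
    """Split a comma-separated hive.server2.authentication value into modes."""
    modes = [m.strip().upper() for m in value.split(",") if m.strip()]
    if not modes:
        raise ValueError("no authentication mode configured")
    return modes


def check_auth_config(value, transport_mode):
    """Hypothetical sanity check: SAML is only usable over HTTP transport."""
    modes = parse_auth_modes(value)
    if "SAML" in modes and transport_mode.lower() != "http":
        raise ValueError("SAML requires hive.server2.transport.mode=http")
    return modes


# Settings as they might appear in hive-site.xml:
#   hive.server2.authentication = SAML,LDAP
#   hive.server2.transport.mode = http
print(check_auth_config("SAML,LDAP", "http"))  # ['SAML', 'LDAP']
```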

Apache Jira: HIVE-25875

Improved query plans using constraint information

Hive now uses constraint information, such as NOT NULL constraints, when creating RexNodes during query planning. This lets Hive generate simpler, more efficient query plans, for example by avoiding unnecessary joins when NOT NULL constraints apply.
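The core idea can be sketched in a few lines of Python. A `col IS NOT NULL` filter on a column that already has a NOT NULL constraint is always true, so the planner can drop it. The tuple representation below is a toy stand-in for Calcite RexNode expressions, and `simplify_filters` is a hypothetical helper.

```python
def simplify_filters(predicates, not_null_columns):
    """Drop `col IS NOT NULL` predicates already guaranteed by constraints.

    Each predicate is a (column, operator) tuple, a toy stand-in for a
    planner expression tree.
    """
    kept = []
    for column, op in predicates:
        if op == "IS NOT NULL" and column in not_null_columns:
            continue  # always true for this column, so the filter is redundant
        kept.append((column, op))
    return kept


# `id` is declared NOT NULL, `name` is not.
print(simplify_filters([("id", "IS NOT NULL"), ("name", "IS NOT NULL")], {"id"}))
```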

Apache Jira: HIVE-26043

Print DAG ID to console

You can now view the DAG ID directly in the console when executing queries. This makes it easier to track and debug query executions by providing immediate visibility of the DAG ID.

Apache Jira: HIVE-25176

Increase default value of PartitionManagementTask frequency

The default value of metastore.partition.management.task.frequency, which sets the interval between runs of the partition management task, has been increased from five minutes to six hours. This improves performance in production environments with many databases and tables by giving the task enough time to scan all tables and partitions.
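A quick calculation shows the scale of the change: the sketch below counts task runs per day under the old and new default intervals.

```python
def runs_per_day(interval_minutes):
    """Number of task runs in 24 hours for a given interval in minutes."""
    return (24 * 60) // interval_minutes


old_runs = runs_per_day(5)        # old default: 288 runs per day
new_runs = runs_per_day(6 * 60)   # new default: 4 runs per day
# Each run now has 72 times longer (360 / 5 minutes) before the next one starts.
time_ratio = (6 * 60) // 5
```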

Apache Jira: HIVE-27011

Support for both LDAP and Kerberos authentication in HiveServer2

HiveServer2 now supports LDAP and Kerberos authentication simultaneously. The hive.server2.authentication configuration accepts a comma-separated value containing both Kerberos and LDAP, including when the transport mode is binary.

Apache Jira: HIVE-27352

Thrift-over-HTTP support for Hive Metastore client

Hive Metastore client can now connect through Thrift-over-HTTP, enabling access through Knox.

Apache Jira: HIVE-21456

Data connector authorization on the Hive Metastore server side

You can now authorize Data Definition Language (DDL) operations for connectors on the Hive Metastore server side. This enhancement improves security by ensuring only authorized users can perform these operations.

Apache Jira: HIVE-26248

Setting the user for compaction tasks

This update introduces a new configuration that specifies the user that runs compaction tasks, instead of defaulting to the owner of the table directory. The configured user also performs compaction-related operations, including file listing in the Initiator and Cleaner.

This is useful when compaction must run as a specific user, and it gives administrators more control over permissions and task management. The configuration is optional; when it is not set, compaction continues to run as the table directory owner.
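A minimal Python sketch of the user resolution described above, assuming the configured user takes precedence and the directory owner is the fallback. It reads the owner of a local directory with `os.stat` as a stand-in for the HDFS file owner; `resolve_compaction_user` is a hypothetical helper, not Hive's code, and the `pwd` module is Unix-only.

```python
import os
import pwd


def resolve_compaction_user(table_dir, configured_user=None):
    """Return the user that compaction tasks should run as.

    An explicitly configured user wins; otherwise fall back to the owner of
    the table directory (local filesystem used here instead of HDFS).
    """
    if configured_user:
        return configured_user
    owner_uid = os.stat(table_dir).st_uid
    return pwd.getpwuid(owner_uid).pw_name
```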

Apache Jira: HIVE-24191

Support for HDFS snapshots

You can now use HDFS snapshots to speed up external table replication. With DistCp snapshot diff, replication copies only the entries that changed between snapshots. This removes the need to list all files and directories, significantly reducing the effort and time required to copy data.
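To show why only modified entries need copying, the Python sketch below classifies paths as created, modified, or deleted between two snapshots. It is a toy: it compares two in-memory listings mapping path to (size, modification time), whereas HDFS computes the difference from snapshot metadata (as reported by `hdfs snapshotDiff`), which is what lets replication avoid a full listing.

```python
def snapshot_diff(old, new):
    """Compare two snapshots, each a dict of path -> (size, mtime).

    Returns the paths that were created, modified, or deleted, which is the
    only set of entries an incremental copy needs to touch.
    """
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return {"created": created, "modified": modified, "deleted": deleted}


s1 = {"/ext/t/a.orc": (100, 1), "/ext/t/b.orc": (200, 1)}
s2 = {"/ext/t/a.orc": (100, 1), "/ext/t/b.orc": (250, 2), "/ext/t/c.orc": (50, 2)}
print(snapshot_diff(s1, s2))
```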

Apache Jira: HIVE-24852

Ability to create tables on individual files directly

You can now create tables directly on individual files within a directory in Hive. This lets you define tables for specific files without changing the existing directory structure, which is useful when multiple teams share a common directory.

Apache Jira: HIVE-25569

New API for retrieving all table constraints

You can now use the getAllTableConstraints API to retrieve all constraints of a table, such as primary keys and foreign keys, in a single call. This consolidates what previously required multiple metastore calls into one request, improving efficiency. HiveServer2 also caches the results locally to avoid duplicate calls to the Hive Metastore.
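The local caching can be sketched as a per-table cache in front of a single call that returns all constraints. In this Python sketch, `fetch_all` is a hypothetical callable standing in for getAllTableConstraints; the real HiveServer2 cache and its invalidation rules may differ.

```python
class ConstraintCache:
    """Per-table cache in front of one call that returns all constraints."""

    def __init__(self, fetch_all):
        # fetch_all(db, table) is a hypothetical stand-in for the metastore
        # call; it returns every constraint type in one response.
        self._fetch_all = fetch_all
        self._cache = {}
        self.calls = 0  # number of metastore round trips actually made

    def get(self, db, table):
        # Table identifiers are treated as case-insensitive.
        key = (db.lower(), table.lower())
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._fetch_all(db, table)
        return self._cache[key]

    def invalidate(self, db, table):
        # Drop the cached entry, for example after a constraint is altered.
        self._cache.pop((db.lower(), table.lower()), None)
```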

Apache Jira: HIVE-22782

Beeline standalone execution with Java

You can now run Beeline as a standalone tool using Java, without relying on HADOOP_HOME. A new distributable tarball bundles all required dependencies, so Beeline can run with only a JRE and the included JARs. This simplifies running Beeline on edge nodes that do not have a full Hive or Hadoop installation.

Apache Jira: HIVE-24348

JWT authentication support in HTTP mode

You can now use JSON Web Tokens (JWT) for authentication in HiveServer2 when it runs in HTTP mode. HiveServer2 retrieves the JSON Web Key Set (JWKS) and uses it to verify the JWT in the Authorization header. The JDBC client can read the JWT from either an environment variable or the JDBC URL and sends it in the Authorization header.
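On the client side, the flow can be sketched in Python as follows. The `jwt=` URL parameter name, the `JWT` environment variable name, and the rule that the URL takes precedence are assumptions for illustration; check the Hive JDBC documentation for the exact names.

```python
import os


def jwt_from_url_or_env(jdbc_url, env_var="JWT"):
    """Return the JWT to send, from the JDBC URL or an environment variable.

    Assumptions: the token is a `jwt=` session parameter in the
    semicolon-separated URL, and the URL takes precedence over the env var.
    """
    for part in jdbc_url.split(";")[1:]:
        key, sep, value = part.partition("=")
        if sep and key.strip().lower() == "jwt" and value:
            return value
    token = os.environ.get(env_var)
    if not token:
        raise ValueError("no JWT in the JDBC URL or environment")
    return token


def authorization_header(token):
    """Build the header that carries the JWT to HiveServer2."""
    return {"Authorization": "Bearer " + token}
```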

Apache Jira: HIVE-25575

Vectorization support for lead and lag functions

The lead and lag window functions now support vectorized execution, which extends vectorization coverage and improves performance for queries that use them.
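Vectorized execution processes a whole column of values at a time instead of one row at a time. A simplified Python sketch of LAG and LEAD in that style, assuming the column holds a single window partition that is already sorted, looks like this:

```python
def lag(column, offset=1, default=None):
    """Column-at-a-time LAG: shift the whole column down by `offset` rows."""
    n = len(column)
    k = min(offset, n)
    return [default] * k + column[: n - k]


def lead(column, offset=1, default=None):
    """Column-at-a-time LEAD: shift the whole column up by `offset` rows."""
    n = len(column)
    k = min(offset, n)
    return column[k:] + [default] * k


prices = [10, 12, 15, 11]
print(lag(prices))       # [None, 10, 12, 15]
print(lead(prices, 2))   # [15, 11, None, None]
```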

Apache Jira: HIVE-24945

Dynamic connection pool for TxnHandler#connPoolMutex

TxnHandler#connPoolMutex now uses a dynamic connection pool instead of a fixed-size pool. The pool opens and closes connections on demand, which uses resources more efficiently on non-leader instances and makes the Hive Metastore more scalable.
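The grow-on-demand, shrink-when-idle behavior can be sketched in Python as below. `DynamicPool` and its parameters are hypothetical; a production pool would typically block or time out when exhausted instead of raising immediately.

```python
import threading


class DynamicPool:
    """Connection pool that opens connections on demand and closes idle extras.

    `connect` is a hypothetical factory returning an object with close().
    """

    def __init__(self, connect, max_size, min_idle=0):
        self._connect = connect
        self._max_size = max_size
        self._min_idle = min_idle
        self._idle = []
        self._in_use = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._idle:
                conn = self._idle.pop()      # reuse an open connection
            elif self._in_use < self._max_size:
                conn = self._connect()       # grow only when there is demand
            else:
                raise RuntimeError("pool exhausted")
            self._in_use += 1
            return conn

    def release(self, conn):
        with self._lock:
            self._in_use -= 1
            if len(self._idle) >= self._min_idle:
                conn.close()                 # shrink: enough idle connections
            else:
                self._idle.append(conn)

    def size(self):
        """Total open connections, in use plus idle."""
        with self._lock:
            return self._in_use + len(self._idle)
```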

Apache Jira: HIVE-26794

Upgrade ORC to version 1.8.3

Hive now uses ORC version 1.8.3, which offers improved memory usage and performance.

Apache Jira: HIVE-26809

Support for generic LDAP search bind filters in Hive

Hive's LDAP authentication now supports generic LDAP search bind filters, which makes it easier to configure. The following configurations have been added:
  • hive.server2.authentication.ldap.userSearchFilter
  • hive.server2.authentication.ldap.groupSearchFilter
  • hive.server2.authentication.ldap.groupBaseDN
These configurations work alongside the existing hive.server2.authentication.ldap.baseDN configuration. The new options are optional, so existing setups continue to work unchanged, ensuring backward compatibility.
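When a user name is substituted into a search filter, special characters must be escaped as defined in RFC 4515. The Python sketch below assumes the filter template marks the user name with a `{0}` placeholder, for example `(&(uid={0})(objectClass=person))`; treat that placeholder convention as an assumption and confirm it against the Hive LDAP documentation.

```python
def escape_ldap_filter_value(value):
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_filter(template, username):
    """Substitute the escaped user name into a userSearchFilter template.

    Assumes the template uses {0} as the user name placeholder.
    """
    return template.replace("{0}", escape_ldap_filter_value(username))


print(build_user_filter("(&(uid={0})(objectClass=person))", "alice"))
```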

Apache Jira: HIVE-27311