What's New in Apache Hive
Learn about the new features of Hive in Cloudera Runtime 7.3.1.
Trusting HTTP headers for authentication
When HTTP headers are authenticated via Knox, they can be trusted to establish a session without re-authenticating at HiveServer2. If a trusted header is present in the HTTP request, password-based authentication is skipped, and the client name is extracted directly from the Authorization header.
This change simplifies authentication by eliminating repeated credential checks, because the trusted header confirms that Knox has already authenticated the user.
Apache Jira: HIVE-25349
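The decision described above can be sketched as follows. This is a minimal illustration, not the HiveServer2 implementation: the header name X-Trusted-Proxy-Auth and both function names are hypothetical, and the sketch assumes the Authorization header uses the Basic scheme.

```python
import base64


def client_name_from_basic_auth(authorization: str) -> str:
    """Extract the user name from a Basic Authorization header value."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        raise ValueError("expected a Basic Authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    user, _, _password = decoded.partition(":")
    return user


def resolve_client(headers: dict, trusted_header: str = "X-Trusted-Proxy-Auth") -> tuple:
    """Return (client_name, password_check_needed) for one HTTP request."""
    user = client_name_from_basic_auth(headers["Authorization"])
    # Hypothetical rule: if the trusted header is present, the proxy
    # (Knox in the text above) has already authenticated this user.
    already_authenticated = trusted_header in headers
    return user, not already_authenticated
```

With the trusted header present, the sketch returns the client name and reports that no password check is needed. Without it, the password check is still required.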
Multi-authentication support for SAML and LDAP in Hive
You can now connect to Hive using both SAML and LDAP authentication modes simultaneously when the transport mode is set to HTTP. Because multiple authentication mechanisms can be active concurrently, you no longer need to change the authentication settings for different use cases. The hive.server2.authentication property now accepts a comma-separated list that includes both SAML and LDAP.
Apache Jira: HIVE-25875
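As an illustration of the comma-separated setting, the following sketch checks whether a configuration enables SAML and LDAP together. The helper names are hypothetical, and the hive.server2.transport.mode property name and its binary default are assumptions that the text above does not state.

```python
def parse_auth_modes(value: str) -> list:
    """Split a comma-separated hive.server2.authentication value."""
    return [mode.strip().upper() for mode in value.split(",") if mode.strip()]


def saml_and_ldap_enabled(conf: dict) -> bool:
    """True when SAML and LDAP are both listed and the transport mode is HTTP."""
    modes = parse_auth_modes(conf.get("hive.server2.authentication", ""))
    # Assumed property name and default for the transport mode.
    transport = conf.get("hive.server2.transport.mode", "binary").lower()
    return {"SAML", "LDAP"} <= set(modes) and transport == "http"
```

For example, a configuration with hive.server2.authentication set to SAML,LDAP and the transport mode set to http passes the check. The same authentication value with binary transport does not.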
Improved query plans using constraint information
Hive now uses constraint information, such as not null, when creating RexNodes, leading to more optimized query plans. This update enables Hive to generate simpler, more efficient query plans by avoiding unnecessary joins when not null constraints are applied.
Apache Jira: HIVE-26043
Print DAG ID to console
You can now view the DAG ID directly in the console when executing queries. This makes it easier to track and debug query executions by providing immediate visibility of the DAG ID.
Apache Jira: HIVE-25176
Increase default value of PartitionManagementTask frequency
The default value of metastore.partition.management.task.frequency has been increased from five minutes to six hours, so the task now runs less often. This change improves performance in production environments with many databases and tables by allowing enough time for the task to scan all tables and partitions.
Apache Jira: HIVE-27011
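To put the two defaults in perspective, the following arithmetic sketch compares how often the task would start per day under each value. It assumes the task runs once per interval with no overlap, and the variable names are illustrative only.

```python
from datetime import timedelta

old_default = timedelta(minutes=5)
new_default = timedelta(hours=6)

# Number of task runs that fit into one day at each interval.
runs_per_day_old = timedelta(days=1) // old_default
runs_per_day_new = timedelta(days=1) // new_default

# How many times longer the new interval is than the old one.
interval_ratio = new_default // old_default
```

Under these assumptions, the task starts 288 times per day with the old default and 4 times per day with the new one, a 72-fold increase in the interval between runs.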
Support for both LDAP and Kerberos authentication in HiveServer2
HiveServer2 now supports LDAP and Kerberos authentication simultaneously. The hive.server2.authentication property accepts comma-separated values for both Kerberos and LDAP, even in binary mode.
Apache Jira: HIVE-27352
Thrift-over-HTTP support for Hive Metastore client
Hive Metastore client can now connect through Thrift-over-HTTP, enabling access through Knox.
Apache Jira: HIVE-21456
Data connector authorization on the Hive Metastore server side
You can now authorize Data Definition Language (DDL) operations for connectors on the Hive Metastore server side. This enhancement improves security by ensuring only authorized users can perform these operations.
Apache Jira: HIVE-26248
Setting the user for compaction tasks
This update introduces a configuration property that lets you specify the user that runs compaction tasks, instead of defaulting to the table directory owner. The specified user is used for compaction operations, including file listing in the Initiator and Cleaner.
This is useful when compaction must run as a specific user, and it gives administrators more control over permissions and task management. The setting is optional.
Apache Jira: HIVE-24191
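The fallback behavior described above can be sketched as follows. The function and parameter names are hypothetical and do not correspond to Hive identifiers. The sketch assumes that an empty or unset value means "use the table directory owner".

```python
from typing import Optional


def compaction_user(configured_user: Optional[str], table_dir_owner: str) -> str:
    """Pick the user that runs compaction for a table."""
    # Assumption: an unset or blank setting falls back to the directory owner.
    if configured_user and configured_user.strip():
        return configured_user.strip()
    return table_dir_owner
```

When a user is configured, the sketch returns that user for every table. Otherwise, each table is compacted as the owner of its directory.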
Support for HDFS snapshots
You can now use HDFS snapshots to enhance external table replication. With the addition of DistCp diff using snapshots, replication copies only the entries that changed between two snapshots. This eliminates the need to list all files and directories, significantly reducing the effort and time required for data copying.
Apache Jira: HIVE-24852
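The idea of copying only modified entries can be modeled with a small sketch. It is a conceptual illustration, not how DistCp or HDFS computes a diff: the snapshots are plain dictionaries that stand in for a snapshot diff report, and the function names and version markers are hypothetical.

```python
def snapshot_diff(old: dict, new: dict) -> dict:
    """Compare two snapshots given as {path: content_version} mappings."""
    return {
        "created": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
    }


def paths_to_copy(old: dict, new: dict) -> list:
    """Only created or modified paths need to be copied to the target."""
    diff = snapshot_diff(old, new)
    return sorted(diff["created"] + diff["modified"])
```

In this model, unchanged paths never appear in the copy list, which is the saving that snapshot-based replication aims for.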
Ability to create tables on individual files directly
You can now create tables directly on individual files within a directory in Hive. This feature allows you to define tables for specific files without changing the existing directory structure, enabling seamless data management for multiple teams using a common directory.
Apache Jira: HIVE-25569
New API for retrieving all table constraints
You can now use the getAllTableConstraints API to retrieve all table constraints, such as Primary Key and Foreign Key, in a single call. This consolidates what previously required several separate metastore requests into one. Local caching is also added to HiveServer2 to avoid duplicate calls to Hive Metastore.
Apache Jira: HIVE-22782
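The combination of one consolidated call and a local cache can be sketched as follows. Both classes are hypothetical stand-ins, not the Hive Metastore client or the HiveServer2 cache, and the returned constraint values are made-up sample data.

```python
class FakeMetastore:
    """Stand-in for a metastore client that counts round trips."""

    def __init__(self):
        self.calls = 0

    def get_all_table_constraints(self, db: str, table: str) -> dict:
        # One call returns every constraint type for the table.
        self.calls += 1
        return {"primary_keys": ["id"], "foreign_keys": [], "not_null": ["id"]}


class CachingConstraintReader:
    """Caches constraints per (database, table) to avoid duplicate calls."""

    def __init__(self, metastore: FakeMetastore):
        self._metastore = metastore
        self._cache = {}

    def constraints(self, db: str, table: str) -> dict:
        key = (db, table)
        if key not in self._cache:
            self._cache[key] = self._metastore.get_all_table_constraints(db, table)
        return self._cache[key]
```

Reading the constraints of the same table twice results in a single metastore round trip in this sketch. A real cache would also need an invalidation strategy, which is omitted here.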
Beeline standalone execution with Java
You can now run Beeline as a standalone tool using Java without relying on HADOOP_HOME. A new distributable tarball isolates all necessary dependencies, allowing Beeline to run with just JRE and the required jars. This simplifies execution on edge nodes without needing a full Hive or Hadoop setup.
Apache Jira: HIVE-24348
JWT authentication support in HTTP mode
You can now use JWT for authentication in HiveServer2 when running in HTTP mode. HiveServer2 retrieves the JWKS and verifies the JWT in the Authorization header. The JDBC client accepts a JWT from either an environment variable or the JDBC URL and sends it in the Authorization header.
Apache Jira: HIVE-25575
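The following sketch shows the shape of the data involved: a JWT placed in an Authorization header, and the key ID read from the token header so that a matching key could be looked up in a JWKS. It assumes the Bearer scheme, uses a made-up token with a fake signature, and performs no signature verification. All function names are hypothetical.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def bearer_header(token: str) -> dict:
    """Build the HTTP header a client would send (Bearer scheme assumed)."""
    return {"Authorization": "Bearer " + token}


def key_id(token: str):
    """Read the 'kid' field from the JWT header segment, if present."""
    header_segment = token.split(".")[0]
    return json.loads(b64url_decode(header_segment)).get("kid")


# Demo token only: the third segment is not a real signature.
demo_token = ".".join([
    b64url(json.dumps({"alg": "RS256", "kid": "demo-key"}).encode("utf-8")),
    b64url(json.dumps({"sub": "alice"}).encode("utf-8")),
    b64url(b"not-a-real-signature"),
])
```

A server-side verifier would use the key ID to select a public key from the JWKS and then check the signature, a step this sketch leaves out.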
Vectorization support for lead and lag functions
You can now benefit from vectorized execution for lead and lag functions, improving performance through better vectorization coverage.
Apache Jira: HIVE-24945
Dynamic connection pool for TxnHandler#connPoolMutex
You can now benefit from a dynamic connection pool for TxnHandler#connPoolMutex, replacing the fixed-size pool. This change allows the pool to scale by adding or closing connections on demand, improving resource efficiency for non-leader instances in the warehouse and making the Hive Metastore more scalable.
Apache Jira: HIVE-26794
Upgrade ORC to version 1.8.3
Hive now supports ORC version 1.8.3, offering improved memory usage and performance.
Apache Jira: HIVE-26809
Support for generic LDAP search bind filters in Hive
Related configuration properties:
- hive.server2.authentication.ldap.userSearchFilter
- hive.server2.authentication.ldap.groupSearchFilter
- hive.server2.authentication.ldap.groupBaseDN
Apache Jira: HIVE-27311