What's New in Apache Hive

Learn about the new features of Hive in Cloudera Runtime 7.2.16.

Simplified Hive Warehouse Connector configuration

Setting up the HWC configurations that are required by Spark is now simplified. As a cluster administrator, you specify the required configurations in Cloudera Manager as a one-time activity and then enable HWC by setting the spark.cloudera.useHWC property to "true". The spark.cloudera.useHWC property can be specified either in the spark-defaults.conf file or by using the --conf option in spark-shell or spark-submit. For more information, see Setting up HWC configurations.
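For example, after the one-time setup in Cloudera Manager, enabling HWC comes down to a single property; a minimal sketch (the property name and options are from this section, the placement follows standard Spark convention):

    # In spark-defaults.conf, to enable HWC for all Spark sessions:
    spark.cloudera.useHWC=true

    # Or per session, using the --conf option:
    spark-shell --conf spark.cloudera.useHWC=true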

Hive Warehouse Connector Secure access mode

Hive Warehouse Connector (HWC) introduces secure access mode, which offers fine-grained access control (FGAC), including column masking and row filtering, to secure managed (ACID) or external Hive table data that you query from Spark. Secure access mode requires you to set up a staging location in your cloud storage service, such as S3 or ADLS, to temporarily store Hive files that users need to read from Spark. For more information, see Introduction to HWC Secure access mode.

Enable caching for Hive Warehouse Connector Secure access mode

You can enable caching for the HWC secure access mode to gain finer control over read queries and to ensure that content updated outside of a Spark session is considered during reads. For more information, see Enabling caching for secure access mode.

Hive ACID compaction observability

Compaction observability is a notification and information system based on metrics about the health of the compaction process. Using Cloudera Manager, you can:

  • View compaction health checks for the Hive Metastore and Hive on Tez services.
  • View actions and advice related to configurations and thresholds.
  • Use the Compaction tab of the Hive Metastore service to view compaction-related charts based on the collected metrics.

For more information, see Compaction Observability in Cloudera Manager.

Support for Hive Hybrid Procedural SQL

You can run Hive Hybrid Procedural SQL (HPL/SQL) queries from a client by connecting to Hive over JDBC. HPL/SQL is an Apache open source procedural extension for SQL for Hive users. For more information, see HPL/SQL stored procedures.
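As a minimal sketch of what HPL/SQL looks like (the procedure name and body are illustrative, not from the Cloudera documentation):

    -- Define and call a simple HPL/SQL stored procedure
    CREATE PROCEDURE greet(name STRING)
    BEGIN
      PRINT 'Hello, ' || name;
    END;

    CALL greet('Hive');

Statements like these are submitted over the same JDBC connection as regular Hive queries, with the client session running in HPL/SQL mode.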

Hive Warehouse Connector support for Spark 3

As part of this release, Hive Warehouse Connector (HWC) is certified to work with Spark 3. You can use the binaries available in the /opt/cloudera/parcels/CDH/lib/hwc_for_spark3/ directory to securely access Apache Hive managed tables from Spark. For more information, see Introduction to HWC.
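For example, a Spark 3 shell session might pick up the connector as follows (a sketch; the exact assembly JAR file name in that directory varies by release, so the name below is a placeholder):

    spark-shell \
      --jars /opt/cloudera/parcels/CDH/lib/hwc_for_spark3/hive-warehouse-connector-assembly-<version>.jar \
      --conf spark.cloudera.useHWC=true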

Using SYS table to monitor compactions, transactions, and locks

You can monitor the progress of, and filter for, specific compaction, transaction, and transaction lock jobs by querying the COMPACTIONS, TRANSACTIONS, and LOCKS views within the SYS database. For details, see Monitoring compactions, Monitoring transactions, and Monitoring transaction locks.
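For example (a sketch; the view names are from this section, but the column used in the WHERE clause is an assumption, so check the SYS schema in your environment):

    -- List all compactions, then narrow down with a WHERE clause
    SELECT * FROM SYS.COMPACTIONS;

    -- c_state is an assumed column name for the compaction state
    SELECT * FROM SYS.COMPACTIONS WHERE c_state = 'working';

    -- The TRANSACTIONS and LOCKS views are queried the same way
    SELECT * FROM SYS.TRANSACTIONS;
    SELECT * FROM SYS.LOCKS;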

Support for table defaults at the database level

You can use the database property defaultTableType=EXTERNAL or defaultTableType=ACID to specify the default type of table created by the CREATE TABLE statement. You can set this property when creating the database or at a later point using the ALTER DATABASE statement. For more information, see Understanding CREATE TABLE behavior.
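A minimal sketch (the database name is illustrative):

    -- CREATE TABLE statements in this database default to external tables
    CREATE DATABASE sales_db WITH DBPROPERTIES ('defaultTableType'='EXTERNAL');

    -- Switch the default to ACID (managed) tables later
    ALTER DATABASE sales_db SET DBPROPERTIES ('defaultTableType'='ACID');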

Support for external-only tables at the database level

You can configure a database to allow only external tables to be created and to prevent the creation of ACID tables. While creating the database, set the database property EXTERNAL_TABLES_ONLY=true to ensure that only external tables are created in it. For more information, see Understanding CREATE TABLE behavior.
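For example (illustrative database name):

    -- Only external tables can be created in this database
    CREATE DATABASE reporting_db WITH DBPROPERTIES ('EXTERNAL_TABLES_ONLY'='true');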

Partition filtering support for the MSCK REPAIR TABLE statement

The MSCK REPAIR TABLE statement is enhanced to support filtering of the partition columns using operators so that a larger subset of partitions can be recovered (added/removed) without triggering a full repair. For more information, see Partition refresh and configuration.
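The unfiltered forms of the statement look like the following (ADD, DROP, and SYNC PARTITIONS are the standard Hive forms; the operator-based partition filters described above extend this statement, and their exact syntax is covered in the linked documentation):

    -- Add metastore entries for new partitions found on the file system
    MSCK REPAIR TABLE sales ADD PARTITIONS;

    -- Drop metastore entries for partitions missing from the file system
    MSCK REPAIR TABLE sales DROP PARTITIONS;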

Mapping specific columns in the INSERT clause of the MERGE statement

The MERGE statement is enhanced to support mapping of specific columns in the INSERT clause of the query instead of passing values (including null) for columns in the target table that do not have any data to insert. The unspecified columns in the INSERT clause are either mapped to null or use default constraints, if any. For more information, see Merging data in Hive tables.
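A minimal sketch of the enhanced form (table and column names are illustrative):

    -- Only id and name are listed in the INSERT clause; the remaining
    -- target columns are set to NULL or to their default constraints.
    MERGE INTO customers AS t
    USING new_signups AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET name = s.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name);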

Hive date and time UDF enhancements

The following Hive date and time user-defined functions (UDFs) are enhanced to use the DateTimeFormatter class instead of the SimpleDateFormat class, which may affect how date and timestamp values are parsed.

  • unix_timestamp(): The unix_timestamp() function is enhanced to use the DateTimeFormatter class for dates in string format instead of the SimpleDateFormat class. For details, see HIVE-25458.
  • from_unixtime(): The from_unixtime() function is enhanced to consider leap seconds. For details, see HIVE-25403.
  • date_format(): The date_format() function, which previously returned the output in the UTC time zone, is enhanced to use the user session time zone by default. For details, see HIVE-25093.
  • cast(): The cast() function is enhanced to return NULL when an invalid date or timestamp is cast. Prior to this enhancement, casting an incorrect date returned a converted value. For example, cast('2020-20-20' as date) resulted in '2021-08-20' instead of NULL.

    This is because the DateTimeFormatter class that is used to parse a string into a date or timestamp was set to ResolverStyle.LENIENT. This has been updated to ResolverStyle.STRICT, so the function returns NULL when an invalid date or timestamp is cast. For details, see HIVE-25306. A few examples follow this list.
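As a quick illustration of the new behavior (the queries are illustrative; the date_format() output depends on your session time zone):

    -- The string is now parsed with DateTimeFormatter
    SELECT unix_timestamp('2023-01-15 10:00:00', 'yyyy-MM-dd HH:mm:ss');

    -- Output uses the user session time zone rather than UTC
    SELECT date_format(current_timestamp, 'yyyy-MM-dd HH:mm z');

    -- Returns NULL under ResolverStyle.STRICT because month 20 is invalid
    SELECT cast('2020-20-20' as date);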

Configuration option to URL encode special characters in hbase.columns.mapping

As part of this release, a new Hive configuration option is introduced to URL encode special characters, such as '#' or '%', that are used in hbase.columns.mapping values. The characters have to be encoded because the values are used to form the URI for Ranger-based authentication. For more information, see Using HBase Hive integration.
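For context, hbase.columns.mapping is the SerDe property that maps Hive columns to HBase column families, as in this standard sketch (table and column names are illustrative; the name of the new encoding option itself is given in the linked documentation):

    CREATE EXTERNAL TABLE hbase_orders (key INT, amount STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:amount')
    TBLPROPERTIES ('hbase.table.name' = 'orders');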

HMS support for external databases using DataConnectors (Technical Preview)

This release introduces the ability to map databases that reside in an external data source into a local Hive Metastore (HMS). The external data sources can be of different types, such as MySQL, Postgres, Redshift, or other HMS instances. Hive already supports mapping external tables through storage handlers, such as JDBCStorageHandler or HBaseStorageHandler; however, that mapping must be configured for each table, which can be cumbersome for a database with a large number of tables.

A DataConnector is an HMS object that contains the definition or configuration details (such as hostname and credentials) of a remote data source that are required to connect to it. Using DataConnectors, you can map an entire database instead of individual tables. The metadata for these tables is not persisted in Hive; it is mapped and built at runtime. For more information, see HIVE-24396.
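A sketch of how this can look in SQL, based on the syntax proposed in HIVE-24396 (the connector and database names, URL, and property keys are illustrative, and the exact syntax may differ in this release):

    -- Define a connector that describes how to reach a remote MySQL instance
    CREATE CONNECTOR mysql_conn
    TYPE 'mysql'
    URL 'jdbc:mysql://db-host:3306'
    WITH DCPROPERTIES ('hive.sql.dbcp.username'='hiveuser',
                       'hive.sql.dbcp.password'='***');

    -- Map an entire remote database through the connector
    CREATE REMOTE DATABASE remote_sales USING mysql_conn
    WITH DBPROPERTIES ('connector.remoteDbName'='sales');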

Technical Preview: This is a technical preview feature and is considered under development. Do not use it in your production systems. To share your feedback, contact Support by logging a case on our Cloudera Support Portal. Technical preview features are not guaranteed to receive troubleshooting guidance and fixes.