What's New in Apache Hive
Learn about the new features of Hive in Cloudera Runtime 7.2.16.
Simplified Hive Warehouse Connector configuration
Setting up the HWC configurations that Spark requires is now simplified. As a cluster administrator, you specify the required configurations in Cloudera Manager as a one-time activity and then enable HWC by setting the spark.cloudera.useHWC property to "true". The spark.cloudera.useHWC property can be specified either in the spark-defaults.conf file or by using the --conf option in spark-shell or spark-submit. For more information, see Setting up HWC configurations.
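For example, a minimal sketch of enabling HWC for a single session, assuming the cluster-side configurations have already been set in Cloudera Manager:

    # Enable HWC for one session from the command line
    spark-shell --conf spark.cloudera.useHWC=true

    # Or enable it for all applications in spark-defaults.conf
    spark.cloudera.useHWC true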
Hive Warehouse Connector Secure access mode
Hive Warehouse Connector (HWC) introduces the secure access mode, which offers fine-grained access control (FGAC), such as column masking and row filtering, to secure managed (ACID) or external Hive table data that you query from Spark. Secure access mode requires you to set up a staging location in your cloud storage service, such as S3 or ADLS, to temporarily store Hive files that users need to read from Spark. For more information, see Introduction to HWC Secure access mode.
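As an illustrative sketch, secure access mode is typically selected through the HWC read mode property along with a staging directory; the property names below (spark.datasource.hive.warehouse.read.mode and spark.datasource.hive.warehouse.load.staging.dir) are assumptions based on common HWC configuration, so verify them against the linked documentation:

    # Assumed property names; confirm against the HWC documentation
    spark.datasource.hive.warehouse.read.mode secure_access
    # Staging location in cloud storage for temporary Hive files
    spark.datasource.hive.warehouse.load.staging.dir s3a://example-bucket/hwc-staging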
Enable caching for Hive Warehouse Connector Secure access mode
You can enable caching for the HWC secure access mode to have finer control over read queries and ensure that content updated outside of a Spark session is considered during reads. For more information, see Enabling caching for secure access mode.
Hive ACID compaction observability
Compaction observability is a notification and information system based on metrics about the health of the compaction process. You can use Cloudera Manager to view compaction health checks for the Hive Metastore and Hive on Tez services, view actions and advice related to configurations and thresholds, and use the Compaction tab from the Hive Metastore service to view compaction-related charts based on the collected metrics. For more information, see Compaction Observability in Cloudera Manager.
Support for Hive Hybrid Procedural SQL
You can run Hive Hybrid Procedural SQL (HPL/SQL) queries from a client by connecting to Hive over JDBC. HPL/SQL is an Apache open source procedural extension for SQL for Hive users. For more information, see HPL/SQL stored procedures.
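As a brief illustration, the following HPL/SQL sketch (table and procedure names are hypothetical) creates and calls a stored procedure over a JDBC connection to Hive:

    -- Count the orders that belong to a given customer
    CREATE PROCEDURE count_orders(IN cust_id INT)
    BEGIN
      SELECT COUNT(*) FROM orders WHERE customer_id = cust_id;
    END;

    CALL count_orders(100);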
Hive Warehouse Connector support for Spark 3
As part of this release, Hive Warehouse Connector (HWC) is certified to work with Spark 3. You can use the binaries that are available in the /opt/cloudera/parcels/CDH/lib/hwc_for_spark3/ directory and use HWC to securely access Apache Hive managed tables from Spark. For more information, see Introduction to HWC.
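For example, a hedged sketch of starting a Spark 3 session with the HWC binaries; the assembly JAR name is a placeholder, so check the actual file name in the directory:

    spark-shell \
      --jars /opt/cloudera/parcels/CDH/lib/hwc_for_spark3/hive-warehouse-connector-assembly-<version>.jar \
      --conf spark.cloudera.useHWC=true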
Using SYS table to monitor compactions, transactions, and locks
You can monitor the progress of compactions and transactions, and filter for specific compaction, transaction, and transaction lock jobs, by querying the COMPACTIONS, TRANSACTIONS, and LOCKS views within the SYS database. For details, see Monitoring compactions, Monitoring transactions, and Monitoring transaction locks.
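For example, the following queries read the monitoring views directly:

    -- Track compaction, transaction, and lock activity
    SELECT * FROM SYS.COMPACTIONS;
    SELECT * FROM SYS.TRANSACTIONS;
    SELECT * FROM SYS.LOCKS;

    -- Inspect the available columns before filtering on them
    DESCRIBE SYS.COMPACTIONS;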
Support table defaults at database-level
You can use the database property, defaultTableType=EXTERNAL or ACID
to
specify the default table type to be created using the CREATE TABLE statement. You can
specify this property when creating the database or at a later point using the ALTER
DATABASE statement. For more information, see Understanding CREATE TABLE behavior.
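A minimal sketch, with a hypothetical database name:

    -- New tables in this database default to external tables
    CREATE DATABASE sales_db WITH DBPROPERTIES ('defaultTableType' = 'EXTERNAL');

    -- Change the default on an existing database
    ALTER DATABASE sales_db SET DBPROPERTIES ('defaultTableType' = 'ACID');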
Support external-only tables at database-level
You can choose to configure a database to allow only external tables to be created and prevent the creation of ACID tables. While creating a database, you can set the database property EXTERNAL_TABLES_ONLY=true to ensure that only external tables are created in the database. For more information, see Understanding CREATE TABLE behavior.
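For example, a sketch with a hypothetical database name:

    -- Only external tables can be created in this database;
    -- attempts to create ACID tables fail
    CREATE DATABASE reporting_db WITH DBPROPERTIES ('EXTERNAL_TABLES_ONLY' = 'true');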
Partition filtering support for the MSCK REPAIR TABLE statement
The MSCK REPAIR TABLE statement is enhanced to support filtering of the partition columns using operators so that a larger subset of partitions can be recovered (added/removed) without triggering a full repair. For more information, see Partition refresh and configuration.
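For example, the base statement below synchronizes the metastore with the partitions present on the file system; the enhancement additionally accepts filter expressions on partition columns to narrow the repair (see the linked documentation for the exact filter syntax):

    -- Add new partitions and drop removed ones in a single pass
    MSCK REPAIR TABLE sales SYNC PARTITIONS;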
Mapping specific columns in the INSERT clause of the MERGE statement
The MERGE statement is enhanced to support mapping of specific columns in the INSERT clause of the query instead of passing values (including null) for columns in the target table that do not have any data to insert. The unspecified columns in the INSERT clause are either mapped to null or use default constraints, if any. For more information, see Merging data in Hive tables.
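A minimal sketch, with hypothetical table and column names:

    -- Only id and amount are listed in the INSERT clause; the remaining
    -- target columns are set to NULL or to their default constraints
    MERGE INTO target_sales AS t
    USING staged_sales AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET amount = s.amount
    WHEN NOT MATCHED THEN INSERT (id, amount) VALUES (s.id, s.amount);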
Hive date and time UDF enhancements
The following Hive date and time user-defined functions (UDFs) are enhanced to use the
DateTimeFormatter
class instead of the SimpleDateFormat
class, which may affect how date and timestamp values are parsed.
unix_timestamp()
: Theunix_timestamp()
function is enhanced to use the DateTimeFormatter class for String format dates instead of the SimpleDateFormat class. For details, see HIVE-25458.from_unixtime()
: Thefrom_unixtime()
function is now enhanced to consider leap seconds. For details, see HIVE-25403.date_format()
: Thedate_format()
function that previously returned the output in UTC time zone is enhanced to display the default user session time zone. For details, see HIVE-25093.cast()
: Thecast()
function is enhanced to display NULL when an incorrect date or timestamp is casted. Prior to this enhancement, when an incorrect date was casted, the function returned a converted value. For example,cast ('2020-20-20' as date)
resulted in '2021-08-20' instead of NULL.This is because the DateTimeFormatter class that is used to parse string into date or timestamp was set to
ResolverStyle.LENIENT
. This is now updated to useResolverStyle.STRICT
and returns NULL when an invalid date or timestamp is casted. For details, see HIVE-25306.
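The following queries sketch the behavioral differences; exact output depends on the user session time zone:

    -- Returns NULL after the enhancement; previously returned a
    -- silently converted date such as '2021-08-20'
    SELECT CAST('2020-20-20' AS DATE);

    -- Rendered in the user session time zone rather than UTC
    SELECT DATE_FORMAT('2017-05-01 14:00:00', 'HH:mm z');

    -- Parses the string using DateTimeFormatter semantics
    SELECT UNIX_TIMESTAMP('2020-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss');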
Configuration option to URL encode special characters in hbase.columns.mapping
As part of this release, a new Hive configuration option is introduced to URL encode special characters, such as '#' or '%', that are used in hbase.columns.mapping values. The characters have to be encoded because the values are used to form the URI for Ranger-based authentication. For more information, see Using HBase Hive integration.
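As an illustrative sketch, a Hive-HBase table whose column mapping contains a special character; the table and column names are hypothetical, and the name of the new configuration option is given in the linked documentation:

    -- 'cf:col#1' contains '#', which the new option URL encodes (as %23)
    -- when forming the URI for Ranger-based authentication
    CREATE EXTERNAL TABLE hbase_metrics (key STRING, val STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:col#1')
    TBLPROPERTIES ('hbase.table.name' = 'metrics');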
HMS support for external databases using DataConnectors (Technical Preview)
This release introduces the ability to map databases that reside in an external data source into a local Hive Metastore (HMS). The external data sources can be of different types, such as MySQL, Postgres, Redshift, or other HMS instances. Currently, external tables are supported using StorageHandlers, such as JDBCStorageHandler or HBaseStorageHandler; however, the mapping must be configured for each table, which can be cumbersome for a database with a large number of tables.
A DataConnector is an HMS object that contains the configuration details (hostname, credentials, and so on) of a remote data source that are required to connect to it. Using DataConnectors, you can map an entire database instead of individual tables. The metadata for these tables is not persisted in Hive and is mapped and built at runtime. For more information, see HIVE-24396.
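A sketch of the DDL introduced by HIVE-24396; the connector type, URL, and names are illustrative, so see the JIRA for the authoritative grammar:

    -- Define a connector that points at a remote MySQL instance
    CREATE CONNECTOR mysql_conn
    TYPE 'mysql'
    URL 'jdbc:mysql://db-host:3306'
    WITH DCPROPERTIES (
      'hive.sql.dbcp.username' = 'hive',
      'hive.sql.dbcp.password' = 'secret'
    );

    -- Map an entire remote database through the connector
    CREATE REMOTE DATABASE mysql_sales USING mysql_conn
    WITH DBPROPERTIES ('connector.remoteDbName' = 'sales');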
Technical Preview: This is a technical preview feature and is considered under development. Do not use this in your production systems. To share your feedback, contact Support by logging a case on our Cloudera Support Portal. Cloudera does not guarantee troubleshooting guidance or fixes for technical preview features.