Behavioral Changes in Hive

Cloudera Runtime 7.3.2, its service packs, and its cumulative hotfixes introduce functional adjustments and behavioral updates for Hive.

Cloudera Runtime 7.3.2

Cloudera Runtime 7.3.2 introduces functional adjustments and behavioral updates for Hive, and includes all service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a comprehensive record of all functional adjustments in Cloudera Runtime 7.3.1.x, see Behavioral Changes.

Summary:

Handling invalid date formats in to_date function.

The behavior of the to_date function has changed between Cloudera Runtime versions 7.1.7 SP2 and 7.1.9 when handling invalid date formats.

Previous behavior:

Valid dates (e.g., YYYY-MM-DD): Returned correct results.

Invalid dates: Returned arbitrary, incorrect dates instead of NULL.

New behavior:

Following the changes introduced in HIVE-28483:

Valid dates: Continue to return correct results.

Invalid dates (e.g., DD-MM-YYYY): Now return NULL.
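The difference can be modeled outside Hive. The following Python sketch is not Hive's implementation. It assumes a hypothetical helper, to_date_strict, that accepts only the date-only YYYY-MM-DD form and returns None (standing in for NULL) for anything else, which mirrors the new behavior for the two example formats above.

```python
from datetime import date, datetime
from typing import Optional


def to_date_strict(value: str) -> Optional[date]:
    """Hypothetical model of the new behavior: parse YYYY-MM-DD or return None."""
    try:
        # strptime rejects strings that do not match the pattern and
        # calendar-invalid values such as month 13 or February 30.
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return None


print(to_date_strict("2024-03-15"))  # valid YYYY-MM-DD input
print(to_date_strict("15-03-2024"))  # DD-MM-YYYY input is rejected
```

A check like this may help you find input values in your own data that previously produced an incorrect date and now produce NULL.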

Summary:

Metastore secondary connection pool size is now configurable

Previous behavior: The Metastore's secondary connection pool had a fixed size of 2. This often led to connection limitations, especially under heavy workloads.

New behavior: You can now configure the Metastore's secondary connection pool size using the property datanucleus.connectionPool.secondary.maxPoolSize. This lets you raise the pool size above its default of 2 to avoid connection limitations under heavy workloads.
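As a rough illustration, the Python sketch below renders the property in the Hadoop-style property format used by files such as hive-site.xml. The helper render_property and the value 10 are assumptions made for the example; the source does not recommend a value or state where the property must be applied in your deployment.

```python
import xml.etree.ElementTree as ET


def render_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# The value 10 is an illustrative assumption, not a recommended setting.
snippet = render_property("datanucleus.connectionPool.secondary.maxPoolSize", "10")
print(snippet)
```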

Summary:

New configuration available to disable the Partition Management Task

Previous behavior:

The PartitionManagementTask could not be disabled, which put a heavy load on the Cloudera Manager Metastore when managing a large number of tables and partitions. The only workaround was tedious: restricting the task by providing a pattern through metastore.partition.management.database.pattern or metastore.partition.management.table.pattern.

New behavior:

You can now set the metastore.partition.management.task.frequency configuration to 0 to disable the partition management task cluster-wide. This helps to reduce the load on the Cloudera Manager Metastore.

Apache Jira: HIVE-25324
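To confirm that a configuration file carries the disabling value, a check along the following lines may help. This Python sketch is illustrative only: the configuration fragment and the helper partition_management_disabled are hypothetical, and the sketch assumes the value is written as a plain 0 without a time unit suffix.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment; only the property name comes from the text above.
CONFIG_XML = """
<configuration>
  <property>
    <name>metastore.partition.management.task.frequency</name>
    <value>0</value>
  </property>
</configuration>
"""


def partition_management_disabled(config_xml: str) -> bool:
    """Return True if the task frequency property is present and set to 0."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "metastore.partition.management.task.frequency":
            return (prop.findtext("value") or "").strip() == "0"
    return False  # property absent: this sketch assumes the task stays enabled


print(partition_management_disabled(CONFIG_XML))
```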

Summary:

New default restriction for Iceberg table data file locations

Previous behavior:

In earlier versions, the hive.iceberg.allow.datafiles.in.table.location.only property was set to false by default. This allowed Hive to access and read Iceberg data files even if they were located outside of the specific table directory.

New behavior:

The default value for hive.iceberg.allow.datafiles.in.table.location.only is now true. This security enhancement ensures that only data files located within the table directory are accessible. If you attempt to read an Iceberg table that contains data files outside of its directory, Hive now returns an error. If your existing workflows rely on data files stored in external locations, you can disable this restriction by using Cloudera Manager to set the property to false.
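Before upgrading, it may be useful to check whether any of a table's data files fall outside its directory. The Python sketch below models that check on plain path strings. It is not Hive's or Iceberg's implementation; the helper files_outside_table_location and all paths are hypothetical, and obtaining the actual list of data files for a table is left out.

```python
from posixpath import normpath
from typing import List


def files_outside_table_location(table_location: str, data_files: List[str]) -> List[str]:
    """Return the data file paths that are not under the table location."""
    # Normalize so that ".." segments cannot make an outside path look inside.
    root = normpath(table_location).rstrip("/") + "/"
    return [f for f in data_files if not normpath(f).startswith(root)]


# Hypothetical table location and data files.
table = "/warehouse/tablespace/external/hive/sales.db/orders"
files = [
    table + "/data/00000-0.parquet",
    "/tmp/staging/00001-0.parquet",
]
outside = files_outside_table_location(table, files)
print(outside)
```

Under the new default, a table for which this list is non-empty would be expected to fail on read until the files are moved or the property is set to false.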

Summary:

Lineage information computation enabled by default

Previous behavior:

Lineage information was computed only if specific hardcoded post-execution hooks were configured or if the deprecated HIVE_LINEAGE_INFO property was set to true.

New behavior:

Lineage information is now collected by default for all queries. However, the system only records and passes lineage to hooks for the specific query types defined in the HIVE_LINEAGE_STATEMENT_FILTER property. By default, this includes CREATE_TABLE, CREATE_TABLE_AS_SELECT, CREATE_VIEW, CREATE_MATERIALIZED_VIEW, and LOAD.

Apache Jira: HIVE-28768
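The filtering step can be pictured with a small model. This Python sketch is not Hive code: the function lineage_for_hooks, the lineage dictionary, and the SELECT statement type used in the example are hypothetical. Only the default filter values are taken from the description above.

```python
from typing import FrozenSet, Optional

# Default statement types listed in the description above.
DEFAULT_LINEAGE_STATEMENT_FILTER: FrozenSet[str] = frozenset({
    "CREATE_TABLE",
    "CREATE_TABLE_AS_SELECT",
    "CREATE_VIEW",
    "CREATE_MATERIALIZED_VIEW",
    "LOAD",
})


def lineage_for_hooks(
    statement_type: str,
    lineage: dict,
    statement_filter: FrozenSet[str] = DEFAULT_LINEAGE_STATEMENT_FILTER,
) -> Optional[dict]:
    """Return the lineage only when the statement type passes the filter."""
    return lineage if statement_type in statement_filter else None


lineage = {"target": "sales.orders_copy", "sources": ["sales.orders"]}  # hypothetical
print(lineage_for_hooks("CREATE_TABLE_AS_SELECT", lineage))  # passed to hooks
print(lineage_for_hooks("SELECT", lineage))  # filtered out in this model
```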

Summary:

Increased batch sizes for COMPUTE STATS

Previous behavior:

The COMPUTE STATS query previously failed on tables containing more than 5000 columns. This issue was specific to wide tables and could not be resolved by dropping and rerunning the query.

New behavior:

To resolve this, batch retrieval and insertion of object metadata is now enabled by default. The default value of the hive.metastore.direct.sql.batch.size property changes from 0 to 1000, and the default value of the metastore.rawstore.batch.size property changes from -1 to 500. With these defaults, COMPUTE STATS queries run successfully on tables with more than 5000 columns.
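To see why batching helps with wide tables, consider the following Python sketch. It only illustrates the general idea of splitting a large set of objects into fixed-size batches and is not the metastore's implementation. The helper batches and the 5200-column table are hypothetical, and treating a non-positive size as "one pass over everything" is an assumption of the sketch, not a statement about how Hive interprets 0 or -1.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        # Assumption of this sketch: a non-positive size means no batching.
        yield list(items)
        return
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


columns = [f"col_{i}" for i in range(5200)]  # hypothetical wide table
sizes = [len(b) for b in batches(columns, 1000)]
print(sizes)  # [1000, 1000, 1000, 1000, 1000, 200]
```

With a batch size of 1000, no single operation in this model handles more than 1000 columns, whereas the unbatched case handles all 5200 at once.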