Functional adjustments and behavioral updates for Hive are introduced in Cloudera Runtime 7.3.2, its service packs and cumulative hotfixes.
Cloudera Runtime 7.3.2
Cloudera Runtime 7.3.2 introduces functional adjustments and behavioral updates
for Hive, and includes all service packs and cumulative hotfixes from 7.3.1.100 through
7.3.1.706. For a comprehensive record of all functional adjustments in Cloudera Runtime 7.3.1.x, see Behavioral Changes.
- Summary:
- Handling invalid date formats in the to_date function
The behavior of the to_date function changed between Cloudera Runtime versions 7.1.7 SP2 and 7.1.9 when handling invalid date formats.
- Previous behavior:
- Valid dates (for example, YYYY-MM-DD): Returned correct results.
Invalid dates: Returned random, unexpected dates instead of NULL.
- New behavior:
- Following the changes introduced in HIVE-28483:
Valid dates: Continue to return correct results.
Invalid dates (for example, DD-MM-YYYY): Now return NULL.
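The new contract can be pictured with a small Python analogy. This is only a sketch of the described behavior (a strict YYYY-MM-DD parse that yields a null value on failure), not Hive's implementation; the function name `to_date_strict` is hypothetical, and `None` stands in for NULL.

```python
from datetime import date, datetime
from typing import Optional


def to_date_strict(text: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None (standing in for NULL) otherwise."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # Unparseable input yields None instead of an arbitrary date.
        return None
```

Under this sketch, an input in DD-MM-YYYY form such as "15-03-2024" fails the parse and maps to `None`, mirroring the NULL result described above.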
- Summary:
- Metastore secondary connection pool size is now configurable
- Previous behavior: The Metastore's secondary connection
pool had a fixed size of 2. This often led to connection limitations, especially under heavy
workloads.
- New behavior: You can now configure the Metastore's
secondary connection pool size using the
datanucleus.connectionPool.secondary.maxPoolSize property. This lets you raise the
pool size above its default of 2, preventing connection limitations and improving
performance under heavy workloads.
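As a rough illustration, the following Python sketch renders the property in the standard Hadoop `<property>` XML form, which could then be pasted into the relevant Metastore configuration snippet. The helper name `hive_site_property` and the value 10 are assumptions made for this example; choose a value that suits your workload.

```python
import xml.etree.ElementTree as ET


def hive_site_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# 10 is an illustrative value only; size the pool for your own workload.
snippet = hive_site_property(
    "datanucleus.connectionPool.secondary.maxPoolSize", "10"
)
print(snippet)
```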
- Summary:
- New configuration available to disable the Partition Management Task
- Previous behavior:
- It was not possible to disable the PartitionManagementTask, which put a heavy load on the Cloudera Manager Metastore when managing a large number of tables and partitions. The only workaround was tedious: restricting the task by providing a pattern through the metastore.partition.management.database.pattern or metastore.partition.management.table.pattern property.
- New behavior:
- You can now set the metastore.partition.management.task.frequency configuration to 0 to disable the partition management task cluster-wide. This helps to reduce the load on the Cloudera Manager Metastore.
Apache Jira: HIVE-25324
- Summary:
- New default restriction for Iceberg table data file locations
- Previous behavior:
- In earlier versions, the hive.iceberg.allow.datafiles.in.table.location.only property was set to false by default. This allowed Hive to access and read Iceberg data files even if they were located outside of the specific table directory.
- New behavior:
- The default value for hive.iceberg.allow.datafiles.in.table.location.only is now true. This security enhancement ensures that only data files located within the table directory are accessible. If you attempt to read an Iceberg table that contains data files outside of its directory, Hive now returns an error. If your existing workflows rely on data files stored in external locations, you can disable this restriction by setting the property to false in Cloudera Manager.
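To check ahead of time whether a table might be affected, a containment test along the following lines can help. This is a minimal Python sketch of the idea (same filesystem, data file path under the table directory), not the validation Hive performs; the function name and the example paths are hypothetical.

```python
from posixpath import normpath
from urllib.parse import urlparse


def is_within_table_location(table_location: str, data_file: str) -> bool:
    """Return True if data_file lies under table_location on the same filesystem."""
    table, data = urlparse(table_location), urlparse(data_file)
    # Different scheme or authority means a different filesystem or bucket.
    if (table.scheme, table.netloc) != (data.scheme, data.netloc):
        return False
    # Normalize ".." segments and compare on a directory boundary,
    # so that ".../t1" does not match ".../t10".
    table_dir = normpath(table.path).rstrip("/") + "/"
    return normpath(data.path).startswith(table_dir)


print(is_within_table_location(
    "hdfs://ns1/warehouse/db.db/t1",
    "hdfs://ns1/warehouse/db.db/t1/data/f.parquet",
))
```

Any data file for which such a check returns False is a candidate for the error described above while the property remains at its new default of true.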
- Summary:
- Lineage information computation enabled by default
- Previous behavior:
- Previously, lineage information was computed only if specific hardcoded post-execution hooks were configured or if the deprecated HIVE_LINEAGE_INFO property was set to true.
- New behavior:
- Lineage information is now collected by default for all queries.
However, the system only records and passes lineage to hooks for the specific query types
defined in the HIVE_LINEAGE_STATEMENT_FILTER property. By default, this
includes CREATE_TABLE, CREATE_TABLE_AS_SELECT,
CREATE_VIEW, CREATE_MATERIALIZED_VIEW, and
LOAD.
Apache Jira: HIVE-28768
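The filtering step can be sketched in Python as a simple membership test. This is an illustration of the described behavior only, not Hive code: the function name is hypothetical, and only the five default statement types come from the text above.

```python
# Default statement types listed for HIVE_LINEAGE_STATEMENT_FILTER.
DEFAULT_LINEAGE_STATEMENT_FILTER = frozenset({
    "CREATE_TABLE",
    "CREATE_TABLE_AS_SELECT",
    "CREATE_VIEW",
    "CREATE_MATERIALIZED_VIEW",
    "LOAD",
})


def lineage_passed_to_hooks(query_type, statement_filter=DEFAULT_LINEAGE_STATEMENT_FILTER):
    """Return True if lineage for this query type would be recorded and passed to hooks."""
    return query_type in statement_filter


print(lineage_passed_to_hooks("CREATE_VIEW"))
```

In this sketch, any query type outside the filter still has its lineage collected, but nothing is handed to the hooks.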
- Summary:
- Increased batch sizes for COMPUTE STATS
- Previous behavior:
- The COMPUTE STATS query previously failed on tables containing more than 5000 columns. This issue was specific to wide tables and could not be resolved by dropping and rerunning the query.
- New behavior:
- To resolve this, batch retrieval and insertion of object metadata is now enabled by default. The default value of the hive.metastore.direct.sql.batch.size property is changed from 0 to 1000, and the default value of the metastore.rawstore.batch.size property is changed from -1 to 500. With this change, COMPUTE STATS queries run successfully on tables with more than 5000 columns.
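The effect of a positive batch size can be pictured with a short Python sketch that splits a wide table's column list into fixed-size chunks. It illustrates batching in general and is not the Metastore's implementation; the function name and the generated column names are hypothetical.

```python
from typing import List, Sequence


def split_into_batches(items: Sequence[str], batch_size: int) -> List[Sequence[str]]:
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive in this sketch")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# A hypothetical table with 5001 columns, processed 1000 columns at a time.
columns = [f"col_{i}" for i in range(5001)]
batches = split_into_batches(columns, 1000)
print(len(batches), len(batches[-1]))  # prints "6 1"
```

Each chunk stays at or below the configured size, so no single metadata operation has to handle all 5001 columns at once.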