Fixed Issues in Hive

Review the list of Hive issues that are resolved in Cloudera Runtime 7.1.9 SP1.

CDPD-70428: Issue with legacy timestamp conversion in Parquet files
When converting legacy timestamps in Parquet files, the date February 29, 200 CE can cause failures. The date is valid in the Julian calendar, where year 200 is a leap year, but it does not exist in the proleptic Gregorian calendar, so the Julian day for dates around it differs between the two calendars.

Hive, which stores timestamps in UTC, encounters this issue when converting such dates between time zones. For example, even if the original value was '200 CE/03/01' in the Asia/Singapore timezone, converting it to UTC shifts the instant back across midnight into February 29, 200, and the conversion fails with the "not a leap year" exception:

java.time.DateTimeException: Invalid date 'February 29' as '200' is not a leap year

This issue has been fixed.
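
The failure pattern can be sketched in HiveQL; the table and column names below are hypothetical, and the sketch assumes a Parquet file written by an older Hive release that used the hybrid Julian/Gregorian calendar:
-- Hypothetical Parquet table containing a legacy hybrid-calendar timestamp
CREATE TABLE legacy_events (event_time TIMESTAMP) STORED AS PARQUET;
SET hive.local.time.zone=Asia/Singapore;
-- Formerly failed: converting an instant near '0200-03-01' to UTC lands on
-- February 29, 200, which the proleptic Gregorian calendar rejects
SELECT event_time FROM legacy_events;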

CDPD-67820: Select query failure due to decimal column data type change in Parquet
When you change the data type of a DECIMAL column to STRING, CHAR, or VARCHAR on a Parquet table, SELECT queries on that column fail. The failure occurs because Parquet stores these data types differently, so the values written as DECIMAL no longer match the new column type at read time.

This issue has been fixed.
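
A minimal reproduction sketch in HiveQL (table and column names hypothetical):
-- DECIMAL data written through the original schema
CREATE TABLE dec_parquet (amount DECIMAL(10,2)) STORED AS PARQUET;
INSERT INTO dec_parquet VALUES (10.50);
-- Re-typing the column changes only the table metadata, not the stored data
ALTER TABLE dec_parquet CHANGE COLUMN amount amount STRING;
-- Before the fix, this read failed because the DECIMAL values in the Parquet
-- file no longer matched the STRING column type
SELECT amount FROM dec_parquet;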

CDPD-67819: Incorrect results for IN UDF on Parquet column of CHAR/VARCHAR type
When you use the IN UDF on a Parquet column of CHAR or VARCHAR type, you encounter incorrect results. To fix this issue, CAST operations are added during the conversion process to handle these data types correctly.
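
For example (names hypothetical), a filter such as the following could return wrong rows before the fix; the fix adds the required CAST internally when the literals are compared with the CHAR or VARCHAR column:
CREATE TABLE codes (code CHAR(5)) STORED AS PARQUET;
INSERT INTO codes VALUES ('AB');
-- Formerly could produce incorrect results on the Parquet-backed CHAR column
SELECT * FROM codes WHERE code IN ('AB', 'CD');
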
CDPD-33110: Hive queries stuck because of excessive aborted transactions
Hive queries can become unresponsive when there is an excessive number of aborted transactions, because the cleaner thread consumes all the memory of the Hive Metastore (HMS) while attempting to clean these transactions.

The issue was addressed by deleting aborted transactions directly instead of preloading them all into memory first. This approach prevents the cleaner thread from overloading the HMS memory.
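
Conceptually, the change is in the spirit of the following sketch against the Hive Metastore backing database, where TXN_STATE 'a' marks aborted transactions; the actual fix is in HMS code, and this is only an illustration of deleting directly instead of loading first:
DELETE FROM TXN_COMPONENTS
WHERE TC_TXNID IN (SELECT TXN_ID FROM TXNS WHERE TXN_STATE = 'a');
DELETE FROM TXNS WHERE TXN_STATE = 'a';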

CDPD-69846: HPLSQL not using Hive variables
This issue is observed in CDP Private Cloud Base 7.1.9 and higher versions. When you pass Hive variables through Beeline using the --hivevar option, the variables are not substituted within HPL/SQL procedures or cursors, leading to an unhandled exception in HPL/SQL.

This issue has been fixed.
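
For example (procedure and variable names hypothetical), with Beeline connected in HPL/SQL mode and a variable passed as beeline --hivevar region=apac -f proc.sql, the variable is now substituted inside the procedure body:
CREATE PROCEDURE show_region()
BEGIN
  PRINT 'Running for region: ${region}';
END;
show_region();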

CDPD-67350: DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift
Fetching partition metadata by timestamp from the Hive Metastore using Java Data Objects (JDO) returns incorrect results during partition pruning when the timestamp falls within a daylight saving time (DST) shift, so the JDO and DirectSQL paths return different partitions.

The issue has been fixed.
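
A sketch of the affected pattern (names hypothetical): a table partitioned by a TIMESTAMP column, pruned with a value that falls inside a DST transition, for example 02:30 during the European spring-forward gap:
CREATE TABLE events (id INT) PARTITIONED BY (ts TIMESTAMP);
-- Before the fix, the JDO path could prune this predicate differently from
-- DirectSQL when the timestamp falls inside the DST shift
SELECT * FROM events WHERE ts = '2023-03-26 02:30:00';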

CDPD-66902: Query runtime optimization
The internal SQL query that removes duplicate records from the COMPLETED_TXN_COMPONENTS table was optimized to improve its runtime.
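
The change is internal to HMS; conceptually, the cleanup removes redundant rows for the same database, table, and partition while keeping the latest write, along the lines of this illustrative sketch (column names from the standard metastore schema):
DELETE FROM COMPLETED_TXN_COMPONENTS c
WHERE EXISTS (SELECT 1 FROM COMPLETED_TXN_COMPONENTS d
              WHERE d.CTC_DATABASE = c.CTC_DATABASE
                AND d.CTC_TABLE = c.CTC_TABLE
                AND d.CTC_PARTITION = c.CTC_PARTITION
                AND d.CTC_WRITEID > c.CTC_WRITEID);
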
CDPD-61611: Impala stats blocks Hive partitioned table rename
The issue occurs when Impala tries to rename a Hive ACID table that already has Impala-computed statistics.
ERROR : Failed
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Cannot change stats state for a transactional table default.parqtest without providing the transactional write state for verification (new write ID 6, valid write IDs null; current state null; new state {}

To address the issue, because an Impala rename does not change the table's statistics, the absence of the write ID list is now ignored when the statistics state is verified during such a rename.
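
The failing operation was a plain rename issued from Impala against the table in the error message (the new table name is hypothetical):
-- The Hive ACID table default.parqtest already has Impala-computed statistics
ALTER TABLE default.parqtest RENAME TO default.parqtest_renamed;
-- Formerly failed with the HiveException above; the rename now succeeds
-- because the missing write ID list is ignored during stats verification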

CDPD-55916: Confusing exception message in DagUtils.localizeResource
Hive jobs occasionally fail during resource localization when a Tez session is created. DagUtils#localizeResource is responsible for copying the client's hive-exec.jar into HDFS. This process can be triggered from multiple threads concurrently, in which case one thread performs the copy while the others wait, polling for the arrival of the destination file.

If there is an IOException during this process, it is assumed that the thread attempting the write failed, and all others abort. No information about the underlying IOException is logged. Instead, the log states:
java.io.IOException: Previous writer likely failed to write hdfs://....Failing because I am unlikely to write too

To address this issue, the log message now states that a failure on the writing thread is only one possible reason for the error, and the underlying exception's stack trace is logged to make the real root cause easier to find.

CDPD-68278: Netty HttpPostRequestDecoder Vulnerability
Netty's HttpPostRequestDecoder can be tricked into accumulating an unlimited amount of data, making a denial of service through resource exhaustion possible.

To address the issue, Netty was upgraded from version 4.1.100.Final to 4.1.108.Final.

CDPD-68251: Upgrade Commons-configuration2 to Version 2.10.1
To address the issue, Commons-configuration2 was upgraded to version 2.10.1, which fixes a crash caused by complex data structures.

Apache patch information

  • HIVE-28249
  • HIVE-26955
  • HIVE-26320
  • HIVE-27775
  • HIVE-13288
  • HIVE-28214
  • HIVE-27778