Fixed Issues in Hive
Review the list of Hive issues that are resolved in Cloudera Runtime 7.1.9 SP1.
- CDPD-70428: Issue with legacy timestamp conversion in Parquet files
- When converting legacy timestamps in Parquet files, the date February 29, 200 CE can cause issues because the Julian day for that date differs between the Julian and Gregorian calendars.
Hive, which stores timestamps in UTC, encounters this issue when converting the date '200 CE/03/01' between time zones. Even if the original date was '200 CE/03/01' in the Asia/Singapore time zone, the conversion leads to a "not a leap year" exception for February 29, 200:
java.time.DateTimeException: Invalid date 'February 29' as '200' is not a leap year
This issue has been fixed.
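A minimal HiveQL sketch of the scenario, assuming a hypothetical Parquet table legacy_ts; in practice the failure surfaced when Hive applied legacy calendar conversion while reading Parquet files written by older versions:
-- Hypothetical table; the failure occurred while reading legacy Parquet timestamps.
CREATE TABLE legacy_ts (event_time TIMESTAMP) STORED AS PARQUET;
-- '0200-03-01 00:00:00' in Asia/Singapore maps to a UTC instant that falls on
-- February 29, 200 CE in the Julian calendar, a date the proleptic Gregorian
-- calendar rejects as "not a leap year".
INSERT INTO legacy_ts VALUES (TIMESTAMP '0200-03-01 00:00:00');
-- Before the fix, converting the stored UTC value back could raise the exception above.
SELECT event_time FROM legacy_ts;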
- CDPD-67820: Select query failure due to decimal column data type change in Parquet
- When you change the data type of a decimal column to STRING, CHAR, or VARCHAR in a Parquet table, SELECT queries on that column fail. This issue arises because the Parquet format handles these data types differently, causing type mismatches and query failures.
This issue has been fixed.
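A minimal HiveQL sketch, assuming a hypothetical table t:
-- Hypothetical repro sketch.
CREATE TABLE t (d DECIMAL(10,2)) STORED AS PARQUET;
INSERT INTO t VALUES (12.34);
-- Change the column's data type after data has been written.
ALTER TABLE t CHANGE COLUMN d d STRING;
-- Before the fix, reading the existing Parquet data through the new type failed.
SELECT d FROM t;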
- CDPD-67819: Incorrect results for IN UDF on Parquet column of CHAR/VARCHAR type
- When you use the IN UDF on a Parquet column of CHAR or VARCHAR type, the query returns incorrect results. To fix this issue, CAST operations are added during the conversion process to properly handle these data types.
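A minimal sketch with a hypothetical CHAR column; an explicit CAST mirrors what the fix now adds internally:
-- Hypothetical table; CHAR values are padded to the declared length in comparisons.
CREATE TABLE fruits (name CHAR(10)) STORED AS PARQUET;
INSERT INTO fruits VALUES ('apple');
-- Before the fix, this could return incorrect results on Parquet data.
SELECT * FROM fruits WHERE name IN ('apple', 'banana');
-- An explicit cast mirrors the CAST the fix now applies during conversion.
SELECT * FROM fruits WHERE CAST(name AS STRING) IN ('apple', 'banana');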
- CDPD-33110: Hive queries stuck because of excessive aborted transactions
- Hive queries can become unresponsive when there is a high number of aborted transactions, because the cleaner thread consumes all the memory of the Hive Metastore (HMS) while attempting to clean these transactions.
The issue was addressed by deleting aborted transactions directly instead of preloading them. This approach prevents the cleaner thread from overloading the HMS memory.
- CDPD-69846: HPLSQL not using Hive variables
- This issue is observed in CDP Private Cloud Base 7.1.9 and higher versions. When you pass Hive variables through Beeline using the --hivevar option, the variables are not used within HPL/SQL procedures or cursors, leading to an unhandled exception in HPL/SQL. This issue has been fixed.
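A hedged sketch with hypothetical names; the Beeline invocation is shown as a comment:
-- Shell invocation (hypothetical): beeline -u jdbc:hive2://host:10000/default --hivevar tbl=sales
-- Before the fix, the substitution below failed inside HPL/SQL procedures and cursors.
CREATE PROCEDURE count_rows()
BEGIN
  SELECT COUNT(*) FROM ${tbl};
END;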
- CDPD-67350: DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift
- Fetching partition metadata from the Hive Metastore by timestamp using Java Data Objects (JDO) does not return correct results when the timestamp falls within a DST shift, which affects partition pruning.
The issue has been fixed.
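A hedged sketch, assuming a hypothetical table partitioned by a timestamp column; the filter value falls inside a spring-forward DST gap:
-- Hypothetical table partitioned by a timestamp column.
CREATE TABLE events (id INT) PARTITIONED BY (event_ts TIMESTAMP) STORED AS PARQUET;
-- 02:30 on 2023-03-12 does not exist in US time zones that spring forward;
-- before the fix, the JDO path could return different partitions than DirectSQL here.
SELECT * FROM events WHERE event_ts = '2023-03-12 02:30:00';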
- CDPD-66902: Query runtime optimization
- Optimized the SQL query that removes duplicate records from the COMPLETED_TXN_COMPONENTS table to improve runtime.
- CDPD-61611: Impala stats blocks Hive partitioned table rename
- The issue occurs when Impala tries to rename a Hive ACID table that already has Impala stats. The rename fails with the following error:
ERROR : Failed org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Cannot change stats state for a transactional table default.parqtest without providing the transactional write state for verification (new write ID 6, valid write IDs null; current state null; new state {}
Because an Impala rename does not change the table's stats, the absence of the table's write ID list is now ignored when verifying the stats in this case.
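A minimal sketch of the failing sequence, run against Impala; the table name is taken from the error message above, and the renamed name is hypothetical:
-- In Impala (hypothetical session):
COMPUTE STATS default.parqtest;
-- Before the fix, renaming the transactional table failed with the HiveException above.
ALTER TABLE default.parqtest RENAME TO default.parqtest_renamed;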
- CDPD-55916: Confusing exception message in DagUtils.localizeResource
- Hive jobs occasionally fail during resource localization when a Tez session is created. DagUtils#localizeResource is responsible for copying the client's hive-exec.jar into HDFS. This process can be triggered from multiple threads concurrently, in which case one thread performs the copy while the others wait, polling for the arrival of the destination file.
If an IOException occurs during this process, it is assumed that the thread attempting the write failed, and all others abort. No information about the underlying IOException is logged. Instead, the log states:
java.io.IOException: Previous writer likely failed to write hdfs://....Failing because I am unlikely to write too
To address this issue, the logging is improved to state that a failure on the writing thread is just one possible reason for the error, and the exception stack trace is now logged to make it easier to find the real root cause.
- CDPD-68278: Netty HttpPostRequestDecoder Vulnerability
- Netty's HttpPostRequestDecoder can be tricked into accumulating unbounded data, leading to potential vulnerabilities. To address the issue, Netty was upgraded from version 4.1.100.Final to 4.1.108.Final.
- CDPD-68251: Upgrade Commons-configuration2 to Version 2.10.1
- Commons-configuration2 was upgraded to version 2.10.1 to fix a crash caused by complex data structures.
Apache patch information
- HIVE-28249
- HIVE-26955
- HIVE-26320
- HIVE-27775
- HIVE-13288
- HIVE-28214
- HIVE-27778