Fixed Issues in Apache Hive
Review the list of Hive issues that are resolved in Cloudera Runtime 7.1.7 SP3.
- CDPD-11827: Value of headerThirdByte in Patched Base encoding goes beyond the range of byte
- In Patched Base encoding, the first three bits of
headerThirdByte represent the base value width. If Math.abs(min) is greater than or
equal to 1 << 56, the value of baseBytes is 9, and the value of bb goes beyond
the range of a byte.
The issue was fixed by extending the range.
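The overflow can be reproduced with plain integer arithmetic. The sketch below assumes that bb is computed as (baseBytes - 1) << 5, which is consistent with the top three bits of the byte holding the base width. The class and method names are hypothetical and are not taken from the ORC or Hive sources.

```java
/** Illustrative sketch only; this is not the ORC writer implementation. */
final class PatchedBaseHeaderSketch {

    /**
     * Assumed encoding: (baseBytes - 1) occupies the top three bits of the
     * third header byte, so it is shifted left by five bits.
     */
    static int baseWidthBits(int baseBytes) {
        return (baseBytes - 1) << 5;
    }

    /** True when the value fits in one unsigned byte (0..255). */
    static boolean fitsInByte(int value) {
        return value >= 0 && value <= 0xFF;
    }

    public static void main(String[] args) {
        // Compare the largest width that fits (8 bytes) with the failing one (9 bytes).
        for (int baseBytes = 8; baseBytes <= 9; baseBytes++) {
            int bb = baseWidthBits(baseBytes);
            System.out.println("baseBytes=" + baseBytes + " bb=" + bb
                + " fitsInByte=" + fitsInByte(bb));
        }
    }
}
```

With baseBytes equal to 8 the field is 224 and still fits. With baseBytes equal to 9 it becomes 256, one more than an unsigned byte can hold.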
- CDPD-39708: Use cdpd_guava_version from external_versions.ini
- This patch uses the Guava dependency version from the CDPD repo.
- CDPD-53885: Concurrent ACID direct inserts may fail with FileNotFoundException
- A FileNotFoundException may occur when concurrently inserting into an ACID table with static partitions while the hive.acid.direct.insert.enabled parameter is 'true'. The exception occurs in the AcidUtils.getHdfsDirSnapshots method when trying to list the newly written files from the partition directory. For static partitions, the manifest directory is located in the partition folder. During concurrent inserts, one thread may have already written the manifest file but not yet deleted it. Another thread then calls the AcidUtils.getHdfsDirSnapshots method, which lists all the files and directories in the partition folder, including the manifest directory. If the first thread deletes the manifest file after the listing but before the second thread iterates over the files and directories, the iterator throws a FileNotFoundException when trying to get the deleted manifest directory.
The fix was to get the updated manifest file path if static partition is available before getting the manifest file.
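The list-then-delete ordering described above can be reproduced deterministically on a local file system. The following is a minimal sketch that uses java.nio instead of HDFS; the class name, the file names, and the single-threaded simulation of the two writers are assumptions made for illustration and do not reflect the AcidUtils code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; local files stand in for the HDFS partition folder. */
final class ManifestListingRaceSketch {

    /** Returns true if iterating over a stale listing hits a file that no longer exists. */
    static boolean staleListingFails() {
        try {
            Path partition = Files.createTempDirectory("partition");
            Path manifest = Files.createFile(partition.resolve("_tmp.manifest"));
            Path data = Files.createFile(partition.resolve("bucket_00000"));

            // "Thread B" lists the partition folder while the manifest still exists.
            List<Path> listing = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(partition)) {
                for (Path p : stream) {
                    listing.add(p);
                }
            }

            // "Thread A" deletes its manifest after the listing was taken.
            Files.delete(manifest);

            // "Thread B" now iterates over the stale listing.
            boolean failed = false;
            for (Path p : listing) {
                try {
                    Files.size(p);
                } catch (NoSuchFileException e) {
                    failed = true; // the manifest entry is gone
                }
            }

            // Clean up the temporary files.
            Files.delete(data);
            Files.delete(partition);
            return failed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stale listing failed: " + staleListingFails());
    }
}
```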
- CDPD-64909: Exception in Vectorization with Decimal64 to Decimal casting
- A Hive query fails with an exception because the code that casts Decimal64 to Decimal is not added in the filtered expression. When the code is added, it causes a regression in check_constraint.q. This is because we do not want to convert Decimal64 to Decimal if the expression explicitly handles Decimal64 data types.
The fix was to add a method to these classes to prevent the conversion.
- CDPD-65204: MV_CREATION_METADATA"."TXN_LIST"" : Bad value for type long
- Users are unable to drop tables and experience issues with Hive Metastore (HMS) shutting down.
The fix replaces CLOB with VARCHAR in the TXN_LIST column of the MV_CREATION_METADATA table when storing creation metadata.
- CDPD-65250: BytesColumnVector fails when the aggregate size is > 1 GB
- BytesColumnVector allocates a buffer for small values (<
1 MB), but fails with the following exception when the aggregate size of the buffer
exceeds 1 GB.
new RuntimeException("Overflow of newLength. smallBuffer.length=" + smallBuffer.length + ", nextElemLength=" + nextElemLength);
This change fixes BytesColumnVector so that it does not grow the small item buffer beyond 1 GB. Any later allocations use individual buffers.
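The failure mode and the capped growth can be illustrated with a short sketch. It assumes that the shared buffer grows by doubling in int arithmetic, which would explain the "Overflow of newLength" message; the class, the constant, and the -1 return convention are hypothetical and are not the BytesColumnVector implementation.

```java
/** Illustrative sketch only; not the BytesColumnVector code. */
final class SmallBufferGrowthSketch {

    /** Assumed cap on the shared small-value buffer: 1 GB. */
    static final int MAX_SHARED_BUFFER = 1 << 30;

    /** Naive doubling in int arithmetic; wraps negative once the length reaches 1 GB. */
    static int naiveDoubled(int currentLength) {
        return currentLength * 2;
    }

    /**
     * Capped growth: returns the new shared buffer length, or -1 to signal
     * that the value should be stored in its own individual buffer.
     */
    static int cappedLength(int currentLength, int used, int nextElemLength) {
        long required = (long) used + nextElemLength;
        if (required > MAX_SHARED_BUFFER) {
            return -1; // do not grow past the cap
        }
        long grown = Math.max(required, (long) currentLength * 2);
        return (int) Math.min(grown, MAX_SHARED_BUFFER);
    }

    public static void main(String[] args) {
        System.out.println(naiveDoubled(1 << 30));          // negative: int overflow
        System.out.println(cappedLength(1 << 30, 1 << 30, 1)); // -1: use an individual buffer
    }
}
```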
- CDPD-65305: Move the offset updating in BytesColumnVector to setValPreallocated.
- HIVE-25190 changed the semantics of
BytesColumnVector so that ensureValPreallocated reserved the room, which interacted
badly with ORC's redact mask code. The redact mask code needs to be able to increase the
allocation as it goes, so it must be able to call ensureValPreallocated multiple times.
The fix moves the offset updating in BytesColumnVector to setValPreallocated.
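The difference between reserving room and committing a value can be shown with a toy model. In this sketch only the method names ensureValPreallocated and setValPreallocated come from the description above; the class, its fields, and the accessor names are hypothetical, and the growth policy is an assumption.

```java
import java.util.Arrays;

/** Toy model only; not the BytesColumnVector implementation. */
final class PreallocatedBytesSketch {
    private byte[] buffer = new byte[16];
    private int nextFree = 0;                 // advanced only by setValPreallocated
    private final int[] start = new int[8];
    private final int[] length = new int[8];

    /** Guarantees room for `needed` bytes at nextFree without moving the offset. */
    void ensureValPreallocated(int needed) {
        if (nextFree + needed > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, nextFree + needed));
        }
    }

    /** Records the value written at nextFree and only then advances the offset. */
    void setValPreallocated(int elementNum, int len) {
        start[elementNum] = nextFree;
        length[elementNum] = len;
        nextFree += len;
    }

    byte[] bytes() { return buffer; }
    int valPreallocatedStart() { return nextFree; }
    int startOf(int elementNum) { return start[elementNum]; }
    int lengthOf(int elementNum) { return length[elementNum]; }

    public static void main(String[] args) {
        PreallocatedBytesSketch v = new PreallocatedBytesSketch();
        v.ensureValPreallocated(4);   // first estimate
        v.ensureValPreallocated(40);  // caller needs more room; offset is unchanged
        v.setValPreallocated(0, 10);  // commit 10 bytes for element 0
        System.out.println(v.valPreallocatedStart());
    }
}
```

Because the offset moves only in setValPreallocated, a caller such as a masking routine can call ensureValPreallocated repeatedly with growing estimates before committing the final length.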
- CDPD-65587: Create table may throw MetaException
- A CREATE TABLE statement may throw the following MetaException when metastore.warehouse.tenant.colocation is set to 'true':
message:java.lang.IllegalArgumentException: Can not create a Path from a null string
The issue was fixed by avoiding the IllegalArgumentException when managedLocation is null with colocation enabled.
- CDPD-65405: Acid table bootstrap replication needs to handle directory created by compaction with txn id
- Bootstrap replication for ACID tables now works for compacted tables.
- CDPD-65855: Concurrent UPDATEs may cause duplicate records in Hive
- Concurrent UPDATE statements performed on the same transactional table may cause duplicate records. The issue is fixed by not clearing the query context when re-compiling the query.
- CDPD-66277: Running a drop partition using Spark results in a lost metastore connection warning
- The problem occurs in the getPartitions method, where the metastore client obtained
through the getMSC method is later closed by the getUserName method. The getUserName method
closes the current metastore connection during setAuth, which causes the underlying Thrift
transport to close.
The fix involved modifying Hive.java to ensure that the getUserName method is invoked before the getMSC method.
Apache patch information
- HIVE-21213
- HIVE-23444
- HIVE-23726
- HIVE-24404
- HIVE-25190
- HIVE-25400
- HIVE-25574
- HIVE-26208
- HIVE-26875
- ORC-616