Fixed Issues in Apache Hive

Review the list of Hive issues that are resolved in Cloudera Runtime 7.1.7 SP3.

CDPD-11827: Value of headerThirdByte in Patched Base encoding goes beyond the range of byte
In Patched Base encoding, the first three bits of headerThirdByte represent the base value width. If Math.abs(min) is greater than or equal to 1 << 56, the value of baseBytes is 9, and the value of bb exceeds the range of a byte.

The issue was fixed by extending the range.
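
The arithmetic behind the overflow can be checked with a minimal sketch. It assumes the width is stored as (baseBytes - 1) shifted into the top three bits of the header byte; the class and method names are hypothetical and this is not the ORC or Hive source.

```java
// Illustrative sketch only. The (baseBytes - 1) << 5 packing is an assumption
// derived from the description of the issue, not a copy of the real writer.
final class PatchedBaseHeaderSketch {
    // Places (baseBytes - 1) into the top three bits of an 8-bit header field.
    static int baseWidthBits(int baseBytes) {
        return (baseBytes - 1) << 5;
    }

    // True when the packed value still fits in one unsigned byte (0..255).
    static boolean fitsInByte(int value) {
        return value >= 0 && value <= 0xFF;
    }

    public static void main(String[] args) {
        for (int baseBytes = 8; baseBytes <= 9; baseBytes++) {
            int bb = baseWidthBits(baseBytes);
            System.out.println("baseBytes=" + baseBytes + " bb=" + bb
                + " fitsInByte=" + fitsInByte(bb) + " asByte=" + (byte) bb);
        }
    }
}
```

With baseBytes equal to 8 the packed value is 224 and fits in a byte. With baseBytes equal to 9 it is 256, which needs a fourth bit and truncates to 0 when cast to a byte.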

CDPD-39708: Use cdpd_guava_version from external_versions.ini
This patch uses the Guava dependency version from the CDPD repository.

CDPD-53885: Concurrent ACID direct inserts may fail with FileNotFoundException
A FileNotFoundException may occur when concurrently inserting into an ACID table with static partitions while the hive.acid.direct.insert.enabled parameter is set to 'true'.

The exception occurs in the AcidUtils.getHdfsDirSnapshots method when it tries to list the newly written files in the partition directory. For static partitions, the manifest directory is located in the partition folder. When inserts run concurrently, one thread may have written its manifest file but not yet deleted it. Another thread then calls AcidUtils.getHdfsDirSnapshots, which lists all the files and directories in the partition folder, including the manifest directory. If the first thread deletes the manifest file after the listing but before the second thread iterates over the listed entries, the iterator throws a FileNotFoundException when it tries to access the deleted manifest directory.

The fix retrieves the updated manifest file path, when a static partition is present, before getting the manifest file.
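
The list-then-delete-then-iterate sequence can be replayed deterministically. The following sketch is only an analogy: it uses java.nio on the local file system rather than HDFS, runs the two "threads" sequentially, and uses hypothetical directory and file names. On the local file system the failure surfaces as NoSuchFileException instead of FileNotFoundException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

final class ManifestRaceSketch {
    // Returns true if reading a listed entry fails because the entry was
    // deleted between the listing and the iteration.
    static boolean staleListingFails() {
        try {
            // Writer: creates a manifest directory inside the partition folder.
            Path partition = Files.createTempDirectory("partition");
            Path manifestDir = Files.createDirectory(partition.resolve("manifest_dir"));
            Path manifestFile = Files.createFile(manifestDir.resolve("manifest"));

            // Reader: lists everything in the partition folder.
            List<Path> listed = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(partition)) {
                for (Path entry : stream) {
                    listed.add(entry);
                }
            }

            // Writer: removes the manifest after the reader's listing.
            Files.delete(manifestFile);
            Files.delete(manifestDir);

            // Reader: iterates over the now stale listing.
            boolean failed = false;
            try {
                for (Path entry : listed) {
                    Files.readAttributes(entry, BasicFileAttributes.class);
                }
            } catch (NoSuchFileException e) {
                failed = true;
            }
            Files.deleteIfExists(partition);
            return failed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```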

CDPD-64909: Exception in Vectorization with Decimal64 to Decimal casting
A Hive query fails with an exception because the code that casts Decimal64 to Decimal is not added to the filtered expression. Adding that code unconditionally causes a regression in check_constraint.q, because Decimal64 must not be converted to Decimal when the expression explicitly handles Decimal64 data types.

The fix was to add a method to these classes to prevent the conversion.

CDPD-65204: "MV_CREATION_METADATA"."TXN_LIST": Bad value for type long
Users were unable to drop tables and experienced issues with Hive Metastore (HMS) shutting down.

The fix replaces CLOB with VARCHAR in the TXN_LIST column of the MV_CREATION_METADATA table when storing creation metadata.

CDPD-65250: BytesColumnVector fails when the aggregate size is > 1 GB
BytesColumnVector allocates a shared buffer for small values (< 1 MB), but fails with the following exception when the aggregate size of the buffer exceeds 1 GB:
new RuntimeException("Overflow of newLength. smallBuffer.length="
                + smallBuffer.length + ", nextElemLength=" + nextElemLength);

This change fixes BytesColumnVector so that it does not grow the small item buffer beyond 1 GB. Any later allocations use individual buffers.
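
The capped growth policy can be sketched as follows. This is a simplified illustration under stated assumptions, not the BytesColumnVector implementation: the class, the method, and the MAX_SHARED_BUFFER constant are hypothetical, and a return value of -1 stands for "give this value its own buffer".

```java
// Illustrative sketch only; all names here are hypothetical.
final class SmallBufferGrowthSketch {
    static final int MAX_SHARED_BUFFER = 1 << 30; // 1 GB cap

    // Naive doubling in int arithmetic: (1 << 30) * 2 wraps to a negative value.
    static int naiveDouble(int currentLength) {
        return currentLength * 2;
    }

    // Returns the shared-buffer length to use, or -1 when the value should be
    // stored in an individual buffer because the cap would be exceeded.
    static int grow(int currentLength, int used, int nextElemLength) {
        long required = (long) used + nextElemLength; // long avoids int overflow
        if (required <= currentLength) {
            return currentLength;                     // already enough room
        }
        if (required > MAX_SHARED_BUFFER) {
            return -1;                                // fall back to own buffer
        }
        long doubled = Math.max(required, 2L * currentLength);
        return (int) Math.min(doubled, MAX_SHARED_BUFFER);
    }
}
```

Doing the size computation in long and clamping before casting back to int is what keeps the doubling step from wrapping around.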

CDPD-65305: Move the offset updating in BytesColumnVector to setValPreallocated
HIVE-25190 changed the semantics of BytesColumnVector so that ensureValPreallocated reserved the room, which interacted badly with ORC's redact mask code. The redact mask code needs to be able to increase the allocation as it goes, so it must be able to call ensureValPreallocated multiple times.

The fix moves the offset updating in BytesColumnVector to setValPreallocated.
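
The difference between reserving room and committing a value can be shown with a toy buffer. This sketch borrows the method names from the description but is a hypothetical, simplified class, not the BytesColumnVector API: the "ensure" call may run repeatedly with growing sizes, and only the "set" call advances the offset.

```java
import java.util.Arrays;

// Toy model only; signatures are simplified and do not match the real class.
final class PreallocatedBufferSketch {
    private byte[] buffer = new byte[8];
    private int nextFree = 0; // offset where the next value starts

    // Makes sure `length` bytes are available at nextFree. It does not move
    // nextFree, so callers can ask again with a larger length.
    void ensureValPreallocated(int length) {
        if (nextFree + length > buffer.length) {
            int newLength = Math.max(buffer.length * 2, nextFree + length);
            buffer = Arrays.copyOf(buffer, newLength);
        }
    }

    int getValPreallocatedStart() {
        return nextFree;
    }

    int capacity() {
        return buffer.length;
    }

    // Commits a value of `length` bytes and only now advances the offset.
    // Returns the start offset of the committed value.
    int setValPreallocated(int length) {
        int start = nextFree;
        nextFree += length;
        return start;
    }
}
```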

CDPD-65587: Create table may throw MetaException
A CREATE TABLE statement may throw the following MetaException when metastore.warehouse.tenant.colocation is set to 'true':
message:java.lang.IllegalArgumentException: Can not create a Path from a null string

The issue was fixed by avoiding the IllegalArgumentException when managedLocation is null with colocation enabled.

CDPD-65405: ACID table bootstrap replication needs to handle directory created by compaction with txn id
Bootstrap replication for ACID tables now works for compacted tables.

CDPD-65855: Concurrent UPDATEs may cause duplicate records in Hive
Concurrent UPDATE statements performed on the same transactional table may cause duplicate records.

The issue is fixed by not clearing the query context when re-compiling the query.

CDPD-66277: Running a drop partition using Spark results in a lost metastore connection warning
The warning occurs because, in the getPartitions method, the metastore client obtained through the getMSC method is later closed by the getUserName method. During setAuth, the getUserName method closes the current metastore client, which closes the underlying Thrift transport.

The fix modifies Hive.java to ensure that the getUserName method is invoked before the getMSC method.
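
Why the call order matters can be shown with a toy model. The sketch below is hypothetical throughout: it reuses the method names from the description, but the classes and their behavior are simplified stand-ins rather than the code in Hive.java.

```java
// Toy model only; not the Hive.java implementation.
final class ClientOrderingSketch {
    static final class Client {
        boolean open = true;
    }

    private Client current;

    // Returns the cached client, creating a new one if none is open.
    Client getMSC() {
        if (current == null || !current.open) {
            current = new Client();
        }
        return current;
    }

    // Assumed behavior: refreshing authentication closes the cached client.
    String getUserName() {
        if (current != null) {
            current.open = false;
            current = null;
        }
        return "user";
    }

    // Old order: the client is fetched first and then closed underneath the caller.
    boolean clientOpenWithOldOrder() {
        Client client = getMSC();
        getUserName();
        return client.open;
    }

    // Fixed order: the user name is resolved first, so the client stays open.
    boolean clientOpenWithFixedOrder() {
        getUserName();
        Client client = getMSC();
        return client.open;
    }
}
```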

Apache patch information

  • HIVE-21213
  • HIVE-23444
  • HIVE-23726
  • HIVE-24404
  • HIVE-25190
  • HIVE-25400
  • HIVE-25574
  • HIVE-26208
  • HIVE-26875
  • ORC-616