Review the list of Hive issues that are resolved in Cloudera Runtime
7.3.1, its service packs and cumulative hotfixes.
Cloudera Runtime 7.3.1.400 SP2
- CDPD-81766: Database Setting Consistency in Spark3 HWC
- Spark3's Hive Warehouse Connector (HWC) did not consistently
apply the database setting when validating if a table existed during append mode writes.
This led to inconsistencies where the database setting was not used for validation, even
though data was correctly written to the intended database.
- This issue is resolved. The database setting is now consistently
applied during table validation in Spark3 HWC, preventing the earlier
inconsistencies.
- CDPD-81122: Enhanced Concurrent Access in HWC Secure Mode
- Spark applications running multiple concurrent queries in HWC's
SECURE_ACCESS mode encountered failures and correctness problems.
This happened because the system faced difficulties when generating temporary table
names and managing staging directories simultaneously for multiple reads.
- This issue was addressed by improving the handling of concurrent
operations within HWC's SECURE_ACCESS mode.
- CDPD-81453: Efficient Handling of Timed-Out Transactions in
Replication
- Hive replication did not log transactions that timed out as
'ABORTED'. This caused these transactions to remain on the target cluster for an
extended period.
- This issue was resolved by ensuring that transactions aborted
due to timeout are now properly logged. This allows their abort event to be replicated,
leading to prompt removal from the target environment.
Apache Jira: HIVE-27797
- CDPD-81420: Table Filtering for Ranger Policies
- Ownership details for tables were not correctly carried through
the system during filtering, which prevented Ranger from applying policies based on who
owned the tables.
- This issue was resolved by ensuring that ownership information
is now consistently included when tables are filtered. This allows Ranger to accurately
enforce policies based on table ownership, leading to improved performance when
filtering databases and tables.
- CDPD-77626: Improving performance of ALTER PARTITION operations
using direct SQL
- Running ALTER PARTITION operations using direct SQL failed for some
databases. The failures occurred due to missing data type
conversions for CLOB and Boolean fields, causing the system to fall back to slower ORM
(Object Relational Mapping) paths.
- The issue was addressed by adding proper handling for CLOB and
Boolean type conversions. With this fix, ALTER PARTITION operations now
run successfully using direct SQL.
Apache Jira: HIVE-28271, HIVE-27530
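For illustration, a minimal sketch of the kind of statement affected, assuming a hypothetical partitioned table named sales (the table, partition, and file format are illustrative; whether the metastore takes the direct SQL path is governed by the hive.metastore.try.direct.sql setting):

```sql
-- Hypothetical example: partition-level DDL that exercises the
-- metastore's direct SQL path (names are illustrative).
ALTER TABLE sales PARTITION (dt='2024-01-01') SET FILEFORMAT ORC;
```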
Cloudera Runtime 7.3.1.300 SP1 CHF 1
- CDPD-64950: Deadlock during Spark shutdown due to duplicate
transaction cleanup
- 7.3.1.300
- During Spark application shutdown, transactions were being
closed by two separate mechanisms at the same time. This parallel cleanup could result
in a deadlock, especially when the heartbeat interval was set to a low value.
- The issue was addressed by ensuring that transaction cleanup
occurs through a single mechanism during shutdown, avoiding concurrent execution and
potential deadlocks.
- CDPD-78334: Support custom delimiter in SkippingTextInputFormat
- 7.3.1.300
- Queries like SELECT COUNT(*) returned wrong
results when a custom record delimiter was used. The input file was read as a single
line because the custom delimiter was ignored.
- The issue was addressed by ensuring that the custom record
delimiter is considered while reading the file, so that queries work as
expected.
Apache Jira: HIVE-27498
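As a sketch of the scenario, assuming a hypothetical table backed by text files that use '||' as the record delimiter (the table name and delimiter are illustrative):

```sql
-- Set a custom record delimiter before querying the text-backed table;
-- before the fix, COUNT(*) returned 1 because the delimiter was ignored.
SET textinputformat.record.delimiter='||';
SELECT COUNT(*) FROM delimited_logs;
```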
- CDPD-79237: Hive Metastore schema upgrade fails due to NULL
values
- 7.3.1.300
- Upgrading from CDP Private Cloud Base 7.1.7.2052 to 7.1.9.1010
fails during the Hive Metastore schema upgrade. The upgrade script issues the following
command:
ALTER TABLE "DBS" ALTER COLUMN "TYPE" SET DEFAULT 'NATIVE', ALTER COLUMN "TYPE" SET NOT NULL;
This fails because the DBS.TYPE column contains NULL values. These NULLs are
introduced by canary databases created by Cloudera Manager, which
insert entries in the HMS database without setting the TYPE.
- The issue was addressed by ensuring that canary databases
created by Cloudera Manager correctly populate the TYPE column in the DBS
table, preventing NULL values and allowing the schema upgrade to proceed.
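As a sanity check before upgrading, affected rows can be inspected directly in the metastore backend database. The following is a sketch using PostgreSQL-style quoting; the column selection is illustrative:

```sql
-- List Hive Metastore database entries whose TYPE was left NULL.
SELECT "DB_ID", "NAME" FROM "DBS" WHERE "TYPE" IS NULL;
```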
Cloudera Runtime 7.3.1.200 SP1
- CDPD-78342/CDPD-72605: Optimized partition authorization in
HiveMetaStore to reduce overhead
- 7.3.1.200
- The add_partitions() API in
HiveMetastore was authorizing both new and existing partitions, leading to unnecessary
processing and increased load on the authorization service.
- The issue was addressed by modifying the add_partitions()
API to authorize only new partitions, improving
performance and reducing authorization overhead.
- CDPD-77990: Upgraded MySQL Connector/J to 8.2.0 to fix
CVE-2023-22102
- 7.3.1.200
- The existing MySQL Connector/J version was vulnerable to
CVE-2023-22102.
- The issue was addressed by upgrading mysql-connector-j to
version 8.2.0 in packaging/src/docker/Dockerfile.
- CDPD-62654/CDPD-77985: Hive Metastore now sends a single
AlterPartitionEvent for bulk partition updates
- 7.3.1.200
- HiveMetastore previously sent an individual
AlterPartitionEvent for each altered partition, leading to inefficiencies and pressure
on the backend database.
- The issue was addressed by modifying Hive Metastore to send a
single AlterPartitionEvent containing a list of partitions for bulk updates. Set
hive.metastore.alterPartitions.notification.v2.enabled to true to turn on
this feature.
Apache Jira: HIVE-27746
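To enable the batched notification behavior, the property above can be set in the Hive Metastore configuration; a minimal hive-site.xml sketch (placement of the property is an assumption based on it being a metastore-side setting):

```xml
<!-- Enable single AlterPartitionEvent notifications for bulk partition updates. -->
<property>
  <name>hive.metastore.alterPartitions.notification.v2.enabled</name>
  <value>true</value>
</property>
```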
- CDPD-73669: Secondary pool connection starvation caused by
updatePartitionColumnStatisticsInBatch API
- 7.3.1.200
- Hive queries intermittently failed with Connection is
not available, request timed out errors. The issue occurred because the
updatePartitionColumnStatisticsInBatch method in ObjectStore used
connections from the secondary pool, which had a pool size of only two, leading to
connection starvation.
- The fix ensures that the
updatePartitionColumnStatisticsInBatch API now requests connections
from the primary connection pool, preventing connection starvation in the secondary pool.
Apache Jira: HIVE-28456
- CDPD-61676/CDPD-78341: Drop renamed external table fails due to
missing update in PART_COL_STATS
- 7.3.1.200
- When hive.metastore.try.direct.sql.ddl is set to false, dropping
an external partitioned table after renaming it fails due to a foreign key constraint
error in the PART_COL_STATS table. The table name in
PART_COL_STATS is not updated during the rename, causing issues
during deletion.
- The issue was addressed by ensuring that the PART_COL_STATS
table is updated during the rename operation, making
partition column statistics usable after the rename and allowing the table to be dropped
successfully.
Apache Jira: HIVE-27539
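A sketch of the sequence that previously failed, assuming a hypothetical external partitioned table (names are illustrative) and hive.metastore.try.direct.sql.ddl=false in the metastore configuration:

```sql
-- Rename the external partitioned table, then drop it; before the fix,
-- the drop failed on a PART_COL_STATS foreign key constraint.
ALTER TABLE ext_events RENAME TO ext_events_renamed;
DROP TABLE ext_events_renamed;
```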
- CDPD-79469: Selecting data from a bucketed table with a decimal
column throws NPE
- 7.3.1.200
- When hive.tez.bucket.pruning is enabled,
selecting data from a bucketed table with a decimal column type fails with a
NullPointerException. The issue occurs due to a mismatch in decimal
precision and scale while determining the bucket number, causing an overflow and
returning null.
- The issue was addressed by ensuring that the correct decimal
type information is used from the actual field object inspector instead of the default
type info, preventing the overflow and NullPointerException.
Apache Jira: HIVE-28076
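A minimal sketch of the scenario, assuming a hypothetical bucketed table with a decimal column (the table name, precision, scale, and bucket count are illustrative):

```sql
-- With bucket pruning enabled, an equality filter on the decimal
-- bucketing column previously triggered the NullPointerException.
SET hive.tez.bucket.pruning=true;
CREATE TABLE amounts (amount DECIMAL(10,2)) CLUSTERED BY (amount) INTO 4 BUCKETS;
SELECT * FROM amounts WHERE amount = 100.00;
```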
- CDPD-74095: Connection timeout while inserting Hive partitions
due to secondary connection pool limitation
- 7.3.1.200
- Since HIVE-26419, Hive uses a secondary connection pool (size 2)
for schema and value generation. However, this pool also handles nontransactional
connections, causing the updatePartitionColumnStatisticsInBatch request
to fail with a Connection is not available, request timed out error
when the pool reaches its limit during slow insert or update operations.
- The issue was addressed by ensuring that time-consuming API
requests use the primary connection pool instead of the secondary pool, preventing
connection exhaustion.
Apache Jira: HIVE-28456
- CDPD-78331: HPLSQL built-in functions fail in insert
statement
- 7.3.1.200
- After the HIVE-27492 fix, some HPLSQL built-in functions like
trim and lower stopped working in INSERT statements. This happened because UDFs already
present in Hive were removed to avoid duplication, but HPLSQL's local and offline modes
still required them.
- The issue was addressed by restoring the removed UDFs in HPLSQL
and fixing related function issues to ensure compatibility in all execution
modes.
Apache Jira: HIVE-28143
- CDPD-78343: Syntax error in HPL/SQL error handling
- 7.3.1.200
- In HPL/SQL, setting hplsql.onerror using
the SET command resulted in a syntax error because the grammar file (Hplsql.g4) only
allowed identifiers without dots (.).
- The issue was addressed by updating the grammar to support
qualified identifiers, allowing the SET command to accept dot (.) notation.
Example: EXECUTE 'SET hive.merge.split.update=true';
Apache Jira: HIVE-28253
- CDPD-78330: HPL/SQL built-in functions like sysdate not
working
- 7.3.1.200
- HPL/SQL built-in functions that are not available in Hive, such
as sysdate, were failing with a SemanticException when used in queries. Only functions
present in both HPL/SQL and Hive were working.
- The issue was addressed by modifying the query parsing logic.
Now, HPL/SQL built-in functions are executed directly, and only functions also available
in Hive are forwarded to Hive for execution.
Apache Jira: HIVE-27492
- CDPD-78345: Signalling CONDITION HANDLER is not working in
HPLSQL
- 7.3.1.200
- User-defined CONDITION HANDLERs in HPLSQL
were not triggered as expected. Instead of running the handlers, the system only
logged the conditions, so the handlers were not available when needed.
- The issue was addressed by ensuring that user-defined condition
handlers are properly registered and invoked when a SIGNAL statement raises a
corresponding condition.
Apache Jira: HIVE-28215
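A sketch of a user-defined condition handler raised with SIGNAL, following HPL/SQL's documented syntax (the condition name and message are illustrative):

```sql
-- Declare a condition and a handler for it, then raise it with SIGNAL;
-- before the fix the condition was only logged and the handler not invoked.
DECLARE wrong_count CONDITION;
DECLARE EXIT HANDLER FOR wrong_count
  PRINT 'Unexpected row count';
SIGNAL wrong_count;
```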
- CDPD-78333: EXECUTE IMMEDIATE throwing ClassCastException in
HPL/SQL
- 7.3.1.200
- When executing a select count(*) query, it
returns a long value, but HPLSQL expects a string. This mismatch causes the following
error:
Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String
at org.apache.hive.service.cli.operation.hplsql.HplSqlQueryExecutor$OperationRowResult.get
- The issue was addressed by converting the result to a string
when the expected type is a string.
Apache Jira: HIVE-28215
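A sketch of the failing pattern, assuming a hypothetical table name; the COUNT(*) result comes back from Hive as a long value:

```sql
-- Before the fix, handling the long-valued result where a string was
-- expected raised a ClassCastException in HPL/SQL.
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM test_table';
```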
- CDPD-79844: EXECUTE IMMEDIATE displaying error despite
successful data load
- 7.3.1.200
- Running EXECUTE IMMEDIATE 'LOAD DATA INPATH
''/tmp/test.txt'' OVERWRITE INTO TABLE test_table' displayed an error on the
console, even though the data was successfully loaded into the table. This occurred
because HPL/SQL attempted to check the result set metadata after execution, but LOAD
DATA queries do not return a result set, leading to a NullPointerException.
- The issue was addressed by ensuring that result set metadata is
accessed only when a result set is present.
Apache Jira: HIVE-28766
- CDPD-67033: HWC for Spark 3 compatibility with Spark 3.5
- 7.3.1.200
- Spark 3.5, based on Cloudera on cloud 7.2.18 libraries, caused a failure in the
HWC for Spark 3 build. Canary builds indicated that these changes broke compatibility.
- The issue was addressed by updating HWC for Spark 3 to align
with Spark 3.5 changes and ensuring compatibility with Cloudera on cloud 7.2.18 dependencies.
- CDPD-80097: Datahub recreation fails due to Hive Metastore
schema validation error
- 7.3.1.200
- Datahub recreation on Azure fails because Hive Metastore schema
validation cannot retrieve the schema version due to insufficient permissions on the
VERSION
table.
- This issue is now fixed.
Cloudera Runtime 7.3.1.100 CHF 1
- CDPD-74456: Spark3 hwc.setDatabase() writes to the correct
database
- 7.3.1.100
- When setting the database using hive.setDatabase("DB") and
performing CREATE TABLE or write operations with Hive Warehouse Connector (HWC), the
operations were executed in the default database instead of the specified one.
- This issue is now fixed, and the operations are executed in the
correct database.
- CDPD-74373: Timestamp displays incorrectly in Spark HWC with
JDBC_READER mode
- 7.3.1.100
- When using Spark HWC with JDBC_READER mode, timestamps were
displayed incorrectly. For example, 0001-01-01 00:00:00.0 was interpreted as 0000-12-30
00:00:00.
- This issue is addressed by correcting timestamp handling in
JDBC_READER mode to ensure accurate representation of timestamps before the Gregorian
calendar was adopted.
- CDPD-76932: Incorrect query results due to TableScan merge in
shared work optimizer
- 7.3.1.100
- During shared work optimization, TableScan operators were merged
even when they had different Dynamic Partition Pruning (DPP) parent operators. This
caused the filter from the missing DPP operator to be ignored, leading to incorrect
query results.
- This issue is resolved by modifying the shared work optimizer to
check the parents of TableScan operators and skip merging when DPP edges
differ.
Apache Jira: HIVE-26968
- CDPD-78115: Thread safety issue in
HiveSequenceFileInputFormat
- 7.3.1.100
- Concurrent queries returned incorrect results when query result
caching was disabled due to a thread safety issue in HiveSequenceFileInputFormat.
- This issue is now resolved and the files are now set in a
thread-safe manner to ensure correct query results.
- CDPD-78129: Materialized view rebuild failure due to stale
locks
- 7.3.1.100
- If a materialized view rebuild is aborted, the lock entry in the
materialization_rebuild_locks table is not removed. This prevents subsequent rebuilds of
the same materialized view, causing the following error:
Error: Error while compiling statement: FAILED: SemanticException
org.apache.hadoop.hive.ql.parse.SemanticException: Another process is rebuilding the materialized view view_name (state=42000, code=40000)
- The fix ensures that the materialized view rebuild lock is
removed when a rebuild transaction is aborted. The
MaterializationRebuildLockHeartbeater now checks the transaction
state before heartbeating, allowing outdated locks to be cleaned up properly.
Apache Jira: HIVE-28416
- CDPD-78166: Residual operator tree in shared work optimizer
causes dynamic partition pruning errors
- 7.3.1.100
- Shared work optimizer left unused operator trees that sent
dynamic partition pruning events to non-existent operators. This caused query failures
when processing these events, leading to errors in building the physical operator
tree.
- The issue was addressed by ensuring that any residual unused
operator trees are removed during the operator merge process in shared work optimizer,
preventing invalid dynamic partition pruning event processing.
Apache Jira:
HIVE-28484
- CDPD-78113: Conversion failure from RexLiteral to ExprNode for
empty strings
- 7.3.1.100
- Conversion from RexLiteral to ExprNode failed when the literal
was an empty string, causing the cost-based optimizer to fail for queries.
- The issue was addressed by ensuring that an empty string literal
in a filter produces a valid RexNode, preventing cost-based optimizer
failures.
Apache Jira: HIVE-28431
Cloudera Runtime 7.3.1
- CDPD-57121: ThreadPoolExecutorWithOomHook handling
OutOfMemoryError
- 7.3.1
- The ThreadPoolExecutorWithOomHook wasn't
effectively handling OutOfMemoryError when executing tasks, as the
exception was wrapped in ExecutionException, making it harder to detect.
- The issue was fixed by updating ThreadPoolExecutorWithOomHook to
properly invoke OutOfMemoryError hooks and stop HiveServer2 when required.
Apache Jira: HIVE-24411, HIVE-26955, IMPALA-8518
- CDPD-31172: Hive: Intermittent ConcurrentModificationException
in HiveServer2 during mondrian testset
- Fixed an intermittent ConcurrentModificationException by using
ConcurrentHashMap instead of HashMap, avoiding a race condition caused by concurrent
modification of the PerfLogger endTimes/startTimes maps.