Review the list of Hive issues that are resolved in Cloudera Runtime
7.3.1, its service packs and cumulative hotfixes.
Cloudera Runtime 7.3.1.400 SP2:
- CDPD-81766: Database Setting Consistency in Spark3 HWC
- Spark3's Hive Warehouse Connector (HWC) did not consistently
apply the database setting when validating if a table existed during append mode writes.
This led to inconsistencies where the database setting was not used for validation, even
though data was correctly written to the intended database.
- This issue is resolved. The database setting is now consistently applied during table validation in Spark3 HWC, preventing the earlier inconsistencies.
- CDPD-81122: Enhanced Concurrent Access in HWC Secure Mode
- Spark applications running multiple concurrent queries in HWC's SECURE_ACCESS mode encountered failures and correctness problems. This happened because the system had difficulty generating temporary table names and managing staging directories for multiple simultaneous reads.
- This issue was addressed by improving the handling of concurrent operations within HWC's SECURE_ACCESS mode.
- CDPD-81453: Efficient Handling of Timed-Out Transactions in
Replication
- Hive replication did not log transactions that timed out as
'ABORTED'. This caused these transactions to remain on the target cluster for an
extended period.
- This issue was resolved by ensuring that transactions aborted
due to timeout are now properly logged. This allows their abort event to be replicated,
leading to prompt removal from the target environment.
Apache Jira: HIVE-27797
- CDPD-81420: Table Filtering for Ranger Policies
- Ownership details for tables were not correctly carried through
the system during filtering, which prevented Ranger from applying policies based on who
owned the tables.
- This issue was resolved by ensuring that ownership information
is now consistently included when tables are filtered. This allows Ranger to accurately
enforce policies based on table ownership, leading to improved performance when
filtering databases and tables.
- CDPD-77626: Improving performance of ALTER PARTITION operations
using direct SQL
- Running ALTER PARTITION operations using direct SQL failed for some databases. The failures occurred due to missing data type conversions for CLOB and Boolean fields, causing the system to fall back to slower ORM (Object Relational Mapping) paths.
- The issue was addressed by adding proper handling for CLOB and Boolean type conversions. With this fix, ALTER PARTITION operations now run successfully using direct SQL.
Apache Jira: HIVE-28271, HIVE-27530
Cloudera Runtime 7.3.1.300 SP1 CHF 1
- CDPD-64950: Deadlock during Spark shutdown due to duplicate
transaction cleanup
- 7.3.1.300
- During Spark application shutdown, transactions were being
closed by two separate mechanisms at the same time. This parallel cleanup could result
in a deadlock, especially when the heartbeat interval was set to a low value.
- The issue was addressed by ensuring that transaction cleanup
occurs through a single mechanism during shutdown, avoiding concurrent execution and
potential deadlocks.
- CDPD-78334: Support custom delimiter in SkippingTextInputFormat
- 7.3.1.300
- Queries like SELECT COUNT(*) returned wrong results when a custom record delimiter was used. The input file was read as a single line because the custom delimiter was ignored.
- The issue was addressed by ensuring that the custom record delimiter is considered while reading the file, so that queries work as expected.
Apache Jira: HIVE-27498
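As a minimal illustration (the table name and delimiter are hypothetical; textinputformat.record.delimiter is the standard Hadoop property assumed to carry the custom delimiter):
Example: SET textinputformat.record.delimiter='||';
SELECT COUNT(*) FROM test_table;
With the fix, the count reflects records separated by '||' instead of treating the file as a single record.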
- CDPD-79237: Hive Metastore schema upgrade fails due to NULL
values
- 7.3.1.300
- Upgrading from CDP Private Cloud Base 7.1.7.2052 to 7.1.9.1010
fails during the Hive Metastore schema upgrade. The upgrade script issues the following
command:
ALTER TABLE "DBS" ALTER COLUMN "TYPE" SET DEFAULT 'NATIVE', ALTER COLUMN "TYPE" SET NOT NULL;
This command fails because the DBS.TYPE column contains NULL values. These NULLs are introduced by canary databases created by Cloudera Manager, which insert entries in the HMS database without setting the TYPE.
- The issue was addressed by ensuring that canary databases created by Cloudera Manager correctly populate the TYPE column in the DBS table, preventing NULL values and allowing the schema upgrade to proceed.
Cloudera Runtime 7.3.1.200 SP1
- CDPD-78342/CDPD-72605: Optimized partition authorization in
HiveMetaStore to reduce overhead
- 7.3.1.200
- The add_partitions() API in HiveMetastore was authorizing both new and existing partitions, leading to unnecessary processing and increased load on the authorization service.
- The issue was addressed by modifying the add_partitions() API to authorize only new partitions, improving performance and reducing authorization overhead.
- CDPD-77990: Upgraded MySQL Connector/J to 8.2.0 to fix
CVE-2023-22102
- 7.3.1.200
- The existing MySQL Connector/J version was vulnerable to
CVE-2023-22102.
- The issue was addressed by upgrading mysql-connector-j to
version 8.2.0 in packaging/src/docker/Dockerfile.
- CDPD-62654/CDPD-77985: Hive Metastore now sends a single
AlterPartitionEvent for bulk partition updates
- 7.3.1.200
- HiveMetastore previously sent an individual AlterPartitionEvent for each altered partition, leading to inefficiencies and pressure on the backend database.
- The issue was addressed by modifying Hive Metastore to send a single AlterPartitionEvent containing the list of altered partitions for bulk updates. Set hive.metastore.alterPartitions.notification.v2.enabled to true to turn on this feature.
Apache Jira: HIVE-27746
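As an illustration, the toggle is a plain boolean property; its placement in the Hive Metastore's hive-site.xml is an assumption:
Example: hive.metastore.alterPartitions.notification.v2.enabled=true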
- CDPD-73669: Secondary pool connection starvation caused by
updatePartitionColumnStatisticsInBatch API
- 7.3.1.200
- Hive queries intermittently failed with Connection is not available, request timed out errors. The issue occurred because the updatePartitionColumnStatisticsInBatch method in ObjectStore used connections from the secondary pool, which had a pool size of only two, leading to connection starvation.
- The fix ensures that the
updatePartitionColumnStatisticsInBatch API now requests connections
from the primary connection pool, preventing connection starvation in the secondary pool.
Apache Jira:
HIVE-28456
- CDPD-61676/CDPD-78341: Drop renamed external table fails due to
missing update in PART_COL_STATS
- 7.3.1.200
- When hive.metastore.try.direct.sql.ddl is set to false, dropping an external partitioned table after renaming it fails due to a foreign key constraint error in the PART_COL_STATS table. The table name in PART_COL_STATS is not updated during the rename, causing issues during deletion.
- The issue was addressed by ensuring that the PART_COL_STATS table is updated during the rename operation, making partition column statistics usable after the rename and allowing the table to be dropped successfully.
Apache Jira: HIVE-27539
- CDPD-79469: Selecting data from a bucketed table with a decimal
column throws NPE
- 7.3.1.200
- When hive.tez.bucket.pruning is enabled, selecting data from a bucketed table with a decimal column type fails with a NullPointerException. The issue occurs due to a mismatch in decimal precision and scale while determining the bucket number, causing an overflow and returning null.
- The issue was addressed by ensuring that the correct decimal type information is used from the actual field object inspector instead of the default type info, preventing the overflow and NullPointerException.
Apache Jira: HIVE-28076
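On releases without this fix, a possible session-level workaround is to disable bucket pruning, at the cost of scanning all buckets (inferred from the property named above, not an official recommendation):
Example: SET hive.tez.bucket.pruning=false;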
- CDPD-74095: Connection timeout while inserting Hive partitions
due to secondary connection pool limitation
- 7.3.1.200
- Since HIVE-26419, Hive uses a secondary connection pool (size 2) for schema and value generation. However, this pool also handles nontransactional connections, causing the updatePartitionColumnStatisticsInBatch request to fail with a Connection is not available, request timed out error when the pool reaches its limit during slow insert or update operations.
- The issue was addressed by ensuring that time-consuming API
requests use the primary connection pool instead of the secondary pool, preventing
connection exhaustion.
Apache Jira: HIVE-28456
- CDPD-78331: HPLSQL built-in functions fail in insert
statement
- 7.3.1.200
- After the HIVE-27492 fix, some HPLSQL built-in functions like
trim and lower stopped working in INSERT statements. This happened because UDFs already
present in Hive were removed to avoid duplication, but HPLSQL's local and offline modes
still required them.
- The issue was addressed by restoring the removed UDFs in HPLSQL
and fixing related function issues to ensure compatibility in all execution
modes.
Apache Jira: HIVE-28143
- CDPD-78343: Syntax error in HPL/SQL error handling
- 7.3.1.200
- In HPL/SQL, setting hplsql.onerror using
the SET command resulted in a syntax error because the grammar file (Hplsql.g4) only
allowed identifiers without dots (.).
- The issue was addressed by updating the grammar to support
qualified identifiers, allowing the SET command to accept dot (.) notation.
Example: EXECUTE 'SET hive.merge.split.update=true';
Apache Jira:
HIVE-28253
- CDPD-78330: HPL/SQL built-in functions like sysdate not
working
- 7.3.1.200
- HPL/SQL built-in functions that are not available in Hive, such
as sysdate, were failing with a SemanticException when used in queries. Only functions
present in both HPL/SQL and Hive were working.
- The issue was addressed by modifying the query parsing logic.
Now, HPL/SQL built-in functions are executed directly, and only functions also available
in Hive are forwarded to Hive for execution.
Apache Jira: HIVE-27492
- CDPD-78345: Signalling CONDITION HANDLER is not working in
HPLSQL
- 7.3.1.200
- The user-defined CONDITION HANDLERs in HPLSQL are not triggered as expected. Instead of running the handlers, the system only logs the conditions, so the handlers aren't available when needed.
- The issue was addressed by ensuring that user-defined condition
handlers are properly registered and invoked when a SIGNAL statement raises a
corresponding condition.
Apache Jira: HIVE-28215
- CDPD-78333: EXECUTE IMMEDIATE throwing ClassCastException in
HPL/SQL
- 7.3.1.200
- When executing a select count(*) query, it returns a long value, but HPLSQL expects a string. This mismatch causes the following error:
Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.String
at org.apache.hive.service.cli.operation.hplsql.HplSqlQueryExecutor$OperationRowResult.get
- The issue was addressed by converting the result to a string
when the expected type is a string.
Apache Jira: HIVE-28215
- CDPD-79844: EXECUTE IMMEDIATE displaying error despite
successful data load
- 7.3.1.200
- Running EXECUTE IMMEDIATE 'LOAD DATA INPATH ''/tmp/test.txt'' OVERWRITE INTO TABLE test_table' displayed an error on the console, even though the data was successfully loaded into the table. This occurred because HPL/SQL attempted to check the result set metadata after execution, but LOAD DATA queries do not return a result set, leading to a NullPointerException.
- The issue was addressed by ensuring that result set metadata is
accessed only when a result set is present.
Apache Jira: HIVE-28766
- CDPD-67033: HWC for Spark 3 compatibility with Spark 3.5
- 7.3.1.200
- Spark 3.5, based on Cloudera on cloud 7.2.18 libraries, caused a failure in the HWC for Spark 3 build. Canary builds indicated that the change broke compatibility.
- The issue was addressed by updating HWC for Spark 3 to align with Spark 3.5 changes and ensuring compatibility with Cloudera on cloud 7.2.18 dependencies.
- CDPD-80097: Datahub recreation fails due to Hive Metastore
schema validation error
- 7.3.1.200
- Datahub recreation on Azure fails because Hive Metastore schema
validation cannot retrieve the schema version due to insufficient permissions on the
VERSION
table.
- This issue is now fixed.
Cloudera Runtime 7.3.1.100 CHF 1
- CDPD-74456: Spark3 hwc.setDatabase() writes to the correct
database
- 7.3.1.100
- When setting the database using hive.setDatabase("DB") and performing CREATE TABLE or write operations with Hive Warehouse Connector (HWC), the operations were executed in the default database instead of the specified one.
- This issue is now resolved, and the operations are executed in the correct database.
- CDPD-74373: Timestamp displays incorrectly in Spark HWC with
JDBC_READER mode
- 7.3.1.100
- When using Spark HWC with JDBC_READER mode, timestamps were
displayed incorrectly. For example, 0001-01-01 00:00:00.0 was interpreted as 0000-12-30
00:00:00.
- This issue is addressed by correcting timestamp handling in
JDBC_READER mode to ensure accurate representation of timestamps before the Gregorian
calendar was adopted.
- CDPD-76932: Incorrect query results due to TableScan merge in
shared work optimizer
- 7.3.1.100
- During shared work optimization, TableScan operators were merged
even when they had different Dynamic Partition Pruning (DPP) parent operators. This
caused the filter from the missing DPP operator to be ignored, leading to incorrect
query results.
- This issue is resolved by modifying the shared work optimizer to
check the parents of TableScan operators and skip merging when DPP edges
differ.
Apache Jira: HIVE-26968
- CDPD-78115: Thread safety issue in HiveSequenceFileInputFormat
- 7.3.1.100
- Concurrent queries returned incorrect results when query result
caching was disabled due to a thread safety issue in HiveSequenceFileInputFormat.
- This issue is now resolved and the files are now set in a
thread-safe manner to ensure correct query results.
- CDPD-78129: Materialized view rebuild failure due to stale
locks
- 7.3.1.100
- If a materialized view rebuild is aborted, the lock entry in the materialization_rebuild_locks table is not removed. This prevents subsequent rebuilds of the same materialized view, causing the following error:
Error: Error while compiling statement: FAILED: SemanticException
org.apache.hadoop.hive.ql.parse.SemanticException: Another process is rebuilding the materialized view view_name (state=42000, code=40000)
- The fix ensures that the materialized view rebuild lock is removed when a rebuild transaction is aborted. The MaterializationRebuildLockHeartbeater now checks the transaction state before heartbeating, allowing outdated locks to be cleaned up properly.
Apache Jira: HIVE-28416
- CDPD-78166: Residual operator tree in shared work optimizer
causes dynamic partition pruning errors
- 7.3.1.100
- Shared work optimizer left unused operator trees that sent
dynamic partition pruning events to non-existent operators. This caused query failures
when processing these events, leading to errors in building the physical operator
tree.
- The issue was addressed by ensuring that any residual unused
operator trees are removed during the operator merge process in shared work optimizer,
preventing invalid dynamic partition pruning event processing.
Apache Jira:
HIVE-28484
- CDPD-78113: Conversion failure from RexLiteral to ExprNode for
empty strings
- 7.3.1.100
- Conversion from RexLiteral to ExprNode failed when the literal
was an empty string, causing the cost-based optimizer to fail for queries.
- The issue was addressed by ensuring that an empty string literal
in a filter produces a valid RexNode, preventing cost-based optimizer
failures.
Apache Jira: HIVE-28431
Cloudera Runtime 7.3.1
- CDPD-13406: Disable TopN in ReduceSinkOp when TopNKey is
introduced
- 7.3.1
- When both the ReduceSink and TopNKey operators are used together in a query, they both perform Top-N key filtering. This results in the same filtering logic being applied twice, slowing query execution.
- The Top-N key filtering logic within the ReduceSink operator is now disabled when the TopNKey operator is introduced. The patch ensures that only the TopNKey operator handles the Top-N filtering, while the other functionalities of the ReduceSink operator remain unaffected.
Apache Jira: HIVE-23736
- CDPD-28339: Skip extra work in Cleaner when queue is empty
- 7.3.1
- The Cleaner previously made unnecessary database calls and
logged activities even when there were no candidates for cleaning.
- This was optimized by skipping the extra DB calls and logging
when the cleaning queue is empty, improving performance.
Apache Jira:
HIVE-24754
- CDPD-28174: Compaction task reattempt fails due to
FileAlreadyExistsException
- 7.3.1
- The issue arises when a compaction task is relaunched after the first attempt fails, leaving behind temporary directories. The second attempt encounters a FileAlreadyExistsException because the _tmp directory created during the first attempt was not cleared.
- The solution ensures that compaction reattempts clear the old files from previous attempts before starting, preventing the failure caused by stale directories.
Apache Jira: HIVE-24882, HIVE-23058
- CDPD-45285: Incorrect results for IN UDF on Parquet columns of
CHAR/VARCHAR type
- 7.3.1
- Queries with case statements and multiple conditions return incorrect results for tables in Parquet format, particularly with CHAR/VARCHAR types. The issue is not observed with ORC or TextFile formats and can be bypassed by setting hive.optimize.point.lookup to false.
- The issue was addressed by adding the necessary CASTs during IN clause conversion.
Apache Jira: HIVE-26320
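The workaround mentioned above can be applied per session until the fix is in place:
Example: SET hive.optimize.point.lookup=false;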
- CDPD-24412: Compaction queue entries stuck in 'ready for
cleaning' state
- 7.3.1
- When multiple compaction tasks run simultaneously on the same table, only one task removes obsolete files while others remain in the ready for cleaning state, leading to an accumulation of queue entries.
- A mechanism was added to automatically clear or re-evaluate entries stuck in the ready for cleaning state, improving compaction task efficiency.
Apache Jira: HIVE-25115
- CDPD-27291: getCrossReference fails when retrieving constraints
from the primary key side
- 7.3.1
- When retrieving constraints from the primary key side, the foreign key is passed as null, causing the operation to fail with a Db name cannot be null exception, especially when the metadata cache is enabled by default.
- This has been resolved by ensuring that the foreign key
constraint is correctly handled even when passed as null during constraint retrieval
from the primary key side.
- CDPD-15269: Add caching support for frequently called constraint
APIs in catalogd's HMS interface
- 7.3.1
- The get_unique_constraints, get_primary_keys, get_foreign_keys, and get_not_null_constraints APIs are called frequently during query compilation, particularly with TPCDS queries. Without caching, this leads to performance overhead.
- Introduced caching for the above APIs in the Catalogd’s HMS
interface by adding ValidWriteIdList and tableId to the API requests. This ensures that
the cache or backing DB is appropriately used to serve responses.
Apache Jira:
HIVE-23931
- DWX-8663: ShuffleScheduler should report the original exception
when shuffle becomes unhealthy
- 7.3.1
- The ShuffleScheduler does not report the original exception when the shuffle becomes unhealthy, making it harder to diagnose the underlying issue.
- This issue is now fixed.
Apache Jira:
TEZ-4342
- CDPD-43837: MSSQL upgrade scripts fail when adding TYPE column
to DBS table
- 7.3.1
- Schema upgrade for MSSQL fails with an error when trying to add the TYPE column to the DBS table due to the incorrect usage of the keyword NATIVE in the default value.
- The issue was addressed by modifying the schema upgrade script
to use a valid constant expression for the default value in MSSQL.
Apache Jira:
HIVE-25551
- CDPD-43890: Drop data connector if not exists should not throw
an exception
- 7.3.1
- The DROP DATA CONNECTOR IF NOT EXISTS command incorrectly throws a NoSuchObjectException when the connector does not exist.
- The issue was addressed by ensuring that no exception is thrown if the ifNotExists flag is true during the drop operation.
Apache Jira: HIVE-26299
- CDPD-43838: Filter out results for show connectors in Hive
Metastore client side
- 7.3.1
- The SHOW CONNECTORS command does not filter results based on authorization, such as Ranger policies, on the client side.
- The issue was addressed by implementing client-side filtering in
HMS to ensure that only connectors authorized by policies like Ranger are
displayed.
Apache Jira:
HIVE-26246
- CDPD-43952: HMS get_all_tables method does not retrieve tables
from remote database
- 7.3.1
- The get_all_tables method in the Hive Metastore handler only retrieves tables from the native database, unlike the get_tables method, which can retrieve tables from both native and remote databases.
- The issue was addressed by updating the get_all_tables method to retrieve tables from both native and remote databases, ensuring consistency with the get_tables method.
Apache Jira: HIVE-26171
- CDPD-55914: Select query on table with remote database returns
NULL values with postgreSQL and Redshift data connectors
- 7.3.1
- A few data types were not mapped from Postgres or Redshift to Hive data types in the connector, which resulted in NULL values being displayed for columns of those data types.
- This issue is fixed.
Apache Jira:
HIVE-27316
- CDPD-31726: Prevent NullPointerException by Checking Collations
Return Value
- 7.3.1
- A NullPointerException occurs during execution of an EXPLAIN CBO statement on a subquery when using Tez as the execution engine, leading to empty explain output.
- A check was added for null return values from RelMetadataQuery.collations() to prevent NullPointerExceptions in RelFieldTrimmer and HiveJoin, ensuring stability during query execution.
Apache Jira: HIVE-25749
- CDPD-27418: Incorrect row order after query-based MINOR
compaction
- 7.3.1
- The query-based MINOR compaction used an incorrect sorting order, which led to duplicated rows after multiple merge statements.
- The sorting order was corrected, ensuring proper row
handling.
Apache Jira:
HIVE-25258
- CDPD-27419: Incorrect row order validation for query-based major
compaction
- 7.3.1
- The row order validation for query-based MAJOR compaction incorrectly checked the order as bucketProperty, leading to failures with multiple bucketProperties.
- The validation was updated to correctly check the order as
originalTransactionId, bucketProperty, and rowId, and an improved error message was
implemented.
Error: org.apache.hadoop.hive.ql.metadata.HiveException: Wrong sort order of Acid rows detected for the rows
Apache Jira:
HIVE-25257
- Enable proper handling of non-default schemas in Hive for JDBC
databases
- 7.3.1
- Hive fails to create an external table for a JDBC database when the table is in a non-default schema, causing a PSQLException error stating that the table does not exist.
- Improved handling of tables in non-default schemas by correctly using the hive.sql.schema property. This ensures the table is found, preventing the error.
Apache Jira:
HIVE-25591
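A sketch of an external JDBC table mapped to a non-default schema (connection details, names, and column list are illustrative; the hive.sql.* property names follow the JdbcStorageHandler conventions):
Example:
CREATE EXTERNAL TABLE pg_orders (id INT, amount DOUBLE)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
"hive.sql.database.type" = "POSTGRES",
"hive.sql.jdbc.driver" = "org.postgresql.Driver",
"hive.sql.jdbc.url" = "jdbc:postgresql://dbhost:5432/mydb",
"hive.sql.schema" = "analytics",
"hive.sql.table" = "orders"
);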
- CBO failure when using JDBC table with password through
dbcp.password.uri
- 7.3.1
- When a table is created using JDBCStorageHandler and the JDBC_PASSWORD_URI is specified for the password, the Cost-Based Optimizer (CBO) fails. This causes all the data to be fetched directly from the database and processed in Hive, impacting performance.
- Adjustments were made to ensure CBO functions correctly when JDBC_PASSWORD_URI is used, allowing for proper optimization and preventing unnecessary data fetch from the database.
Apache Jira: HIVE-25626
- CDPD-28904: Intermittent Hive JDBC SSO failures in virtual
environments
- 7.3.1
- Browser-based SSO with the Hive JDBC driver fails in virtual
environments (like Windows VMs). The driver sometimes misses POST requests with the SAML
token due to a race condition, causing authentication failures.
- Resolved a race condition in the JDBC driver to ensure it
properly handles SSO authentication in virtual environments, preventing POST request
failures.
Apache Jira:
HIVE-25479
- CDPD-43672: Remove unnecessary optimizations in
canHandleQbForCbo
- 7.3.1
- The canHandleQbForCbo() method includes an optimization where it returns an empty string if INFO logging is disabled, which complicates the logic and doesn't significantly impact performance.
- The issue was addressed by simplifying the code in canHandleQbForCbo() and removing the unnecessary optimization related to logging.
Apache Jira: HIVE-26438
- DWX-7648: Infinite loop during CBO parsing causes OOM in HiveServer2
- 7.3.1
- HiveServer became unstable due to an infinite loop during query parsing with UNION operations, causing an out-of-memory error during the cost-based and logical optimization phase. The issue occurred because Hive's custom metadata provider was not initialized.
- The initialization has now been moved before CBO requires
it.
Apache Jira:
HIVE-25220
- CDPD-31200: Reader not closed after check in AcidUtils, leading
to resource exhaustion
- 7.3.1
- The Reader in AcidUtils.isRawFormatFile is not closed after the check is finished. This causes issues when resources on the DFSClient are limited, leading to connection pool timeouts such as Timeout waiting for connection from pool.
- The fix includes automatically closing the Reader in AcidUtils.isRawFormatFile, which ensures that resources are freed up and prevents connection pool timeout issues.
Apache Jira: HIVE-25683
- DWX-10336: SSL certificate import error in HiveServer2 with JWT
authentication
- 7.3.1
- With JWT support enabled for HiveServer2, SSL certificate import fails because self-signed certificates are not accepted by the JVM in some environments. The error occurs during the initialization of the HTTP server.
- The fix introduces a property to disable SSL certificate verification for downloading the JWKS (JSON Web Key Set) in such environments, helping users bypass certificate validation.
Apache Jira:
HIVE-26425
- CDPD-42686: Query-based compaction fails for tables with complex
data types and reserved keywords
- 7.3.1
- Query-based compaction fails on tables with complex data types
and columns containing reserved keywords due to incorrect quoting of column names when
creating a temporary table.
- The issue was addressed by ensuring that columns with reserved
keywords are correctly quoted during the creation of temporary tables.
Apache
Jira:
HIVE-26374
- CDPD-54605: HiveSchemaTool to honor metastore-site.xml for
initializing metastore schema
- 7.3.1
- The HiveSchemaTool fails to recognize the metastore-site.xml configuration when initializing the metastore schema. It defaults to using an embedded database instead of the specified MySQL database.
- The issue was addressed by updating the HiveSchemaTool to ensure it properly reads the metastore-site.xml file, allowing for correct initialization of the metastore schema with the intended database configuration.
Apache Jira: HIVE-26402
- CDPD-55135: JDBC data connector queries to avoid exceptions at
CBO stage
- 7.3.1
- JDBC data connector queries throw exceptions at the CBO stage due to incorrect handling of database, schema, and table names. When querying, the database name is improperly swapped with the schema name, leading to the error: schema dev does not exist
- The issue was addressed by changing the hive.sql.table property value from databasename.tablename to tablename and adding a hive.sql.schema property that carries the database name. This adjustment ensures that the CBO stage retrieves JDBC table information correctly, eliminating the errors related to schema and table name resolution.
Apache Jira: HIVE-26192
- DWX-8296: Hive query vertex failure due to Kerberos
authentication error
- 7.3.1
- Hive queries fail during LLAP execution with vertex failures.
The query-executor fails to communicate with the query-coordinator due to an
authentication error in Kerberos.
- The Kerberos authentication failure between query-executors and the query-coordinator was addressed, ensuring proper task execution and preventing vertex failures during LLAP execution.
- CDPD-45607: Atomic schema upgrades for HMS to prevent partial
commits
- 7.3.1
- SchemaTool may leave the metastore in an invalid state during
schema upgrades because each change is autocommitted. If an upgrade fails mid-process,
the schema is left partially updated, causing issues on reruns.
- The issue was addressed by ensuring schema changes are committed
only after the entire upgrade process completes successfully. If any step fails, no
changes are applied, preventing partial updates and keeping the schema
intact.
Apache Jira:
HIVE-25707
- CDPD-49232: Auto-reconnect data connectors after timeout
- 7.3.1
- When data connectors remain idle for long, the JDBC connection
times out. This requires a restart to re-establish the connection, rendering the
connector unusable until then.
- The issue was addressed by automatically checking if a
connection is closed and re-establishing it when necessary. This ensures the connectors
stay functional without needing a restart, and includes setting connection timeout and
retry properties for more reliable reconnections.
Apache Jira:
HIVE-26045
- CDPD-49494: Allow JWT and LDAP authentication to co-exist in
HiveServer2 configuration
- 7.3.1
- Setting hive.server2.authentication=JWT,LDAP fails with a validation error, preventing HiveServer2 from starting due to conflicts between authentication types.
- The issue was addressed by updating the validation logic to
support JWT authentication alongside LDAP, ensuring HiveServer2 can start with both auth
mechanisms enabled.
Apache Jira:
HIVE-26045
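For example, both mechanisms can now be enabled together in hive-site.xml (fragment is illustrative):
Example:
<property>
<name>hive.server2.authentication</name>
<value>JWT,LDAP</value>
</property>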
- CDPD-40732: Timestamps when reading parquet files with
Vectorized reader
- 7.3.1
- Timestamp shifts occur when reading Parquet files that were
created in older Hive versions and vectorized execution is enabled. The vectorized
reader is not able to exploit the metadata inside the Parquet file to apply the correct
conversion. For instance, a timestamp written as 1880-01-01 00:00:00 may be read as
1879-12-31 23:52:58; the exact shift depends on the JVM timezone. The non-vectorized
reader is not affected.
- The fix ensures both vectorized and non-vectorized readers use
the same logic to determine the correct timestamp conversion based on metadata and
configuration.
Apache Jira:
HIVE-26270