Learn about the known issues and technical limitations of Hive in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.
Known issues identified in Cloudera Runtime 7.3.2
- CDPD-73781: High operational costs from automatic partition discovery on cloud storage
- 7.3.1 and its SPs and CHFs, 7.2.18 and its SPs
- In cloud environments, the Partition Management Task (PMT) could significantly increase operational costs and performance overhead for large tables with many partitions.
- To address this issue, perform one or both of the following:
- Increase the value of the metastore.partition.management.task.frequency property to 86400 seconds (24 hours).
- Disable partition discovery manually for specific tables incurring high costs by using the discover.partitions table property.
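The second option is a per-table HiveQL statement that you can run from Beeline or Hue; the table name below is hypothetical, and the frequency property in the first option is a Hive Metastore configuration setting rather than a table property:

```sql
-- Opt a specific high-cost table out of automatic partition discovery
-- ('sales_data' is an illustrative table name):
ALTER TABLE sales_data SET TBLPROPERTIES ('discover.partitions' = 'false');
```

With discover.partitions set to false, you must add or drop partitions for that table manually (for example, with ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE on demand).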
- CDPD-93090: Failure when inserting data into partitions located on a different filesystem than the table
- 7.3.2
- Inserting data into a table partition results in a
MoveTask execution error if the partition is stored in a different filesystem than the parent table. For example, if you create an external table in S3 but add a partition with an HDFS location, the INSERT operation fails with a Wrong FS error. Although the data is physically inserted into the correct location, the query returns a non-zero exit code and displays an error message.
- None
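A minimal sketch of the failing pattern described above, using hypothetical table, bucket, and path names:

```sql
-- External table whose data lives in S3 (names are illustrative):
CREATE EXTERNAL TABLE sales (id INT) PARTITIONED BY (dt STRING)
LOCATION 's3a://example-bucket/warehouse/sales';

-- Partition added on a different filesystem (HDFS):
ALTER TABLE sales ADD PARTITION (dt='2024-01-01')
LOCATION 'hdfs://namenode:8020/data/sales/dt=2024-01-01';

-- The data lands in the partition location, but the query still
-- fails at MoveTask with a "Wrong FS" error and a non-zero exit code:
INSERT INTO sales PARTITION (dt='2024-01-01') VALUES (1);
```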
- CDPD-92560: Partition insertion errors across multiple buckets in RAZ enabled clusters
- 7.3.2
- Inserting data into a table partition results in a
MoveTask execution error when the partition is located in a different S3 bucket than the base table. In Ranger Authorization Service (RAZ) enabled Private Cloud clusters, the INSERT operation fails with a Wrong FS error, even though the data is physically written to the specified partition location.
- None
- CDPD-74192: Hive SSL certificate error during large record copies
- 7.3.1 and its SPs and CHFs, 7.2.18.300,
7.2.18.400
- When you copy a large number of records (approximately 1 million or more) using Hive, the operation might fail with a
javax.net.ssl.SSLException: org.bouncycastle.tls.TlsFatalAlert: certificate_unknown(46)
error. This issue occurs during the job commit phase when Hive attempts to execute HTTP requests to S3 for file operations such as copyFile.
- None
- CDPD-68096: Unable to set S3 credentials at the session level in Hive
- 7.3.2
- Accessing S3 buckets from Hive on premises results in a MetaException or
AccessDeniedException when you attempt to create external tables by
using session-level credentials. While Spark and other Hadoop jobs allow you to define
fs.s3a.access.key and fs.s3a.secret.key at
the session level, Hive Metastore (HMS) does not honor these session-specific parameters
during path validation.
- To access S3 buckets, you must configure the S3 credentials at the global level in the Hive or Hadoop cluster configuration (such as core-site.xml) or use a supported credential provider like the Hadoop Credential Shell to store keys securely. Setting these parameters at the session level is currently not supported for Hive table creation.
HIVE-16913
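The credential-provider part of the workaround can be sketched with the Hadoop Credential Shell; the JCEKS provider path below is a hypothetical example:

```shell
# Store the S3 access and secret keys in a JCEKS credential store;
# each command prompts for the corresponding value
# (the provider path is illustrative):
hadoop credential create fs.s3a.access.key \
  -provider jceks://hdfs/user/hive/s3.jceks
hadoop credential create fs.s3a.secret.key \
  -provider jceks://hdfs/user/hive/s3.jceks
```

Reference the store through the hadoop.security.credential.provider.path property in core-site.xml so that Hive Metastore can resolve the keys during path validation.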
Known issues identified before Cloudera Runtime 7.3.2
- DWX-22436: DL upgrade recovery fails due to Metastore schema
incompatibility
- 7.3.1.600
- When attempting a Data Lake (DL) upgrade recovery from
version 7.2.18.1100 to Cloudera Runtime 7.3.1.500, the process fails
because the Hive Metastore schema versions are incompatible. The error indicates a
mismatch between the Hive version (3.1.3000.7.3.1.500-182) and the database schema
version (3.1.3000.7.2.18.0-Update2). This blocks Data Lake recovery when an upgrade fails.
- Before you initiate the recovery process, manually update
the Hive Metastore schema to match the target version by using the
schematool utility.
- Obtain the Hive database password: Run the following command to retrieve the
password from the pillar configuration:
cat /srv/pillar/postgresql/postgre.sls
- Back up the existing configuration: Move the current configuration directory to a
backup location:
mv /etc/hive/conf /etc/hive/conf_backup
mkdir /etc/hive/conf
- Prepare the temporary configuration: Copy the process files to the new
configuration directory:
scp /var/run/cloudera-scm-agent/process/<process-id>-hive-metastore-create-tables/* /etc/hive/conf/
- Update the connection password: Open the
/etc/hive/conf/hive-site.xml file and perform the following
modifications:
- Set the javax.jdo.option.ConnectionPassword property to
your Hive database password.
- Comment out the hadoop.security.credential.provider.path
property.
- Run the schema upgrade tool: Execute the schematool to
synchronize the version:
/opt/cloudera/parcels/CDH/lib/hive/bin/schematool -dbType postgres -initOrUpgradeSchema --verbose
- Restore the original configuration: Remove the temporary directory and restore
your backup:
rm -rf /etc/hive/conf
mv /etc/hive/conf_backup /etc/hive/conf
- Restart the cluster: Restart the services to initialize the Hive Metastore with
the updated schema.
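The hive-site.xml modifications in the "Update the connection password" step above might look like the following fragment; the password value is a placeholder for the one retrieved from the pillar configuration:

```xml
<!-- Set the metastore database password retrieved from postgre.sls -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>YOUR_HIVE_DB_PASSWORD</value>
</property>

<!-- Commented out so schematool does not try to resolve the credential store
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>...</value>
</property>
-->
```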
- CDPD-77738: Atlas hook authorization issue causing
HiveCreateSysDb timeout
- 7.1.9 SP1 CHF4, 7.1.7 SP3 CHF7, 7.3.1.100, and higher versions
- An Atlas hook authorization error causes the HiveCreateSysDb command to time out due to repeated retries.
- None
- CDPD-74680: DAG not retried after failure
- 7.3.1 and its higher versions
- When you run a Hive query and the ApplicationMaster container fails, Hive does not retry the DAG if the failure message contains diagnostic information that includes a line break. The query fails instead of being retried.
- None