Known Issues in Apache Hive

This topic describes the known issues and technical limitations for Hive in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.

Known issues identified in Cloudera Runtime 7.3.2

CDPD-73781: High operational costs from automatic partition discovery on cloud storage
7.3.1 and its SPs and CHFs, 7.2.18 and its SPs
In cloud environments, the Partition Management Task (PMT) could significantly increase operational costs and performance overhead for large tables with many partitions.
To address this issue, you can perform the following:
  • Increase the value of the metastore.partition.management.task.frequency property to 86400 seconds (24 hours) so that the task runs less frequently.
  • Disable partition discovery manually for specific tables incurring high costs by using the discover.partitions table property.
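The table-level workaround can be applied with a DDL statement similar to the following sketch; the table name is a placeholder, and note that the metastore.partition.management.task.frequency property is a Hive Metastore service configuration (for example, in hive-site.xml), not a session-level setting:

```sql
-- Disable automatic partition discovery for a specific high-cost table
-- ("web_logs" is an illustrative table name).
ALTER TABLE web_logs SET TBLPROPERTIES ('discover.partitions'='false');
```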
CDPD-93090: Failure when inserting data into partitions located on a different filesystem than the table
7.3.2
Inserting data into a table partition results in a MoveTask execution error if the partition is stored in a different filesystem than the parent table. For example, if you create an external table in S3 but add a partition with an HDFS location, the INSERT operation fails with a Wrong FS error. Although the data is physically inserted into the correct location, the query returns a non-zero exit code and displays an error message.
None
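A minimal sketch of the failing scenario, assuming illustrative bucket, host, and table names:

```sql
-- External table stored on S3 (locations are placeholders).
CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING)
LOCATION 's3a://example-bucket/warehouse/sales';

-- Partition added on a different filesystem (HDFS) than the table.
ALTER TABLE sales ADD PARTITION (dt='2024-01-01')
LOCATION 'hdfs://namenode:8020/data/sales/dt=2024-01-01';

-- The INSERT writes the data to the partition location but fails
-- with a "Wrong FS" MoveTask error and a non-zero exit code.
INSERT INTO sales PARTITION (dt='2024-01-01') VALUES (1, 9.99);
```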
CDPD-92560: Partition insertion errors across multiple buckets in RAZ-enabled clusters
7.3.2
Inserting data into a table partition results in a MoveTask execution error when the partition is located in a different S3 bucket than the base table. In Ranger Authorization Service (RAZ) enabled Private Cloud clusters, the INSERT operation fails with a Wrong FS error, even though the data is physically written to the specified partition location.
None
CDPD-74192: Hive SSL certificate error during large record copies
7.3.1 and its SPs and CHFs, 7.2.18.300, 7.2.18.400
When you copy a large number of records (approximately 1 million or more) using Hive, the operation might fail with a
javax.net.ssl.SSLException: org.bouncycastle.tls.TlsFatalAlert: certificate_unknown(46)
error. This issue occurs during the job commit phase when Hive attempts to execute HTTP requests to S3 for file operations such as copyFile.
None
CDPD-68096: Unable to set S3 credentials at the session level in Hive
7.3.2
Accessing S3 buckets from Hive in on-premises deployments results in a MetaException or AccessDeniedException when you attempt to create external tables by using session-level credentials. While Spark and other Hadoop jobs allow you to define fs.s3a.access.key and fs.s3a.secret.key at the session level, Hive Metastore (HMS) does not honor these session-specific parameters during path validation.
To access S3 buckets, you must configure the S3 credentials at the global level in the Hive or Hadoop cluster configuration (such as core-site.xml) or use a supported credential provider like the Hadoop Credential Shell to store keys securely. Setting these parameters at the session level is currently not supported for Hive table creation.
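As a sketch of the credential-provider approach, assuming an illustrative JCEKS path, the keys can be stored with the Hadoop Credential Shell and referenced globally:

```shell
# Store the S3 keys in a JCEKS credential store; each command prompts
# for the secret value (the provider path is a placeholder).
hadoop credential create fs.s3a.access.key \
    -provider jceks://hdfs/user/hive/s3.jceks
hadoop credential create fs.s3a.secret.key \
    -provider jceks://hdfs/user/hive/s3.jceks

# Reference the store in core-site.xml so that HMS can resolve the keys:
#   <property>
#     <name>hadoop.security.credential.provider.path</name>
#     <value>jceks://hdfs/user/hive/s3.jceks</value>
#   </property>
```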

Apache Jira: HIVE-16913

Known issues identified before Cloudera Runtime 7.3.2

DWX-22436: DL upgrade recovery fails due to Metastore schema incompatibility
7.3.1.600
When attempting a Data Lake (DL) upgrade recovery from version 7.2.18.1100 to Cloudera Runtime 7.3.1.500, the process fails because the Hive Metastore schema versions are incompatible. The error indicates a mismatch between the Hive version (3.1.3000.7.3.1.500-182) and the database schema version (3.1.3000.7.2.18.0-Update2). This blocks Data Lake recovery if an upgrade fails, impacting customers.
Before you initiate the recovery process, manually update the Hive Metastore schema to match the target version by using the schematool utility.
  1. Obtain the Hive database password: Run the following command to retrieve the password from the pillar configuration:
    cat /srv/pillar/postgresql/postgre.sls
  2. Back up the existing configuration: Move the current configuration directory to a backup location:
    mv /etc/hive/conf /etc/hive/conf_backup
    mkdir /etc/hive/conf
  3. Prepare the temporary configuration: Copy the process files to the new configuration directory:
    scp /var/run/cloudera-scm-agent/process/<process-id>-hive-metastore-create-tables/* /etc/hive/conf/
  4. Update the connection password: Open the /etc/hive/conf/hive-site.xml file and perform the following modifications:
    • Set the javax.jdo.option.ConnectionPassword property to your Hive database password.
    • Comment out the hadoop.security.credential.provider.path property.
  5. Run the schema upgrade tool: Execute the schematool to synchronize the version:
    /opt/cloudera/parcels/CDH/lib/hive/bin/schematool -dbType postgres -initOrUpgradeSchema --verbose
  6. Restore the original configuration: Remove the temporary directory and restore your backup:
    rm -rf /etc/hive/conf
    mv /etc/hive/conf_backup /etc/hive/conf
  7. Restart the cluster: Restart the services to initialize the Hive Metastore with the updated schema.
CDPD-77738: Atlas hook authorization issue causing HiveCreateSysDb timeout
7.1.9 SP1 CHF4, 7.1.7 SP3 CHF7, 7.3.1.100, and its higher versions
An Atlas hook authorization error causes the HiveCreateSysDb command to time out because of repeated retries.
None
CDPD-74680: DAG not retried after failure
7.3.1 and its higher versions
When you execute a Hive query and the ApplicationMaster container fails, Hive does not retry the DAG if the failure diagnostics contain a line break, so the query fails instead of being retried.
None