Known Issues in Apache Hive
Learn about the known issues in Hive, the impact or changes to the functionality, and the workaround.
- OPSAPS-58664: Hive on Tez LDAP configurations are not pushed to hive-site.xml by Cloudera Manager
- After setting up LDAP properties in the Hive on Tez service, the settings are not pushed into hive-site.xml for Hive on Tez service even after a restart. The issue is due to HiveOnTezServiceHandler re-using definitions from HiveConfigFileDefinitions. The definitions are not including any roletypes other than HiveServiceHandler's roletypes.
- Hive query calling unix_timestamp method fails on CDP 7.1.7 SP1 | ERROR: ...IllegalArgumentException: Cannot create timestamp, parsing error
- While executing a SELECT command which uses the unix_timestamp method on certain columns fails with the below exception:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalArgumentException: Cannot create timestamp, parsing error 0700-02-29 23:57:02
The issue is caused by converting the timestamp in a different way during reading than it was converted when writing the data into the table.- Hive always stores values in UTC. To achieve this there is usually a conversion from session time zone to UTC when writing data, and from UTC to the same timezone when reading the data.
- The conversion that is used when reading and writing the data should be the same
otherwise problems like the one mentioned here can occur. Currently, there are two
ways to do the conversions:
- the legacy one which uses the old Date/Time classes,
- the modern one which uses the newer JDK classes introduced in Java 8
- CDPD-26975: Using the ABFS / S3A connectors in an Oozie workflow where the operations are "secured" may trigger an IllegalArgumentException with the error message java.net.URISyntaxException: Relative path in absolute URI.
- Set the following XML configuration in the Datahub cluster's Cloudera Manager:
- In the Cloudera Manager Admin Console, go to the Oozie service.
- Click the Configuration tab.
- In the Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml
set the following:
- Set the following if you are using Amazon S3: <property> <name>oozie.service.HadoopAccessorService.fs.s3a</name> <value>fs.s3a.buffer.dir=/tmp/s3a</value> </property>
- Set the following if you are using ABFS: <property> <name>oozie.service.HadoopAccessorService.fs.abfs</name> <value>fs.azure.buffer.dir=/tmp/abfs</value> </property> <property> <name>oozie.service.HadoopAccessorService.fs.abfss</name> <value>fs.azure.buffer.dir=/tmp/abfss</value> </property>
- Enter a Reason for change, and then click Save Change to commit the changes.
- Restart the Oozie service.
- CDPD-29297: HWC + Oozie issue: Cannot create PoolableConnectionFactory. Currently, only Spark cluster mode is supported in the Oozie Spark Action with Hive Warehouse Connector (HWC).
- Use Spark action in cluster mode. <spark xmlns="uri:oozie:spark-action:1.0">...<mode>cluster</mode>...</spark>
- CDPD-29302: The Atlas lineage information is missing in case of HWC JDBC write.
- None
- OPSAPS-59928: INSERT INTO from SELECT using hive (hbase) table returns an error under certain conditions.
- Users who upgraded to a Kerberized CDP cluster from HDP and enabled AutoTLS have reported this problem. For more information, see Cloudera Community article: ERROR: "FAILED: Execution Error, return code 2" when the user is unable to issue INSERT INTO from SELECT using hive (hbase) table.
- CDPD-21365: Performing a drop catalog operation drops the catalog from the CTLGS table. The DBS table has a foreign key reference on CTLGS for CTLG_NAME. Because of this, the DBS table is locked and creates a deadlock.
- You must create an index in the DBS table on CTLG_NAME: CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME);.
- CDPD-26229: Hive TPCDS test query is timing out after 3 hours.
- None.
- CDPD-26556 After an upgrade, querying a CTAS table under certain conditions might throw an exception
- If you upgrade your Hive cluster from CDH 6 to CDP 7, create a
CTAS table in the CDP cluster from a table you upgraded from CDH, you might see the
following exception when you query the new table:
class org.apache.hadoop.io.IntWritable cannot be cast to class org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector$StandardUnion
This issue involves CDH-based tables having columns of complex types ARRAY, MAP, and STRUCT.
- OPSAPS-54299 Installing Hive on Tez and HMS in the incorrect order causes HiveServer failure
- You need to install Hive on Tez and HMS in the correct order; otherwise, HiveServer fails. You need to install additional HiveServer roles to Hive on Tez, not the Hive service; otherwise, HiveServer fails.
- CDPD-23506: OutOfMemoryError in LLAP
- Long running spark-shell applications can leave sessions in interactive Hiveserver2 until the Spark application finishes (user exists from spark-shell), causing memory pressure in case of a high number of queries in the same shell (1000+).
- CDPD-23041: DROP TABLE on a table having an index does not work
- If you migrate a Hive table to CDP having an index, DROP TABLE
does not drop the table. Hive no longer supports indexes (HIVE-18448). A foreign key constraint on the indexed table prevents dropping the
table. Attempting to drop such a table results in the following error:
java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."IDXS", CONSTRAINT "IDXS_FK1" FOREIGN KEY ("ORIG_TBL_ID") REFERENCES "TBLS ("TBL_ID"))
- CDPD-20636 and DWX-6163: SHOW TABLES command does not produce a list of tables that are owned by the current user
- When you run the SHOW TABLES command against a Hive Virtual Warehouse, tables are only returned if you have explicit read or read/write access to the table, or if you belong to a group that has read or read/write access. If you only have access to the tables because you are the owner of the objects, you can query the table content, but the table names do not appear in the SHOW TABLES command output.
- CDPD-17766: Queries fail when using spark.sql.hive.hiveserver2.jdbc.url.principal in the JDBC URL to invoke Hive.
- Do not specify
spark.sql.hive.hiveserver2.jdbc.url.principal
in the JDBC URL to invoke Hive remotely.
- HIVE-24271: Problem creating an ACID table in legacy table mode
- In site-level, legacy CREATE TABLE mode, the CREATE MANAGED TABLE command might not work as expected to override the legacy behavior and create a managed ACID table. The command works only at the session level.
- CDPD-13636: Hive job fails with OutOfMemory exception in the Azure DE cluster
- Set the parameter hive.optimize.sort.dynamic.partition.threshold=0. Add this parameter in Cloudera Manager (Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml)
- CDPD-16802: Autotranslate assertion failure.
- The exception is not triggered when it is executed from Spark-Shell. This is from Hive in the getJdoFilterPushdownParam parameter of ExpressionTree.java, which checks the partition column as only String and not any other type.
- HiveServer Web UI displays incorrect data
- If you enabled auto-TLS for TLS encryption, the HiveServer2 Web UI does not display the correct data in the following tables: Active Sessions, Open Queries, Last Max n Closed Queries
- CDPD-11890: Hive on Tez cannot run certain queries on tables stored in encryption zones
- This problem occurs when the Hadoop Key Management Server (KMS) connection is SSL-encrypted and a self signed certificate is used. SSLHandshakeException might appear in Hive logs.
Technical Service Bulletins
- TSB 2022-567: Potential Data Loss due to CTLT HBaseStorageHandler failure dropping underlying HBase table while rollback
- If the
create table target_table like source table
command (CTLT) fails and the source table is HBaseStorageHandler-based table, the HBaseMetaHook rollback logic deletes the underlying HBase table, resulting in potential data loss. - Upstream JIRA
- HIVE-25989
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-567: Potential Data Loss due to CTLT HBaseStorageHandler failure dropping underlying HBase table while rollback
- TSB 2022-600: Renaming translated external partition table shows empty records in Apache Hive
- If an Apache Hive partitioned table is renamed, it can cause data loss due to the
location being incorrectly translated at the Hive Metastore (HMS) translation layer in
the legacy config mode.
Scenario:
- The following configurations are set:
hive.create.as.external.legacy=true
hive.created.as.acid=true
- The following processes are executed:
- Creation of new partition table
- Data is loaded on new table
- Table is renamed
- Scan/view after rename of the same table returns empty records
Example:
- The following kind of query is
affected:
CREATE TABLE foo (i1 int) PARTITIONED BY (i2 string); INSERT INTO foo VALUES (1,’foo’); ALTER TABLE foo RENAME TO foo_renamed; SELECT * FROM foo_renamed; //returns empty records
- The following kind of query is not
affected:
CREATE EXTERNAL foo (i1 int) PARTITIONED BY (i2 string); INSERT INTO foo VALUES (1,’foo’); ALTER TABLE foo RENAME TO foo_renamed; SELECT * FROM foo_renamed; //returns 1 record
- The following configurations are set:
- Upstream JIRA
- HIVE-26158
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-600: Renaming translated external partition table shows empty records in Apache Hive
- TSB 2023-627: IN/OR predicate on binary column returns wrong result
- An IN or an OR predicate involving a binary datatype column may
produce wrong results. The OR predicate is converted to an IN due to the setting
hive.optimize.point.lookup
which is true by default. Only binary data types are affected by this issue. See https://issues.apache.org/jira/browse/HIVE-26235 for example queries which may be affected. - Upstream JIRA
- HIVE-26235
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-627: IN/OR predicate on binary column returns wrong result