Known Issues in Apache Hive

Learn about the known issues in Hive, the impact or changes to the functionality, and the workaround.

CDPD-43107: SemanticException for INSERT INTO statement when Hive Cost-based Optimizer (CBO) is disabled
If you run an INSERT INTO query while hive.cbo.enable is set to "false", the query fails with a SemanticException. For example:
set hive.cbo.enable=false;

CREATE TABLE mytable (
id INT,
str STRING
);

INSERT INTO mytable (id, str) VALUES (1, 'a');
Output:
org.apache.hadoop.hive.ql.parse.SemanticException: 0:0 Expected 2 columns for insclause-0/default@mytable; select produces 1 columns. Error encountered near token ''a''

This issue occurs only when columns are specified for the target table in the query. INSERT INTO mytable values (1, 'a'); does not result in an exception.

Enable CBO and rerun the query.
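For example, re-enable CBO for the session and rerun the INSERT statement from the example above:
set hive.cbo.enable=true;

INSERT INTO mytable (id, str) VALUES (1, 'a');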
CDPD-43957: HiveServer2 shuts down during replication due to high resource usage
During Hive replication (both bootstrap and incremental), you may notice that the HiveServer2 (HS2) shuts down periodically with the following error:
java.sql.SQLException: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 configs from ZooKeeper
	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:265)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at com.cloudera.enterprise.hive3qt.Hive3QueryTool$HiveOperation.execute(Hive3QueryTool.java:682)
	at com.cloudera.enterprise.hive3qt.Hive3QueryTool.main(Hive3QueryTool.java:935)
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 configs from ZooKeeper
	at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.configureConnParams(ZooKeeperHiveClientHelper.java:177)
	at org.apache.hive.jdbc.Utils.configureConnParamsFromZooKeeper(Utils.java:580)
	at org.apache.hive.jdbc.Utils.parseURL(Utils.java:391)
	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:263)
	... 5 more
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Tried all existing HiveServer2 uris from ZooKeeper.
	at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.getServerHosts(ZooKeeperHiveClientHelper.java:132)
	at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.configureConnParams(ZooKeeperHiveClientHelper.java:172)
	... 8 more

This issue occurs when you replicate at scale on an unbalanced cluster where all the roles run on the Cloudera Manager host. The host becomes a bottleneck, HS2 crashes, and further replication is prevented. The logs (/var/log/messages) show a large number of Java processes running on the host, which exhaust the host's memory and trigger the Out Of Memory Killer to stop HS2.

You can restart HS2 and reinitiate replication; however, the replication process may take longer to complete.

  • Follow these recommendations when setting up your source and target clusters:
    • Ensure that no HS2, Hive metastore (HMS), DataNode, or NameNode roles are running on the Cloudera Manager host.
    • Ensure that you have multiple instances of HS2 and HMS roles on different nodes.
    • It is recommended that you have the HS2 and HMS roles on different nodes than the DataNodes or NameNodes.

    These practices increase the likelihood that Hive replication at scale completes successfully.

  • Limit the scale of Hive replication by temporarily disabling replication policies.
CDPD-40730: Parquet change can cause incompatibility
Parquet files written by the parquet-mr library in CDP 7.1.8, where the schema contains a timestamp with no UTC conversion, are not compatible with older versions of Parquet readers. Older readers still assume these timestamps require UTC conversion and therefore produce wrong results. You can encounter this problem only when you write Parquet-based tables using Hive and the tables have the non-default configuration hive.parquet.write.int64.timestamp=true.
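The following session setting, shown only to illustrate the trigger (it is not a workaround), is what produces these files:
set hive.parquet.write.int64.timestamp=true;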
None.
Hive query calling unix_timestamp method fails on CDP 7.1.7 SP1 | ERROR: ...IllegalArgumentException: Cannot create timestamp, parsing error
A SELECT query that uses the unix_timestamp method on certain columns fails with the following exception:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalArgumentException: 
            Cannot create timestamp, parsing error 0700-02-29 23:57:02
The issue is caused by converting the timestamp in a different way during reading than it was converted when writing the data into the table.
  • Hive always stores values in UTC. To achieve this, there is usually a conversion from the session time zone to UTC when writing data, and from UTC back to the session time zone when reading the data.
  • The conversion used when reading the data should be the same as the one used when writing it; otherwise, problems like the one described here can occur. Currently, there are two ways to do the conversions:
    • the legacy one, which uses the old Date/Time classes
    • the modern one, which uses the newer JDK classes introduced in Java 8
Because HIVE-25104 is included with SP1, you can resolve this problem and read the data the same way it was written by setting the following property and rerunning the failing query: set hive.parquet.timestamp.legacy.conversion.enabled=false;. For more information, see HIVE-26270, which is a work in progress to allow autodetection of the conversion method and alleviate this issue.
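A minimal sketch of applying the workaround in a Hive session (the table and column names are hypothetical):
set hive.parquet.timestamp.legacy.conversion.enabled=false;

SELECT unix_timestamp(event_time) FROM events LIMIT 10;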
CDPD-26975: Using the ABFS / S3A connectors in an Oozie workflow where the operations are "secured" may trigger an IllegalArgumentException with the error message java.net.URISyntaxException: Relative path in absolute URI.
Set the following XML configuration in the Datahub cluster's Cloudera Manager:
  1. In the Cloudera Manager Admin Console, go to the Oozie service.
  2. Click the Configuration tab.
  3. In the Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml set the following:
    • Set the following if you are using Amazon S3:
      <property>
        <name>oozie.service.HadoopAccessorService.fs.s3a</name>
        <value>fs.s3a.buffer.dir=/tmp/s3a</value>
      </property>
    • Set the following if you are using ABFS:
      <property>
        <name>oozie.service.HadoopAccessorService.fs.abfs</name>
        <value>fs.azure.buffer.dir=/tmp/abfs</value>
      </property>
      <property>
        <name>oozie.service.HadoopAccessorService.fs.abfss</name>
        <value>fs.azure.buffer.dir=/tmp/abfss</value>
      </property>
  4. Enter a Reason for change, and then click Save Changes to commit the changes.
  5. Restart the Oozie service.
CDPD-29297: HWC + Oozie issue: Cannot create PoolableConnectionFactory. Currently, only Spark cluster mode is supported in the Oozie Spark Action with Hive Warehouse Connector (HWC).
Use Spark action in cluster mode:
<spark xmlns="uri:oozie:spark-action:1.0">
  ...
  <mode>cluster</mode>
  ...
</spark>
CDPD-29302: The Atlas lineage information is missing in case of HWC JDBC write.
None
OPSAPS-59928: INSERT INTO from SELECT using hive (hbase) table returns an error under certain conditions.
Users who upgraded to a Kerberized CDP cluster from HDP and enabled AutoTLS have reported this problem. For more information, see Cloudera Community article: ERROR: "FAILED: Execution Error, return code 2" when the user is unable to issue INSERT INTO from SELECT using hive (hbase) table.
In Cloudera Manager > TEZ > Configurations, find the tez.cluster.additional.classpath.prefix Safety Valve, and set the value to /etc/hbase/conf.
CDPD-21365: Performing a drop catalog operation drops the catalog from the CTLGS table. The DBS table has a foreign key reference on CTLGS for CTLG_NAME. Because of this, the DBS table is locked and creates a deadlock.
You must create an index in the DBS table on CTLG_NAME: CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME);.
CDPD-26229: Hive TPCDS test query is timing out after 3 hours.
None.
CDPD-26556: After an upgrade, querying a CTAS table under certain conditions might throw an exception
If you upgrade your Hive cluster from CDH 6 to CDP 7 and then create a CTAS table in the CDP cluster from a table you upgraded from CDH, you might see the following exception when you query the new table:
class org.apache.hadoop.io.IntWritable cannot be cast to class org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector$StandardUnion        

This issue involves CDH-based tables having columns of complex types ARRAY, MAP, and STRUCT.
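An illustrative sketch of the scenario (the table names are hypothetical; the second statement is where the cast exception can surface):
CREATE TABLE new_table AS SELECT * FROM upgraded_cdh_table;

SELECT * FROM new_table;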

OPSAPS-54299: Installing Hive on Tez and HMS in the incorrect order causes HiveServer failure
Install Hive on Tez and HMS in the correct order; otherwise, HiveServer fails. In addition, add any additional HiveServer roles to the Hive on Tez service, not to the Hive service; otherwise, HiveServer fails.
Follow instructions on Installing Hive on Tez.
CDPD-23506: OutOfMemoryError in LLAP
Long-running spark-shell applications can leave sessions open in interactive HiveServer2 until the Spark application finishes (the user exits spark-shell), causing memory pressure when a high number of queries (1000+) run in the same shell.
You must close spark-shell so that its sessions are closed.
CDPD-23041: DROP TABLE on a table having an index does not work
If you migrate a Hive table that has an index to CDP, DROP TABLE does not drop the table. Hive no longer supports indexes (HIVE-18448). A foreign key constraint on the indexed table prevents dropping the table. Attempting to drop such a table results in the following error:
java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails ("hive"."IDXS", CONSTRAINT "IDXS_FK1" FOREIGN KEY ("ORIG_TBL_ID") REFERENCES "TBLS ("TBL_ID"))
There are two workarounds:
  • Drop the foreign key "IDXS_FK1" on the "IDXS" table within the metastore (see the sketch after this list). You can also manually drop indexes, but do not cascade any drops because the IDXS table includes references to "TBLS".
  • Launch an older version of Hive, such as Hive 2.3 that includes IDXS in the DDL, and then drop the indexes as described in Language Manual Indexing.
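A minimal sketch of the first workaround, assuming a MySQL-backed metastore (verify the constraint and table names in your metastore schema before running):
ALTER TABLE IDXS DROP FOREIGN KEY IDXS_FK1;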
Apache Issue: HIVE-24815
CDPD-20636 and DWX-6163: SHOW TABLES command does not produce a list of tables that are owned by the current user
When you run the SHOW TABLES command against a Hive Virtual Warehouse, tables are only returned if you have explicit read or read/write access to the table, or if you belong to a group that has read or read/write access. If you only have access to the tables because you are the owner of the objects, you can query the table content, but the table names do not appear in the SHOW TABLES command output.
Add the owner of the database or the tables as a user with read or read/write access to the tables directly.
CDPD-17766: Queries fail when using spark.sql.hive.hiveserver2.jdbc.url.principal in the JDBC URL to invoke Hive.
Do not specify spark.sql.hive.hiveserver2.jdbc.url.principal in the JDBC URL to invoke Hive remotely.
Workaround: specify principal=hive.server2.authentication.kerberos.principal as shown in the following syntax:
jdbc:hive2://<host>:<port>/<dbName>;principal=hive.server2.authentication.kerberos.principal;<otherSessionConfs>?<hiveConfs>#<hiveVars>
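For example, an illustrative URL in which the host, port, database, and Kerberos principal are hypothetical placeholders:
jdbc:hive2://hs2host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM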
HIVE-24271: Problem creating an ACID table in legacy table mode
When legacy CREATE TABLE behavior is configured at the site level, the CREATE MANAGED TABLE command might not override the legacy behavior and create a managed ACID table as expected. The command works only at the session level.
Workaround: Include table properties in a CREATE TABLE that specify a transactional table. For example:
CREATE TABLE T2(a int, b int) 
 STORED AS ORC
 TBLPROPERTIES ('transactional'='true');       
CDPD-13636: Hive job fails with OutOfMemory exception in the Azure DE cluster
Set the parameter hive.optimize.sort.dynamic.partition.threshold=0. Add this parameter in Cloudera Manager using the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
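To verify the effect before changing the service configuration, the same parameter can also be set for the current Hive session (a sketch, not a substitute for the Cloudera Manager change):
set hive.optimize.sort.dynamic.partition.threshold=0;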
CDPD-16802: Autotranslate assertion failure.
The exception is not triggered when the query is executed from spark-shell. The failure originates in Hive, in getJdoFilterPushdownParam in ExpressionTree.java, which handles partition columns only as String and not as any other type.
This can be disabled by setting hive.metastore.integral.jdo.pushdown to true.
HiveServer Web UI displays incorrect data
If you enabled auto-TLS for TLS encryption, the HiveServer2 Web UI does not display the correct data in the following tables: Active Sessions, Open Queries, Last Max n Closed Queries
CDPD-11890: Hive on Tez cannot run certain queries on tables stored in encryption zones
This problem occurs when the Hadoop Key Management Server (KMS) connection is SSL-encrypted and a self-signed certificate is used. An SSLHandshakeException might appear in the Hive logs.
Use one of the workarounds:
  • Install the self-signed SSL certificate into the cacerts file on all hosts.
  • Copy ssl-client.xml to a directory that is available on all hosts. In Cloudera Manager, go to Clusters > Hive on Tez > Configuration. In Hive Service Advanced Configuration Snippet for hive-site.xml, click +, and add the name tez.aux.uris and the value path-to-ssl-client.xml.

Technical Service Bulletins

TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive
JOIN queries return wrong results when performing joins on large keys (larger than 255 bytes). This happens when the fast hash table join algorithm is enabled, which it is by default.
Impact
Incorrect results
Action required
  • Hotfix request
    Request a hotfix from Cloudera Support.
  • Workaround
    Set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled to false (see the example after this list). This might cause performance degradation depending on the type of query and the system it is running on.
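    A minimal session-level example of applying the workaround:
    set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=false;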
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2021-501: JOIN queries return wrong result for join keys with large size in Hive