Known issues in Hive Virtual Warehouses

Known issues identified in 1.5.2

DWX-16989: Hive query running on Iceberg table fails randomly

This issue can occur if you have disabled the auto-suspend option for a Hive Virtual Warehouse, or if the Virtual Warehouse is under continuous load and therefore cannot be stopped by the auto-suspend option. In this situation, queries on tables that use the Iceberg table format may fail, and the following exceptions may appear in the query coordinator log:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hive: HDFS_DELEGATION_TOKEN owner=hive/dwx-env-host-1.cdp.local@EXAMPLE.CLOUDERA.COM, renewer=hive, realUser=, issueDate=1709813340891, maxDate=1710418140891, sequenceNumber=19784486, masterKeyId=52) is expired, current time: 2024-03-08 04:09:32,835-0800 expected renewal time: 2024-03-08 04:09:00,891-0800

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hive: HDFS_DELEGATION_TOKEN owner=hive/dwx-env-host-1.cdp.local@EXAMPLE.CLOUDERA.COM, renewer=hive, realUser=, issueDate=1699855596578, maxDate=1700460396578, sequenceNumber=16863242, masterKeyId=39) can't be found in cache

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (OzoneToken owner=hive/dwx-env-ewxf6g-env.cdp.local@ROOT.EXAMPLE.SITE, renewer=hive, realUser=, issueDate=2024-03-19T21:49:31.033Z, maxDate=2024-03-19T21:50:31.033Z, sequenceNumber=72, masterKeyId=1, strToSign=null, signature=null, awsAccessKeyId=null, omServiceId=ozone1710521984, omCertSerialId=11) is expired, current time: 2024-03-19 21:51:34,293+0000 expected renewal time: 2024-03-19 21:51:31,033+0000

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (OzoneToken owner=hive/dwx-env-azt1gg-env.cdp.local@ROOT.EXAMPLE.SITE, renewer=hive, realUser=, issueDate=2024-04-09T16:04:12.889Z, maxDate=2024-04-09T17:04:12.889Z, sequenceNumber=29, masterKeyId=1, strToSign=null, signature=null, awsAccessKeyId=null, omServiceId=ozone1711550158, omCertSerialId=2597525731772327) can't be found in cache

This happens because the HDFS delegation tokens are not renewed when using the Iceberg table format. After the existing HDFS delegation tokens expire, the Hive query coordinator (Tez App Master) cannot access the tables on the file system during the query planning phase. The problem occurs regardless of the file system, whether Ozone FS or Hadoop FS, and only after the delegation tokens have expired. By default, the delegation tokens expire in one day. However, you can modify the expiration time on the CDP Base cluster.

The problem does not occur if the query coordinator pods in the Hive Virtual Warehouse are stopped manually or by using the auto-suspend functionality within the token expiration period.
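
To check whether a failed query matches this issue, it can help to read the timestamps out of the exception text. The following is a minimal sketch, not a supported tool. It assumes the HDFS_DELEGATION_TOKEN message format shown above (issueDate and maxDate in epoch milliseconds) and the default one-day renewal interval; the function names are hypothetical. OzoneToken messages print ISO timestamps instead, so `token_times` returns None for them.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the epoch-millisecond fields printed for HDFS delegation tokens.
_TOKEN_FIELDS = re.compile(r"issueDate=(\d+), maxDate=(\d+)")

# Assumption: the renewal interval is the one-day default mentioned above.
RENEW_INTERVAL = timedelta(days=1)


def _from_millis(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc) + timedelta(milliseconds=ms % 1000)


def is_token_failure(line: str) -> bool:
    """Return True if the log line looks like one of the InvalidToken errors above."""
    return "SecretManager$InvalidToken" in line and (
        "is expired" in line or "can't be found in cache" in line
    )


def token_times(line: str):
    """Return (issued, renew_by, max_date) as UTC datetimes, or None if the
    line carries no epoch-millisecond issueDate/maxDate pair."""
    match = _TOKEN_FIELDS.search(line)
    if match is None:
        return None
    issued = _from_millis(int(match.group(1)))
    max_date = _from_millis(int(match.group(2)))
    return issued, issued + RENEW_INTERVAL, max_date
```

For the first exception above (issueDate=1709813340891), `renew_by` is 2024-03-08 12:09:00.891 UTC, which is the 04:09:00,891-0800 "expected renewal time" printed in the log. A query that fails with this exception after that instant is consistent with the missing token renewal described here.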

Apply this workaround only if you cannot suspend the Hive Virtual Warehouse.

Log in to the Data Warehouse service as DWAdmin.
Go to the Virtual Warehouses tab and click > Edit > Configurations > Query Coordinator.
Select env from the Configuration files drop-down menu.
Add the following value to the JVM_OPTS property:
```
-Diceberg.scan.plan-in-worker-pool=false
```
Click Apply Changes.
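
If JVM_OPTS already holds other options, the new flag should be added alongside them rather than replacing them. The sketch below illustrates that merge on a plain string, assuming the options are separated by whitespace and contain no quoted spaces; the helper name and the `-Xmx4g` value in the usage note are hypothetical and not taken from a real configuration.

```python
ICEBERG_FLAG = "-Diceberg.scan.plan-in-worker-pool=false"


def with_jvm_flag(jvm_opts: str, flag: str = ICEBERG_FLAG) -> str:
    """Return jvm_opts with flag appended exactly once.

    Any earlier setting of the same system property is dropped so that
    the new value is the only one present.
    """
    key = flag.split("=", 1)[0]
    kept = [opt for opt in jvm_opts.split() if opt.split("=", 1)[0] != key]
    return " ".join(kept + [flag])
```

For example, `with_jvm_flag("-Xmx4g")` returns `-Xmx4g -Diceberg.scan.plan-in-worker-pool=false`, and applying the function a second time leaves the string unchanged.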

Known issues identified in 1.5.1

DWX-16891: Hive-Ranger integration issue after a refresh

You may see the following error after you refresh a Hive Virtual Warehouse: Error while compiling statement: FAILED: HiveAccessControlException Permission denied. This happens because Hive fails to evaluate Ranger group policies when the Virtual Warehouse is updated, either by upgrading or refreshing it.

Workaround: Rebuild the Hive Virtual Warehouse to fix the Ranger integration issues.

DWX-15480: Hive queries fail with FILE_NOT_FOUND error

The ACID directory cache in Tez AMs may become outdated for ACID tables that change often. This can lead to different errors with the same root cause: split generation works from a cache that points to non-existing files. You may see the following error in the diagnostic bundles and query logs: FILE_NOT_FOUND: Unable to get file status.

Workaround: Disable the cache by setting the value of the hive.txn.acid.dir.cache.duration property to -1. From the CDW web interface, go to Virtual Warehouse > > Edit > CONFIGURATIONS > Hue > Configuration files > hive-site.

DWX-15287: Drop database query for Hive fails with Invalid ACL Exception

You may see the following error in a Hue or Beeline session when running DROP DATABASE, DROP TABLE, or ALTER TABLE DROP PARTITION operations on a Hive Virtual Warehouse that is in the Stopped state: "org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive".

The exception occurs because the Hive Virtual Warehouse tries to evict the cache in the LLAP executors, but the compute pods of the stopped warehouse are no longer running.

Note: The database or table is deleted despite the exception. Only the LLAP executors do not flush their database-related or table-related buffers, because these executors are not running.

Workaround: Start the Virtual Warehouse before you run the DROP DATABASE, DROP TABLE, or ALTER TABLE DROP PARTITION operations.

Alternatively, you can add the hive.llap.io.proactive.eviction.enabled=false setting to the hive-site.xml file. This method may result in some performance degradation, because LLAP no longer discards the buffers related to dropped databases, tables, or temp tables.

Log in to CDW as DWAdmin.

Click > Edit > CONFIGURATIONS > Hiveserver2 on the Virtual Warehouse tile and select hive-site from the Configuration files drop-down menu.

Click and add the following line:
hive.llap.io.proactive.eviction.enabled=false

Click Apply Changes.
Wait for the Virtual Warehouse to refresh and return to Running or Stopped state.
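
The steps above enter the setting as a key=value line. For reference, the same setting written as a property element in the standard Hadoop configuration XML layout can be produced with the following sketch; the helper name is hypothetical, and how CDW stores the value internally is not stated here.

```python
import xml.etree.ElementTree as ET


def hive_site_property(name: str, value: str) -> str:
    """Render one <property> element in the Hadoop configuration XML layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Render the setting from the steps above.
print(hive_site_property("hive.llap.io.proactive.eviction.enabled", "false"))
```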

Known issues identified before 1.4.1

DWX-4842: Entities are not being created in Atlas

Base clusters that use Java 11 might use truststores in PKCS12 format. Currently, Hive Virtual Warehouses on CDW Private Cloud support only truststores in JKS format. This prevents the entities from being created in Atlas.

Workaround: Using keytool, convert the PKCS12 truststore in the base cluster to a JKS truststore.
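
One way to perform the conversion is the `keytool -importkeystore` command. The sketch below only assembles the argument list and, optionally, runs it. The file paths and passwords are placeholders that you must replace with the values for your base cluster, and the function names are hypothetical.

```python
import subprocess


def keytool_convert_command(src_p12: str, dest_jks: str, src_pass: str, dest_pass: str) -> list:
    """Build the keytool arguments that copy a PKCS12 truststore into a JKS one."""
    return [
        "keytool", "-importkeystore",
        "-srckeystore", src_p12, "-srcstoretype", "PKCS12", "-srcstorepass", src_pass,
        "-destkeystore", dest_jks, "-deststoretype", "JKS", "-deststorepass", dest_pass,
    ]


def convert_truststore(src_p12: str, dest_jks: str, src_pass: str, dest_pass: str) -> None:
    """Run keytool; raises CalledProcessError if the conversion fails."""
    subprocess.run(keytool_convert_command(src_p12, dest_jks, src_pass, dest_pass), check=True)
```

Passwords passed as command-line arguments can be visible to other users on the host through the process list, so consider omitting the password options and letting keytool prompt for them when you run the command interactively.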