Known Issues in Apache Hive

Learn about the known issues in Hive, the impact or changes to the functionality, and the workaround.

CDPD-28809: java.lang.OutOfMemoryError: Java heap space while uploading a file to ABFS
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:96)
    at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.writeCurrentBufferToService(AbfsOutputStream.java:414)
    at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.writeCurrentBufferToService(AbfsOutputStream.java:394)
    at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.write(AbfsOutputStream.java:210)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:62)
How upload works in ABFS: the maximum number of upload requests that can be queued is 2 * 4 * num_available_processors. Every request holds a byte buffer of size `fs.azure.write.request.size`, which defaults to 8 MB. So if the number of available processors is 8, the total memory used is 2 * 4 * 8 * 8 MB = 512 MB, which can cause the process to run out of memory.
  1. Configure fs.azure.write.request.size to a lower value, for example, 1 MB (see the example configuration after this list).
  2. Reduce the maximum number of requests that can be queued by configuring the parameter fs.azure.write.max.requests.to.queue.
  3. Allocate more memory to the process by setting a higher -Xmx value.
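For reference, the following is a minimal sketch of options 1 and 2 as core-site.xml properties, of the kind you would add through a Cloudera Manager advanced configuration snippet. The 1 MB request size and the queue limit of 4 are illustrative values only, not recommended defaults; tune them for your workload:
    <property>
      <name>fs.azure.write.request.size</name>
      <!-- buffer size per queued write request, in bytes; 1048576 bytes = 1 MB -->
      <value>1048576</value>
    </property>
    <property>
      <name>fs.azure.write.max.requests.to.queue</name>
      <!-- illustrative cap on queued write requests; the default is derived from the number of available processors -->
      <value>4</value>
    </property>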
CDPD-28809 for GCS: java.lang.OutOfMemoryError: Java heap space while uploading a file to Google Cloud Storage
    at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:609)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
      Local Variable: com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.GenericUrl#49
      Local Variable: com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpResponse#24
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
      Local Variable: com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.GenericUrl#48
      Local Variable: com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader#24
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:551)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:475)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:592)
      Local Variable: com.google.cloud.hadoop.repackaged.gcs.com.google.api.services.storage.Storage$Objects$Insert#24
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:318)
How upload works in GCS: every GCS output stream creates a byte buffer chunk of 64 MB, which can be configured through fs.gs.outputstream.upload.chunk.size. If a process creates a large number of streams, they consume a lot of memory and can cause an OOM issue. For example, with 31 threads, the total memory used is 31 * 64 MB = 1984 MB.
  1. Configure fs.gs.outputstream.upload.chunk.size to a lower value, preferably 1 MB - 4 MB (see the example configuration after this list).
  2. Allocate more memory to the process by setting a higher -Xmx value.
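For reference, the following is a minimal sketch of option 1 as a core-site.xml property; the 4 MB value is illustrative, and the value must be given in bytes:
    <property>
      <name>fs.gs.outputstream.upload.chunk.size</name>
      <!-- per-stream upload buffer, in bytes; 4194304 bytes = 4 MB (illustrative) -->
      <value>4194304</value>
    </property>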
CDPD-15518: ACID tables you write using the Hive Warehouse Connector cannot be read from an Impala virtual warehouse.
Read the tables from a Hive virtual warehouse or by using Impala queries in Data Hub.
CDPD-13636: Hive job fails with OutOfMemory exception in the Azure DE cluster
Set the parameter hive.optimize.sort.dynamic.partition.threshold=0. Add this parameter in Cloudera Manager (Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml), as shown in the following example.
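The following is a sketch of how the property would appear in the hive-site.xml safety valve (in the snippet editor, add it as a name/value pair):
    <property>
      <name>hive.optimize.sort.dynamic.partition.threshold</name>
      <!-- value recommended by this workaround -->
      <value>0</value>
    </property>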
ENGESC-2214: HiveServer2 and HMS service logs are not deleted
Update the Hive log4j configurations in the following locations:
  • Hive -> Configuration -> HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)
  • Hive Metastore -> Configuration -> Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)
Add the following to the configurations:
  appender.DRFA.strategy.action.type=DELETE
  appender.DRFA.strategy.action.basepath=${log.dir}
  appender.DRFA.strategy.action.maxdepth=1
  appender.DRFA.strategy.action.PathConditions.glob=${log.file}.*
  appender.DRFA.strategy.action.PathConditions.type=IfFileName
  appender.DRFA.strategy.action.PathConditions.nestedConditions.type=IfAccumulatedFileCount
  appender.DRFA.strategy.action.PathConditions.nestedConditions.exceeds=<same value as appender.DRFA.strategy.max>
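For example, assuming your existing configuration sets appender.DRFA.strategy.max to 30 (an illustrative value, not a recommendation), the last property would read:
  appender.DRFA.strategy.action.PathConditions.nestedConditions.exceeds=30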
HiveServer Web UI displays incorrect data
If you enabled auto-TLS for TLS encryption, the HiveServer2 Web UI does not display the correct data in the following tables: Active Sessions, Open Queries, Last Max n Closed Queries
CDPD-11890: Hive on Tez cannot run certain queries on tables stored in encryption zones
This problem occurs when the Hadoop Key Management Server (KMS) connection is SSL-encrypted and a self-signed certificate is used. SSLHandshakeException might appear in Hive logs.
Use one of the workarounds:
  • Install the self-signed SSL certificate into the cacerts file on all hosts.
  • Copy ssl-client.xml to a directory that is available on all hosts. In Cloudera Manager, go to Clusters > Hive on Tez > Configuration. In Hive Service Advanced Configuration Snippet for hive-site.xml, click +, and add the name tez.aux.uris and the value path-to-ssl-client.xml (see the example snippet after this list).
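For reference, a minimal sketch of the equivalent hive-site.xml entry; the path /opt/ssl-conf/ssl-client.xml is hypothetical and stands for the location where you copied ssl-client.xml:
    <property>
      <name>tez.aux.uris</name>
      <!-- hypothetical path; replace with the location where you copied ssl-client.xml -->
      <value>/opt/ssl-conf/ssl-client.xml</value>
    </property>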