Troubleshooting Hive on Spark
- Problem: Delayed result from the first query after starting a new Hive on Spark session
- The first query after starting a new Hive on Spark session might be delayed due to the start-up time for the Spark on YARN cluster. The query waits for YARN containers to initialize. Subsequent queries will be faster.
- Problem: Exception Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0) and HiveServer2 is down
- The HiveServer2 Java heap is set too small; details appear in the HiveServer2 STDOUT log. To fix this issue:
- In Cloudera Manager, go to HIVE.
- Click Configuration.
- Search for Java Heap Size of HiveServer2 in Bytes and increase the value. Cloudera recommends a minimum of 256 MB.
- Restart HiveServer2.
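Outside Cloudera Manager, you can confirm which heap size a running HiveServer2 actually picked up by inspecting its JVM command line. The sketch below is an assumption about how the flag appears; the `sample` variable stands in for real `ps -o args` output for the HiveServer2 process.

```shell
# Extract the -Xmx (max heap) value from a HiveServer2 process command line.
# In practice, replace `sample` with: ps -o args -p <HiveServer2 PID>
sample='java -Xms268435456 -Xmx268435456 org.apache.hive.service.server.HiveServer2'
heap=$(printf '%s\n' "$sample" | grep -o '\-Xmx[0-9a-zA-Z]*' | sed 's/-Xmx//')
echo "HiveServer2 max heap: $heap"
```

If the reported value is below the recommended minimum, raise it in Cloudera Manager as described above rather than editing the process flags directly.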
- Problem: Out-of-memory error
- You might get an out-of-memory error similar to the following:
15/03/19 03:43:17 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x9e79a9b1, /10.20.118.103:45603 => /10.20.120.116:39110] EXCEPTION: java.lang.OutOfMemoryError: Java heap space) java.lang.OutOfMemoryError: Java heap space
This error indicates that the Spark driver is running out of memory. Increase the driver's off-heap allocation by setting spark.yarn.driver.memoryOverhead, or increase its heap by setting spark.driver.memory.
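One way to raise these limits for the current session is with set commands in Beeline; the values below are illustrative, not recommendations, and a sketch assuming your cluster allows session-level overrides of these properties.

```sql
-- Illustrative values; tune to your workload and container sizes.
SET spark.yarn.driver.memoryOverhead=512;  -- off-heap overhead for the driver, in MB
SET spark.driver.memory=4g;                -- driver JVM heap
```

Session-level settings apply only to queries in that session; to change the defaults cluster-wide, configure the same properties through Cloudera Manager.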
- Problem: Hive on Spark does not work with HBase
- Hive on Spark with HBase is not supported. If you use HBase, use Hive on MapReduce instead of Hive on Spark.
- Problem: Spark applications stay alive forever and occupy cluster resources
- This can occur if there are multiple concurrent Hive sessions. To manually terminate the Spark applications:
- Find the YARN application IDs for the applications in Cloudera Manager.
- Log in to the YARN ResourceManager host.
- Open a terminal and run:
yarn application -kill <applicationID>
where <applicationID> is each YARN application ID you found in step 1.
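The steps above can be scripted by parsing `yarn application -list` output. In the sketch below, `list_output` stands in for the real command's report (the tab-separated column layout is an assumption), and the kill command is left commented out so the parsing can be checked before anything is terminated.

```shell
# Sample stands in for: yarn application -list -appStates RUNNING
# Assumed columns: app-ID, name, type, user, queue, state (tab-separated).
list_output="$(printf 'application_1426887202645_0010\tHive on Spark\tSPARK\thive\troot.hive\tRUNNING\napplication_1426887202645_0011\tHive on Spark\tSPARK\thive\troot.hive\tRUNNING')"

# Select the application IDs of RUNNING applications (field 1).
app_ids=$(printf '%s\n' "$list_output" | awk -F'\t' '$6 == "RUNNING" {print $1}')

for app in $app_ids; do
  echo "would kill: $app"
  # yarn application -kill "$app"   # uncomment to actually terminate
done
```

Filtering on the state column keeps the loop from touching applications that have already finished; add a filter on the name or user column if other Spark jobs share the cluster.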