Hive on Spark Changes in CDH 6.0

The following new features have been added to Hive on Spark in CDH 6.0:

Dynamic RDD Caching for Hive on Spark

An optimization has been added to Hive on Spark that enables automatic caching of reused RDDs (Resilient Distributed Datasets). This optimization can improve query performance when a query or sub-query must scan the same table multiple times, as TPC-DS query 39 does. The optimization is disabled by default in CDH 6.0, but you can enable it by setting the hive.combine.equivalent.work.optimization property to true in the hive-site.xml file.

To configure this property in Cloudera Manager:

  1. In the Admin Console, select the Hive service.
  2. Click the Configuration tab.
  3. Search for the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
  4. Enter the following property configuration information:

    • Name: hive.combine.equivalent.work.optimization
    • Value: true
    • Description: Enables dynamic RDD caching for HoS

    To disable this configuration, set the Value field to false.

    To set this configuration property in the XML editor, enter the following code:

    <property>
         <name>hive.combine.equivalent.work.optimization</name>
         <value>true</value>
         <description>Enables dynamic RDD caching for HoS</description>
    </property>
                   
  5. Click Save Changes, and restart the service.

For more information, see HIVE-10844 and HIVE-10550.
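After saving the change and restarting the service, it can be useful to confirm which value a given hive-site.xml actually carries. The following is a minimal sketch in Python that reads a property from hive-site.xml content using the standard library. The helper name get_hive_property and the inline sample content are hypothetical and used only for illustration; in practice you would read the file that Cloudera Manager deploys, and its location depends on your installation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; a real check would read the deployed
# hive-site.xml from disk instead of using an inline string.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.combine.equivalent.work.optimization</name>
    <value>true</value>
    <description>Enables dynamic RDD caching for HoS</description>
  </property>
</configuration>"""


def get_hive_property(xml_text, name, default=None):
    """Return the value of the named property, or default if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value", default)
    return default


# The property is disabled by default in CDH 6.0, so "false" is used
# as the fallback when the property is not present in the file.
rdd_caching = get_hive_property(
    SAMPLE_HIVE_SITE, "hive.combine.equivalent.work.optimization", default="false"
)
print(rdd_caching)
```

This check only reports what is written in the file. It does not confirm that a running HiveServer2 instance has picked up the value, which requires the restart described in step 5.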

Optimized Hash Tables Enabled for Hive on Spark

Support has been added for optimized hash tables for Hive on Spark to reduce memory overhead. This feature is enabled by default in CDH 6.0, but can be disabled by setting the hive.mapjoin.optimized.hashtable property to false in the hive-site.xml file.

To configure this property in Cloudera Manager:

  1. In the Admin Console, select the Hive service.
  2. Click the Configuration tab.
  3. Search for the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
  4. Enter the following property configuration information:

    • Name: hive.mapjoin.optimized.hashtable
    • Value: false
    • Description: Disables optimized hash tables for HoS

    To enable this configuration, set the Value field to true.

    To set this configuration property in the XML editor, enter the following code:

    <property>
         <name>hive.mapjoin.optimized.hashtable</name>
         <value>false</value>
         <description>Disables optimized hash tables for HoS</description>
    </property>
                   
  5. Click Save Changes, and restart the service.

For more details, see HIVE-11182 and HIVE-6430.
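If you maintain safety valve entries in scripts rather than typing them by hand, the property block can be generated so that the name, value, and description stay consistent with each other. The following is a minimal sketch in Python that builds the same XML shown in the steps above. The helper name render_property is hypothetical and is not part of Hive or Cloudera Manager.

```python
import xml.etree.ElementTree as ET


def render_property(name, value, description):
    """Build one <property> element and return it as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    ET.SubElement(prop, "description").text = description
    return ET.tostring(prop, encoding="unicode")


# Same values as the safety valve entry for optimized hash tables.
snippet = render_property(
    "hive.mapjoin.optimized.hashtable",
    "false",
    "Disables optimized hash tables for HoS",
)
print(snippet)
```

The output is a single-line property element without indentation. It is intended to be pasted into the XML editor view of the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml.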