Automating mode selection

Auto Translate selects an execution mode transparently, based on your query. Before you use it, you need to meet a few prerequisites. You then configure Auto Translate and submit an application in a single step.

You configure the spark.sql.extensions property to enable Auto Translate. When Auto Translate is enabled, Spark implicitly selects either HWC or native Apache Spark to run your query. Spark selects HWC when you query an Apache Hive managed (ACID) table and falls back to native Spark for reading external tables. You can use the same Spark APIs to access either managed or external tables.

Configure Spark Direct Reader mode and JDBC execution mode.
Configure Kerberos.

Submit the Spark application, including the spark.sql.extensions property to enable Auto Translate:

--conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension

If you use the Kryo serializer, also set the spark.kryo.registrator property to com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator.

For example:

sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" \
--conf spark.kryo.registrator="com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator"
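
The following is a minimal sketch of how the command above could be assembled in a script, so that the Kryo registrator is added only when the Kryo serializer is in use. The function name build_spark_shell_cmd and its arguments are hypothetical, introduced here for illustration; only the two property names and class names come from the example above. The sketch prints the command rather than running it, and it omits the sudo -u hive prefix, which you would add if your environment requires it.

```shell
#!/bin/sh
# Hypothetical helper (not part of HWC): prints a spark-shell command line
# with Auto Translate enabled.
#   $1  path to the HWC assembly jar (placeholder; use the jar for your version)
#   $2  "yes" to also set the Kryo registrator (only when using the Kryo serializer)
build_spark_shell_cmd() {
  jar_path="$1"
  use_kryo="${2:-no}"

  cmd="spark-shell --jars ${jar_path}"
  # Enables Auto Translate.
  cmd="${cmd} --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension"
  if [ "${use_kryo}" = "yes" ]; then
    # Class name copied as-is from the example above.
    cmd="${cmd} --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator"
  fi
  printf '%s\n' "${cmd}"
}

# Print the command for review instead of executing it.
build_spark_shell_cmd "/opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar" yes
```

Printing the command first makes it easy to check the jar path and properties before you submit the application.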

Read and view employee data in table emp_acid.

scala> spark.sql("select * from emp_acid").show(1000, false) 

+------+----------+--------------------+-------------+--------------+-----+-----+-------+
|emp_id|first_name|              e_mail|date_of_birth|          city|state|  zip|dept_id|
+------+----------+--------------------+-------------+--------------+-----+-----+-------+
|677509|      Lois|lois.walker@hotma...|    3/29/1981|        Denver|   CO|80224|      4|
|940761|    Brenda|brenda.robinson@g...|    7/31/1970|     Stonewall|   LA|71078|      5|
|428945|       Joe|joe.robinson@gmai...|    6/16/1963|  Michigantown|   IN|46057|      3|
...

You do not need to specify an execution mode. You simply submit the query. To execute a read through the HWC API instead, use hive.execute. This command processes queries through HWC in either JDBC or Spark Direct Reader mode.
