Before you use Auto Translate to select an execution mode transparently, based on your
query, make sure you meet the prerequisites. You then configure Auto Translate and submit
an application in a single step.
You configure the spark.sql.extensions property to enable Auto Translate. When you enable
Auto Translate, Spark implicitly selects either HWC or native Apache Spark to run your
query. Spark selects HWC when you query an Apache Hive managed (ACID) table and falls back
to native Spark for reading external tables. You can use the same Spark APIs to access
either managed or external tables.
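As a minimal sketch of what "the same Spark APIs" means in practice, the following spark-shell lines issue an identical spark.sql call against a managed table and an external table. It assumes the shell was started with Auto Translate enabled and that `spark` is the SparkSession that spark-shell predefines. The table name emp_ext is hypothetical and stands in for any external table in your warehouse.

```scala
// Assumes spark-shell was launched with spark.sql.extensions set for Auto Translate.
// `spark` is the SparkSession that spark-shell creates for you.

// Managed (ACID) table: with Auto Translate enabled, this read is expected to go through HWC.
val managedDf = spark.sql("select * from emp_acid")

// External table: this read is expected to fall back to native Spark.
// `emp_ext` is a hypothetical table name used only for illustration.
val externalDf = spark.sql("select * from emp_ext")

// The calling code is identical in both cases; no execution mode is specified.
managedDf.show(5, false)
externalDf.show(5, false)
```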
- Configure Spark Direct Reader mode and JDBC execution mode.
- Configure Kerberos.
- Submit the Spark application, including the spark.sql.extensions property to enable
Auto Translate.
- If you use the Kryo serializer, include
--conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
For example:
sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" \
--conf spark.kryo.registrator="com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator"
- Read and view employee data in table emp_acid.
scala> spark.sql("select * from emp_acid").show(1000, false)
+------+----------+--------------------+-------------+--------------+-----+-----+-------+
|emp_id|first_name| e_mail|date_of_birth| city|state| zip|dept_id|
+------+----------+--------------------+-------------+--------------+-----+-----+-------+
|677509| Lois|lois.walker@hotma… | 3/29/1981| Denver| CO|80224| 4|
|940761| Brenda|brenda.robinson@g...| 7/31/1970| Stonewall| LA|71078| 5|
|428945| Joe|joe.robinson@gmai… | 6/16/1963| Michigantown| IN|46057| 3|
……….
……….
……….
You do not need to specify an execution mode. You simply submit the query.
If you use the HWC API, use hive.execute to execute a read. This command processes
queries through HWC in either JDBC or Spark Direct Reader mode.
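A read through hive.execute might look like the following sketch. It assumes the spark-shell session was started with the HWC assembly JAR on the classpath, as in the spark-shell example earlier, and that `spark` is the predefined SparkSession. The table name emp_acid is reused from the earlier step, and the limit clause is only illustrative.

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession (`spark` in spark-shell).
val hive = HiveWarehouseSession.session(spark).build()

// Execute the read explicitly through HWC rather than relying on Auto Translate.
// The result is a regular Spark DataFrame.
val employees = hive.execute("select * from emp_acid limit 10")

// Print the rows without truncating column values.
employees.show(false)
```

Because the result is an ordinary DataFrame, it can be filtered, joined, or written out with the usual Spark APIs.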