Integrating Apache Hive with Apache Spark and BI

Submit a Python app

The following step-by-step procedure shows you how to submit a Python app based on the HiveWarehouseConnector library: you add the connector jar and configurations to the app submission, and then add the connector's Python package.

  1. Choose an execution mode for your application, for example the HWC JDBC execution mode, and check that you meet the configuration requirements described earlier.
  2. Configure a Spark-HiveServer connection, as described earlier, or include the appropriate --conf options in your app submission in step 4.
  3. Locate the hive-warehouse-connector-assembly jar in the /hive_warehouse_connector/ directory.
    For example, find hive-warehouse-connector-assembly-<version>.jar in the following location:
    /opt/cloudera/parcels/CDH/jars  
  4. Add the connector jar to the app submission using the --jars option, and add the configurations using the --conf option.
    Example syntax:
    pyspark --jars <path to jars>/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
    --conf <configuration properties>
  5. Locate the pyspark_hwc zip package in the /hive_warehouse_connector/ directory.
  6. Add the Python package for the connector to the app submission.
    Example syntax:
    --py-files <path>/hive_warehouse_connector/pyspark_hwc-<version>.zip
    Example submission in JDBC execution mode:
    pyspark --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
    --conf spark.sql.hive.hwc.execution.mode=spark \
    --conf spark.datasource.hive.warehouse.read.via.llap=false \
    --conf spark.datasource.hive.warehouse.load.staging.dir=<path to directory> \
    --py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-<version>.zip
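The steps above can also be scripted. The following is a minimal Python sketch, not part of the connector, that locates a versioned file by pattern (steps 3 and 5) and assembles the pyspark argument list (steps 4 and 6). The helper names find_one and build_pyspark_command are hypothetical, and the paths and property values in the demo are the placeholders from the example submission above; substitute the values for your own cluster.

```python
import glob
import os
import shlex


def find_one(directory, pattern):
    """Return the single file in `directory` that matches `pattern`.

    Hypothetical helper: raises if zero or several files match, so an
    ambiguous <version> is never picked silently.
    """
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one match for {pattern!r} in {directory!r}, "
            f"found {len(matches)}"
        )
    return matches[0]


def build_pyspark_command(jar_path, py_zip_path, conf):
    """Assemble the pyspark argument list (hypothetical helper).

    The order mirrors the example submission: --jars first, then one
    --conf per property, then --py-files.
    """
    cmd = ["pyspark", "--jars", jar_path]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd += ["--py-files", py_zip_path]
    return cmd


if __name__ == "__main__":
    # Placeholder values copied from the example submission; replace
    # <version> and <path to directory> before running the command.
    conf = {
        "spark.sql.hive.hwc.execution.mode": "spark",
        "spark.datasource.hive.warehouse.read.via.llap": "false",
        "spark.datasource.hive.warehouse.load.staging.dir": "<path to directory>",
    }
    cmd = build_pyspark_command(
        "/opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar",
        "/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-<version>.zip",
        conf,
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

On a cluster node, find_one can replace the hard-coded paths, for example find_one("/opt/cloudera/parcels/CDH/jars", "hive-warehouse-connector-assembly-*.jar"). Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without shell quoting.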