Writing managed tables through HWC

A step-by-step procedure walks you through connecting to HiveServer (HS2) to write tables from Spark. You launch the Spark session and then write ACID managed tables to Apache Hive.

  • Configure JDBC execution mode.
  • Configure Kerberos for HWC.
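For reference, on a Kerberos-secured cluster the launch string shown in the following steps typically also carries the HiveServer principal. The options below are a sketch only: the principal is a placeholder that must match hive.server2.authentication.kerberos.principal on your cluster, and spark.security.credentials.hiveserver2.enabled is generally false for YARN client mode (such as spark-shell) and true for YARN cluster mode. See the Kerberos configuration topic referenced above for the full procedure.
    --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM \
    --conf spark.security.credentials.hiveserver2.enabled=false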
  1. Start the Apache Spark session and include the URL for HiveServer.
    spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-1.0.0.7.2.1.0-327.jar \
    --conf spark.sql.hive.hiveserver2.jdbc.url=<JDBC endpoint for HiveServer>
    ...                
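    The JDBC endpoint is the URL of HiveServer (HS2) on your cluster. Purely as an illustration, assuming a hypothetical host hs2-host.example.com listening on the default HiveServer port 10000, the option might look like this:
    --conf spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hs2-host.example.com:10000/default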
  2. In the launch string, configure the intermediate location to use as a staging directory.
    Example syntax:
    ...
    --conf spark.sql.hive.hwc.execution.mode=spark \
    --conf spark.datasource.hive.warehouse.read.via.llap=false \
    --conf spark.datasource.hive.warehouse.load.staging.dir=<path to directory>
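    Combining steps 1 and 2, a complete launch string looks like the following sketch. The assembly JAR version, JDBC endpoint, and staging directory are placeholders for values from your environment:
    spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-1.0.0.7.2.1.0-327.jar \
    --conf spark.sql.hive.hiveserver2.jdbc.url=<JDBC endpoint for HiveServer> \
    --conf spark.sql.hive.hwc.execution.mode=spark \
    --conf spark.datasource.hive.warehouse.read.via.llap=false \
    --conf spark.datasource.hive.warehouse.load.staging.dir=<path to directory>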
  3. Write Hive managed tables.
    For example, in Java/Scala:
    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._

    // Build an HWC session from the active Spark session.
    val hive = HiveWarehouseSession.session(spark).build()

    // Read from the source database and register the result as a temporary view.
    hive.setDatabase("tpcds_bin_partitioned_orc_1000")
    val df = hive.executeQuery("select * from web_sales")
    df.createOrReplaceTempView("web_sales")

    // A DataFrame can also be written directly to a managed table:
    // df.write.format(HIVE_WAREHOUSE_CONNECTOR).option("table", <tableName>).save()

    // Create the target managed table in another database.
    hive.setDatabase("testDatabase")
    hive.createTable("newTable")
      .ifNotExists()
      .column("ws_sold_time_sk", "bigint")
      .column("ws_ship_date_sk", "bigint")
      .create()

    // Append the query result to the managed table through HWC.
    sql("SELECT ws_sold_time_sk, ws_ship_date_sk FROM web_sales WHERE ws_sold_time_sk > 80000")
      .write.format(HIVE_WAREHOUSE_CONNECTOR)
      .mode("append")
      .option("table", "newTable")
      .save()
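    As a quick check, you can read the managed table back through the same session. This is a minimal sketch that assumes the hive session and the newTable created in the example above:
    // Read the managed table back through HWC and spot-check the contents.
    val written = hive.executeQuery("SELECT * FROM testDatabase.newTable")
    written.show(10)
    println(s"rows appended: ${written.count()}")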