Writing managed tables through HWC

This procedure shows how to connect to HiveServer (HS2) to write tables from Spark, the approach recommended for production. You launch a Spark session and then write ACID managed tables to Apache Hive.

  Before you begin:
  • Configure JDBC execution mode.
  • Configure Kerberos for HWC.
  1. Start the Apache Spark session and include the URL for HiveServer.
    spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-1.0.0.7.2.1.0-327.jar \
    --conf spark.sql.hive.hiveserver2.jdbc.url=<JDBC endpoint for HiveServer>
    ...
  2. In the same launch string, set the intermediate location that HWC uses as a staging directory.
    Example syntax:
    ...
    --conf spark.sql.hive.hwc.execution.mode=spark \
    --conf spark.datasource.hive.warehouse.read.via.llap=false \
    --conf spark.datasource.hive.warehouse.load.staging.dir=<path to directory>
  3. Write Hive managed tables.
    For example, in Scala:
    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._
    val hive = HiveWarehouseSession.session(spark).build()
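    hive.setDatabase("tpcds_bin_partitioned_orc_1000")
    // Assumed step: read web_sales through HWC and register it as a temp view,
    // so that the Spark SQL query further down can resolve the table name.
    val df = hive.executeQuery("select * from web_sales")
    df.createOrReplaceTempView("web_sales")
    // General write syntax:
    // df.write.format(HIVE_WAREHOUSE_CONNECTOR).option("table", <tableName>).save()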
    hive.setDatabase("testDatabase")
    hive.createTable("newTable")
    .ifNotExists()
    .column("ws_sold_time_sk", "bigint")
    .column("ws_ship_date_sk", "bigint")
    .create()
    sql("SELECT ws_sold_time_sk, ws_ship_date_sk FROM web_sales WHERE ws_sold_time_sk > 80000")
    .write.format(HIVE_WAREHOUSE_CONNECTOR)
    .mode("append")
    .option("table", "newTable")
    .save()
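
Steps 1 and 2 split the launch string across two snippets, so it can help to see the pieces assembled in one place. The following is a minimal sketch, not a definitive launcher: the function name hwc_launch_command and the values passed to it are hypothetical, while the jar path and the --conf keys are the ones shown in the procedure. The procedure elides other options with "...", so options such as the Kerberos settings are not included here. The function only prints the command, one argument per line, and does not run it.

```shell
#!/usr/bin/env bash
# Sketch: assemble the spark-shell launch string from steps 1 and 2.
# hwc_launch_command is a hypothetical helper name; the jar path and
# --conf keys are copied from the procedure above.

HWC_JAR=/opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-1.0.0.7.2.1.0-327.jar

# Prints the launch command, one argument per line, without running it.
#   $1 = JDBC endpoint for HiveServer
#   $2 = path to the staging directory
hwc_launch_command() {
  local jdbc_url="$1" staging_dir="$2"
  if [ -z "$jdbc_url" ] || [ -z "$staging_dir" ]; then
    echo "usage: hwc_launch_command <jdbc-url> <staging-dir>" >&2
    return 1
  fi
  printf '%s\n' \
    spark-shell \
    --jars "$HWC_JAR" \
    --conf "spark.sql.hive.hiveserver2.jdbc.url=$jdbc_url" \
    --conf spark.sql.hive.hwc.execution.mode=spark \
    --conf spark.datasource.hive.warehouse.read.via.llap=false \
    --conf "spark.datasource.hive.warehouse.load.staging.dir=$staging_dir"
}
```

For example, calling hwc_launch_command with a placeholder endpoint such as jdbc:hive2://hs2.example.com:10000/default and a staging path such as /tmp/hwc-staging (both assumptions, not values from this procedure) prints the arguments for review before you launch the session yourself.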