Using Druid and Apache Hive
Also available as:

Transform Apache Hive Data to Druid

You can execute a Hive query to transform data in Hive to a data source in Druid.

  • If you use Kerberos, configure and run Hive low-latency analytical processing (LLAP).
  • You set up a table, ssb_10_flat_orc as defined in the Star Schema Benchmark example.

A Hive SQL command, an excerpt from the Star Schema Benchmark using the Hive/Druid Integration, invokes the Druid storage handler, specifies Druid segment granularity, and maps selected Hive columns to Druid column types.

  1. Put all the Hive data to undergo ETL in a Hive table.
  2. Execute a Hive query to set the location of the Druid broker using a DNS name or IP address and port 8082, the default broker text listening port.
    If you installed the Hive and Druid services using Ambari, you can skip this step.
  3. Run a CREATE TABLE AS SELECT statement to create a new Druid datasource from the Hive table you selected in step 1.
    The following example pushes Hive data to Druid.
    CREATE TABLE ssb_druid_hive
    STORED BY 'org.apache.hadoop.hive.
    "druid.segment.granularity" = "MONTH",
    "druid.query.granularity" = "DAY")
    cast(d_year || '-' || d_monthnuminyear || '-' || d_daynuminmonth as timestamp) as `__time`,
    cast(c_city as string) c_city,
    cast(c_nation as string) c_nation,
    cast(c_region as string) c_region,
    cast(d_weeknuminyear as string) d_weeknuminyear,
    cast(d_year as string) d_year,
    cast(d_yearmonth as string) d_yearmonth,
    cast(d_yearmonthnum as string) d_yearmonthnum,
    cast(lo_discount as string) lo_discount,
    cast(lo_quantity as string) lo_quantity,
    cast(p_brand1 as string) p_brand1,
    cast(p_category as string) p_category,
    cast(p_mfgr as string) p_mfgr,
    cast(s_city as string) s_city,
    cast(s_nation as string) s_nation,
    cast(s_region as string) s_region,
    lo_extendedprice * lo_discount discounted_price,
    lo_revenue - lo_supplycost net_revenue
    ssb_10_flat_orc.customer, ssb_10_flat_orc.dates, ssb_10_flat_orc.lineorder,
    ssb_10_flat_orc.part, ssb_10_flat_orc.supplier
    lo_orderdate = d_datekey and lo_partkey = p_partkey
    and lo_suppkey = s_suppkey and lo_custkey = c_custkey;