You can execute a Hive query to transform data in Hive to a data source in Druid.
- If you use Kerberos, configure and run Hive low-latency analytical processing (LLAP).
- You set up a table, ssb_10_flat_orc as defined in the Star Schema Benchmark example.
A Hive SQL command, an excerpt from the Star Schema Benchmark using the Hive/Druid
Integration, invokes the Druid storage handler, specifies Druid segment granularity,
and maps selected Hive columns to Druid column types.
-
Put all the Hive data to undergo ETL in a Hive table.
-
Execute a Hive query to set the location of the Druid broker using a DNS name or IP address and port 8082, the default broker text listening port.
SET hive.druid.broker.address.default=10.10.20.30:8082;
If you installed the Hive and Druid services using Ambari, you can skip this step.
-
Run a CREATE TABLE AS SELECT statement to create a new Druid datasource from
the Hive table you selected in step 1.
The following example pushes Hive data to
Druid.
CREATE TABLE ssb_druid_hive
STORED BY 'org.apache.hadoop.hive.
druid.DruidStorageHandler'
TBLPROPERTIES (
"druid.segment.granularity" = "MONTH",
"druid.query.granularity" = "DAY")
AS
SELECT
cast(d_year || '-' || d_monthnuminyear || '-' || d_daynuminmonth as timestamp) as `__time`,
cast(c_city as string) c_city,
cast(c_nation as string) c_nation,
cast(c_region as string) c_region,
cast(d_weeknuminyear as string) d_weeknuminyear,
cast(d_year as string) d_year,
cast(d_yearmonth as string) d_yearmonth,
cast(d_yearmonthnum as string) d_yearmonthnum,
cast(lo_discount as string) lo_discount,
cast(lo_quantity as string) lo_quantity,
cast(p_brand1 as string) p_brand1,
cast(p_category as string) p_category,
cast(p_mfgr as string) p_mfgr,
cast(s_city as string) s_city,
cast(s_nation as string) s_nation,
cast(s_region as string) s_region,
lo_revenue,
lo_extendedprice * lo_discount discounted_price,
lo_revenue - lo_supplycost net_revenue
FROM
ssb_10_flat_orc.customer, ssb_10_flat_orc.dates, ssb_10_flat_orc.lineorder,
ssb_10_flat_orc.part, ssb_10_flat_orc.supplier
where
lo_orderdate = d_datekey and lo_partkey = p_partkey
and lo_suppkey = s_suppkey and lo_custkey = c_custkey;