Using Druid and Apache Hive
Also available as:
PDF

Accelerating Hive queries using Druid

You can perform interactive analytic queries on real-time and historical data using the HDP integration of Hive and Apache Druid (incubating). You can discover existing Druid data sources as external tables, create or ingest batch data into Druid, set up Druid-Kafka streaming ingestion using Hive, and query Druid data sources from Hive.

The integration of Hive with Druid places a SQL layer on Druid. After Druid ingests data from a Hive enterprise data warehouse (EDW), the interactive and sub-second query capabilities of Druid can be used to accelerate queries on historical data from the EDW. Hive integration with Druid enables applications such as Tableau to scale while queries run concurrently on both real-time and historical data. The following figure is an overview of how Hive historical data can be brought into a Druid environment. Queries analyzing Hive-sourced data are run directly on the historical nodes of Druid after indexing between the two databases completes.