Apache Druid introduction
HDP 3.x includes Apache Druid (incubating). Druid is an open-source, column-oriented data store
for online analytical processing (OLAP) queries on event data. Druid is optimized for time-series
data analysis and supports the following data analytics features:
- Real-time streaming data ingestion
- Automatic data summarization
- Scalability to trillions of events and petabytes of data
- Sub-second query latency
- Approximate algorithms, such as HyperLogLog and theta sketches
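To make automatic data summarization more concrete, the following is a minimal conceptual sketch in Python of the idea: raw events are collapsed into one row per time bucket and dimension combination at ingestion time. It is not Druid code and does not use any Druid API. The `rollup` function, the hourly bucket, and the event fields (`timestamp`, `page`, `bytes`) are assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Summarize raw events into one row per (hour, dimension values).

    Illustrative only: this mimics the idea of summarization at ingestion,
    not Druid's actual implementation.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the event timestamp to the start of its hour.
        hour = datetime.fromisoformat(event["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        key = (hour.isoformat(),) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)


# Hypothetical raw events; field names are assumptions for this sketch.
events = [
    {"timestamp": "2018-06-01T10:05:00", "page": "/home", "bytes": 120},
    {"timestamp": "2018-06-01T10:40:00", "page": "/home", "bytes": 80},
    {"timestamp": "2018-06-01T11:15:00", "page": "/home", "bytes": 50},
]

summary = rollup(events, ["page"], "bytes")
for key, row in sorted(summary.items()):
    print(key, row)
```

In this sketch, three raw events become two stored rows. Trading row-level detail for pre-aggregated rows in this way is what keeps storage and query cost low at large event volumes.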
Druid is designed for enterprise-scale business intelligence (BI) applications in environments that require minimal latency and high availability. Applications running interactive queries can "slice and dice" data in motion.
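As a rough illustration of what a "slice and dice" query can look like, the sketch below builds a Druid native timeseries query as JSON: it slices by a time interval and filters on one dimension value. The data source `site_activity`, the dimension `country`, and the metric field `count` are hypothetical names, and the helper `build_timeseries_query` is invented for this example. The code only constructs the request body. Submitting it would typically be an HTTP POST to the Broker's `/druid/v2` endpoint, which is not done here.

```python
import json


def build_timeseries_query(data_source, interval, dimension, value):
    """Build a native timeseries query body (hypothetical helper)."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        # One result row per hour within the interval.
        "granularity": "hour",
        # ISO 8601 interval: start/end.
        "intervals": [interval],
        # "Slice": keep only rows where the dimension equals the value.
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        # Sum an ingested metric column; "count" is an assumed field name.
        "aggregations": [
            {"type": "longSum", "name": "events", "fieldName": "count"},
        ],
    }


query = build_timeseries_query(
    "site_activity", "2018-06-01/2018-06-02", "country", "US"
)
payload = json.dumps(query, indent=2)
print(payload)
```

Changing the filter value or the interval and resubmitting is the interactive pattern the paragraph above describes.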
You can use Druid as a data store that answers BI queries about streaming data, such as user activity on a website or multidevice entertainment platform, consumer events sent by a data aggregator, or a large set of transactions or Internet events.
HDP includes Druid 0.12.1, which is licensed under the Apache License, version 2.0.