Using Druid and Apache Hive

Druid and Hive tuning

As an administrator, you can set hive.druid properties to improve Druid-Hive performance.

Performance-related hive.druid properties

If Hive and Druid are installed with Ambari, the properties are set and tuned for your cluster automatically. However, you can fine-tune some properties if you detect performance problems with applications that are running the queries. The following list includes some of the Druid properties that can be used by Hive. As an HDP administrator, you can troubleshoot and customize a Hive-Druid integration using these properties.

Property                                 Description
hive.druid.indexer.segments.granularity  Granularity of the segments created by the Druid storage handler.
hive.druid.indexer.partition.size.max    Maximum number of records per segment partition.
hive.druid.indexer.memory.rownum.max     Maximum number of records in memory while storing data in Druid.
hive.druid.broker.address.default        Address of the Druid broker node. When Hive queries Druid, this address must be declared.
hive.druid.coordinator.address.default   Address of the Druid coordinator node. It is used to check the load status of newly created segments.
hive.druid.select.threshold              When a SELECT query is split, this is the maximum number of rows that Druid attempts to retrieve.
hive.druid.http.numConnection            Number of connections used by the HTTP client.
hive.druid.http.read.timeout             Read timeout period for the HTTP client in ISO 8601 format. For example, P2W, P3M, PT1H30M, and PT0.750S are possible values.
hive.druid.sleep.time                    Sleep time between retries in ISO 8601 format.
hive.druid.basePersistDirectory          Local temporary directory used to persist intermediate indexing state.
hive.druid.storage.storageDirectory      Deep storage location of Druid.
hive.druid.metadata.base                 Default prefix for metadata table names.
hive.druid.metadata.db.type              Metadata database type. The only valid values are "mysql" and "postgresql".
hive.druid.metadata.uri                  URI to connect to the database.
hive.druid.working.directory             Default HDFS working directory used to store some intermediate metadata.
hive.druid.maxTries                      Maximum number of retries to connect to Druid before throwing an exception.
hive.druid.bitmap.type                   Encoding algorithm used to encode the bitmaps.
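
As an illustration of fine-tuning, the following HiveQL sketch inspects one of the properties listed above and then overrides a few of them for the current session only. The values shown are assumptions chosen for the example, not recommendations. Whether a session-level SET is accepted depends on how your cluster restricts configuration changes, and cluster-wide changes are normally made through Ambari.

```sql
-- Display the value currently in effect for a property.
SET hive.druid.http.read.timeout;

-- Illustrative session-level overrides (assumed values; tune for your workload).
-- Read timeout as an ISO 8601 duration: 2 minutes.
SET hive.druid.http.read.timeout=PT2M;
-- Number of connections used by the HTTP client.
SET hive.druid.http.numConnection=30;
-- Maximum number of retries before an exception is thrown.
SET hive.druid.maxTries=3;
```

Overrides made this way last only for the session, so a change that does not help can be discarded by reconnecting.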

If you installed both Hive and Druid with Ambari, change only the hive.druid.* properties listed above when you troubleshoot performance issues. Leave all other hive.druid.* properties at the values that Ambari set.