Data Access

Statistics

Tip

Gather both column and table statistics for best query performance.

Column and table statistics must be calculated for optimal Hive performance because they are critical for estimating predicate selectivity and cost of the plan. In the absence of table statistics, Hive CBO does not function. Certain advanced rewrites require column statistics.
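As a minimal sketch of how both kinds of statistics might be gathered, the HiveQL below uses the ANALYZE TABLE statement. The table name sales and the partition column ds are hypothetical and stand in for your own objects. Computing statistics FOR COLUMNS without an explicit column list assumes a Hive release that supports it.

```sql
-- Hypothetical table "sales", partitioned by a hypothetical column "ds".

-- Table-level statistics (row count, number of files, total size).
ANALYZE TABLE sales COMPUTE STATISTICS;

-- Column-level statistics for all columns of the table.
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;

-- Limit the work to a single partition of a partitioned table.
ANALYZE TABLE sales PARTITION (ds='2015-01-01') COMPUTE STATISTICS;
ANALYZE TABLE sales PARTITION (ds='2015-01-01') COMPUTE STATISTICS FOR COLUMNS;
```

Statistics describe the data as it was when the statement ran, so it is usually worth rerunning these statements after large loads or rewrites of a table.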

Ensure that the configuration properties in the following table are set to true to improve the performance of queries that generate statistics. You can set the properties using Ambari or by customizing the hive-site.xml file.

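| Configuration Parameter | Setting to Enable Statistics | Description |
|---|---|---|
| hive.stats.fetch.column.stats | true | Instructs Hive to fetch column-level statistics from the metastore when it annotates query plans. |
| hive.compute.query.using.stats | true | Instructs Hive to answer certain queries, such as count(1), directly from statistics stored in the metastore. |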
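For quick experiments, the two properties in the table above can also be set for a single session rather than cluster-wide. This is a sketch that assumes your deployment allows these properties to be changed at run time; the persistent route remains Ambari or hive-site.xml, as described above.

```sql
-- Session-level overrides; they last only for the current Hive session.
SET hive.stats.fetch.column.stats=true;
SET hive.compute.query.using.stats=true;

-- Issuing SET with only the property name prints its current value.
SET hive.stats.fetch.column.stats;
SET hive.compute.query.using.stats;
```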