Set up the cost-based optimizer and statistics
You can use the cost-based optimizer (CBO) and statistics to generate efficient query execution plans that can improve performance. You must generate column statistics to make CBO functional.
In this task, you enable and configure the CBO, and you configure Hive to gather column and table statistics for evaluating query performance. Column and table statistics are critical for estimating predicate selectivity and the cost of a plan. Certain advanced rewrites require column statistics.
You check and, if necessary, set the following properties in the hive-site.xml configuration file:
hive.stats.autogather
Controls collection of table-level statistics.
hive.stats.fetch.column.stats
Controls whether column-level statistics are fetched from the metastore during query planning.
hive.compute.query.using.stats
Instructs Hive to answer certain queries, such as min, max, and count(1), directly from statistics stored in the metastore.
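As a quick check from a Hive session (for example, in Beeline), the current values of these properties can be displayed with SET. This is a minimal sketch: it assumes you have an open session, and it also queries hive.cbo.enable, the Hive property that turns the CBO on, which is not among the properties listed above.

```sql
-- SET with a property name and no value prints the current setting.
SET hive.cbo.enable;                 -- assumption: included to confirm the CBO itself is on
SET hive.stats.autogather;
SET hive.stats.fetch.column.stats;
SET hive.compute.query.using.stats;

-- To override a value for the current session only, assign it explicitly.
-- A session-level change does not modify hive-site.xml.
SET hive.compute.query.using.stats=true;
```

Session-level overrides are useful for testing a setting before you change it permanently in the Hive configuration.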
All of these properties are enabled by default. You can manually generate table-level statistics for newly created tables and table partitions by using the ANALYZE TABLE statement.
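The following sketch shows typical forms of the ANALYZE TABLE statement. The table names (customers, sales) and the partition column (sale_date) are hypothetical and used only for illustration; substitute your own.

```sql
-- Table-level statistics for a hypothetical unpartitioned table.
ANALYZE TABLE customers COMPUTE STATISTICS;

-- Table-level statistics for one partition of a hypothetical
-- table partitioned by sale_date.
ANALYZE TABLE sales PARTITION (sale_date='2024-01-01') COMPUTE STATISTICS;

-- Column-level statistics, which the CBO needs for selectivity estimates.
ANALYZE TABLE customers COMPUTE STATISTICS FOR COLUMNS;
```

Computing column statistics scans the table data, so on large tables you might prefer to run it during periods of low cluster activity.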
Before you begin
- You installed Ambari.
- You added the Apache Hive service and started all components.
- You have administrative privileges to configure Hive in Ambari.