Setting up the cost-based optimizer and statistics
You can use the cost-based optimizer (CBO) and statistics to develop efficient query execution plans that can improve performance. You must generate column statistics to make CBO functional.
In this task, you enable and configure the cost-based optimizer (CBO) and configure Hive to gather column statistics as well as table statistics for evaluating query performance. Column and table statistics are critical for estimating predicate selectivity and the cost of the plan. Certain advanced rewrites require column statistics.
In this task, you check, and set the following properties:
hive.stats.autogather
Controls collection of table-level statistics.
hive.stats.fetch.column.stats
Controls collection of column-level statistics.
hive.compute.query.using.stats
Instructs Hive to use statistics when generating query plans.
You can manually generate the table-level statistics for newly created tables and table partitions using the ANALYZE TABLE statement.
- The following components are running:
- HiveServer
- Hive Metastore
- Hive clients
- Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)