Set up the cost-based optimizer and statistics

You can use the cost-based optimizer (CBO) and statistics to produce efficient query execution plans that can improve performance. The CBO depends on column statistics, so you must generate them for it to work.

In this task, you enable and configure the cost-based optimizer (CBO) and configure Hive to gather column statistics as well as table statistics for evaluating query performance. Column and table statistics are critical for estimating predicate selectivity and the cost of the plan. Certain advanced rewrites require column statistics.

In this task, you check and set the following properties:

  • hive.stats.autogather

    Controls collection of table-level statistics.

  • hive.stats.fetch.column.stats

    Controls whether column-level statistics are fetched from the metastore during query planning.

  • hive.compute.query.using.stats

    Instructs Hive to answer certain queries, such as count(*), min, and max, directly from statistics stored in the metastore.
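One way to check the current values is from a Hive session, for example in Beeline. Issuing SET with only a property name prints the effective value without changing it. This is a minimal sketch; it shows session-level values, which can differ from the service-wide defaults set in Cloudera Manager.

```sql
-- Print the current value of each property (nothing is assigned here).
SET hive.stats.autogather;
SET hive.stats.fetch.column.stats;
SET hive.compute.query.using.stats;
```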

You can manually generate the table-level statistics for newly created tables and table partitions using the ANALYZE TABLE statement.
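The following sketch shows what manual statistics generation might look like. The table names customers and sales and the partition column sale_date are hypothetical and used only for illustration; substitute your own tables.

```sql
-- Table-level statistics for a hypothetical unpartitioned table.
ANALYZE TABLE customers COMPUTE STATISTICS;

-- Table-level statistics for a single partition of a hypothetical partitioned table.
ANALYZE TABLE sales PARTITION (sale_date='2024-01-01') COMPUTE STATISTICS;

-- Column-level statistics, which the CBO relies on, for all columns of the table.
ANALYZE TABLE customers COMPUTE STATISTICS FOR COLUMNS;

-- Inspect the stored statistics (for example, numRows) in the table parameters.
DESCRIBE FORMATTED customers;
```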

Before you begin:

  • The following components are running:
    • HiveServer
    • Hive Metastore
    • Hive clients
  • Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. In Cloudera Manager > Clusters, select the Hive service, for example, HIVE_ON_TEZ-1.
  2. On the Configuration tab, search for hive.cbo.enable.
  3. Enable the hive.cbo.enable property.
  4. Search for and enable hive.compute.query.using.stats.
  5. In Cloudera Manager > Home > HIVE_ON_TEZ-1 select Restart from the options menu.
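After the restart, you might confirm the settings from a new Hive session. The sketch below assumes a hypothetical table named customers with a column named state. In many Hive on Tez versions the EXPLAIN output includes a line such as "Plan optimized by CBO." when the optimizer was applied, but treat the exact wording as an assumption and check the output of your own version.

```sql
-- Confirm that the properties enabled in the steps above are in effect.
SET hive.cbo.enable;
SET hive.compute.query.using.stats;

-- Inspect the plan for a simple aggregate on a hypothetical table.
EXPLAIN
SELECT state, COUNT(*)
FROM customers
GROUP BY state;
```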