Data Access
Also available as:
PDF
loading table of contents...

SQL Optimization

Cost-based optimization (CBO) of SQL queries is supported in Hive 0.13.0 and later. CBO uses Hive table, table partition, and column statistics to create efficient query execution plans. Efficient query plans better utilize cluster resources and improve query latency. CBO is most useful for complex queries that contain multiple JOIN statements and for queries on very large tables.

[Note]Note

Tables are not required to have partitions to generate CBO statistics. Column-level CBO statistics can be generated by both partitioned and unpartitioned tables.

CBO generates the following statistics:

Statistics GranularityDescription
Table-level

- Uncompressed size of table

- Number of rows

- Number of files

Column-level

- Number of distinct values

- Number of NULL values

- Minimum value

- Maximum value

CBO requires column-level statistics to generate the best query execution plans. Later, when viewing these statistics from the command line, you can choose to also include table-level statistics that are generated by the hive.stats.autogather configuration property. However, CBO does not use these table-level statistics to generate query execution plans.