Statistics generation and viewing commands in Data Hub.

You can manually generate table and column statistics, and then view statistics using Hive queries. By default, Hive generates table statistics, but not column statistics, which you must generate manually to make cost-based optimization (CBO) functional.

Commands for generating statistics

The following ANALYZE TABLE command generates statistics for tables and columns:

ANALYZE TABLE [table_name] COMPUTE STATISTICS;
Gathers table statistics for non-partitioned tables.
ANALYZE TABLE [table_name] PARTITION(partition_column) COMPUTE STATISTICS;
Gathers table statistics for partitioned tables.
ANALYZE TABLE [table_name] COMPUTE STATISTICS for COLUMNS [comma_separated_column_list];
Gathers column statistics for the entire table.
ANALYZE TABLE partition2 (col1="x") COMPUTE STATISTICS for COLUMNS;
Gathers statistics for the partition2 column on a table partitioned on col1 with key x.

Commands for viewing statistics

You can use the following commands to view table and column statistics:

DESCRIBE [EXTENDED] table_name;
View table statistics. The EXTENDED keyword can be used only if the hive.stats.autogather property is enabled in the hive-site.xml configuration file. Use the Cloudera Manager Safety Valve feature (see link below).
DESCRIBE FORMATTED [db_name.]table_name [column_name] [PARTITION (partition_spec)];
View column statistics.