You can use statistics to optimize queries for improved performance. The cost-based
optimizer (CBO) also uses statistics to compare query plans and choose the best one. By
viewing statistics instead of running a query, you can sometimes get answers to your data
questions faster.
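As a minimal sketch of that last point: assuming a table named mytable whose basic statistics are current, and a Hive version that supports the hive.compute.query.using.stats property, a simple aggregate such as a row count can be answered from stored statistics rather than by scanning the data. Whether the property is already enabled depends on your distribution and cluster configuration.

```sql
-- Assumption: mytable already has up-to-date basic statistics.
-- Allow Hive to answer simple aggregate queries from stored statistics.
SET hive.compute.query.using.stats=true;

-- With accurate statistics, this can return without scanning the table.
SELECT COUNT(*) FROM mytable;
```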
This task shows how to generate different types of statistics about a table.
-
Launch a Hive shell and log in.
-
Gather statistics for the non-partitioned table mytable:
ANALYZE TABLE mytable COMPUTE STATISTICS;
-
Confirm that the hive.stats.autogather property is enabled.
-
In Ambari, select .
-
In Filter, enter hive.stats.autogather.
-
View the table statistics you generated:
DESCRIBE EXTENDED mytable;
-
Gather column statistics for the table:
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS;
-
View column statistics for the name column in my_table in the my_db database:
DESCRIBE FORMATTED my_db.my_table name;
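
The steps above cover a non-partitioned table and check hive.stats.autogather through Ambari. The following sketch shows two related variations: printing the property's current value from the Hive shell, and gathering statistics for a partitioned table. The table name sales and its partition column ds are hypothetical and do not come from this task; substitute your own names.

```sql
-- Print the current value of the property from the Hive shell
-- (SET with a property name and no value displays the setting).
SET hive.stats.autogather;

-- Hypothetical partitioned table: sales, partitioned by ds.
-- Gather basic statistics for a single partition.
ANALYZE TABLE sales PARTITION (ds='2024-01-01') COMPUTE STATISTICS;

-- Gather basic statistics for all partitions by leaving the
-- partition value unspecified.
ANALYZE TABLE sales PARTITION (ds) COMPUTE STATISTICS;

-- View the statistics stored for that one partition.
DESCRIBE FORMATTED sales PARTITION (ds='2024-01-01');
```

Gathering statistics partition by partition can be useful when only recent partitions change, because it avoids rescanning data whose statistics are already current.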