You can use statistics to optimize queries for improved performance. The cost-based
optimizer (CBO) also uses statistics to compare query plans and choose the best one. By
viewing statistics instead of running a query, you can sometimes get answers to your data
questions faster.
This task shows how to generate different types of statistics about a table.
-
Launch a Hive shell and log in.
-
Gather statistics for the non-partitioned table mytable:
ANALYZE TABLE mytable COMPUTE STATISTICS;
-
Enable the hive.stats.autogather property, which is required for the DESCRIBE EXTENDED command.
-
In Ambari, select .
-
In Filter, enter hive.stats.autogather, and check the checkbox.
-
View the table statistics you generated:
DESCRIBE EXTENDED mytable;
-
View column statistics for the name column in my_table in the my_db database:
DESCRIBE FORMATTED my_db.my_table name;
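
The steps above can be sketched as one Hive session. The table and database names are the placeholders used in the steps. Two parts are assumptions and not part of the procedure above: setting hive.stats.autogather with a session-level SET instead of through Ambari, and running the FOR COLUMNS variant of ANALYZE TABLE so that column statistics exist before you describe a column.

```sql
-- Assumption: session-level alternative to enabling the property in Ambari.
SET hive.stats.autogather=true;

-- Table-level statistics (such as row count and data size)
-- for the non-partitioned table.
ANALYZE TABLE mytable COMPUTE STATISTICS;

-- Assumption: column-level statistics have not been gathered yet.
-- DESCRIBE FORMATTED <table> <column> reports what this statement computes.
ANALYZE TABLE my_db.my_table COMPUTE STATISTICS FOR COLUMNS;

-- View the table statistics, listed among the table parameters.
DESCRIBE EXTENDED mytable;

-- View the statistics for the name column.
DESCRIBE FORMATTED my_db.my_table name;
```

If the column description shows empty statistics fields, it may be that only table-level statistics were gathered; the FOR COLUMNS statement is the one that fills them in.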