Generate and view Apache Hive statistics
You can use statistics to optimize queries for improved performance. The cost-based optimizer (CBO) also uses statistics to compare query plans and choose the best one. By viewing statistics instead of running a query, you can sometimes get answers to your data questions faster.
This task shows how to generate different types of statistics about a table.
- Launch a Hive shell and log in.
Gather statistics for the non-partitioned table mytable:
ANALYZE TABLE mytable COMPUTE STATISTICS;
Confirm that the hive.stats.autogather property is enabled.
- In Ambari, select .
In Filter, enter
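If you prefer to check from the Hive shell rather than Ambari, the following is a minimal sketch. It assumes your session is allowed to read configuration values. Issuing SET with only a property name prints that property's current value without changing it.

```sql
-- Print the current value of the property; this does not modify it.
SET hive.stats.autogather;
```

A value of true means Hive gathers basic table statistics automatically when data is written by operations such as INSERT OVERWRITE.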
View the table statistics you generated:
DESCRIBE EXTENDED mytable;
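Basic table statistics are stored as table parameters such as numRows, numFiles, totalSize, and rawDataSize. As a sketch of a narrower way to read them than scanning the full DESCRIBE EXTENDED output, you could list the table properties directly. The table name mytable comes from the step above; the values returned depend on your data.

```sql
-- List all table parameters, including the basic statistics.
SHOW TBLPROPERTIES mytable;

-- Show only the row count recorded in the statistics.
SHOW TBLPROPERTIES mytable("numRows");
```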
Gather column statistics for the table:
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS;
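On a wide table, you might want to limit the work to the columns that appear in filters and joins. The sketch below assumes mytable has columns named id and name; both names are hypothetical and only illustrate the optional column list.

```sql
-- Gather column statistics only for the listed columns.
-- id and name are illustrative column names, not taken from a real schema.
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS id, name;
```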
View column statistics for the name column in my_table in the my_db database:
DESCRIBE FORMATTED my_db.my_table name;