Setting statistics manually
For tables and partitions where the statistics do not change or are already known, statistics can be set manually.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
Manually set statistics as follows:
-
Set the total number of rows in a table:
ALTER TABLE <table_name> SET TBLPROPERTIES('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true’);
-
Set the total number of rows for a specific partition:
ALTER TABLE <table_name> PARTITION (keycol1=value_1,keycol2=value_2...) SET TBLPROPERTIES('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true’);
-
Set column statistics:
ALTER TABLE <table_name> SET COLUMN STATS <col_name> (‘numDVs'=‘100‘);
Compute the
numDVs
values by running the following query:SELECT NDV(col_name) FROM <table_name>;