Setting statistics manually

For tables and partitions where the statistics do not change or are already known, statistics can be set manually.

By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.

Manually set statistics as follows:

  • Set the total number of rows in a table:

    ALTER TABLE <table_name> SET TBLPROPERTIES('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true’);
  • Set the total number of rows for a specific partition:

    ALTER TABLE <table_name> PARTITION (keycol1=value_1,keycol2=value_2...)
     SET TBLPROPERTIES('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true’);
  • Set column statistics:

    ALTER TABLE <table_name> SET COLUMN STATS <col_name> (‘numDVs'=‘100‘);

    Compute the numDVs values by running the following query:

    SELECT NDV(col_name) FROM <table_name>;