Partition transform feature

You can use one or more partition transforms to partition your data. Each transform is applied to a single column. Identity-transform means no transformation; the column values are used for partitioning. The other transforms apply a function to the column values and the data is partitioned by the transformed values.

Using CREATE TABLE ... PARTITIONED BY you create identity-partitioned Iceberg tables. Identity-partitioned Iceberg tables are similar to the Impala partitioned tables and are stored in the same directory structure as the Impala partitioned tables.

Impala supports Iceberg advanced partitioning through the PARTITION BY SPEC clause. Using this clause, you can define the Iceberg partition fields and partition transforms.

The following table lists the available transformations of partitions and corresponding transform spec.
Transformation Spec Supported from Impala
Partition by year years(time_stamp) | year(time_stamp) yes
Partition by month months(time_stamp) | month(time_stamp) yes
Partition by a date value stored as int (dateint) days(time_stamp) | date(time_stamp) no
Partition by hours hours(time_stamp) no
Partition by a dateint in hours date_hour(time_stamp) no
Partition by hashed value mod N buckets bucket(N, col) yes
Partition by value truncated to L, which is a number of characters truncate(L, col) yes

Strings are truncated to length L. Integers and longs are truncated to bins. For example, truncate(10, i) yields partitions 0, 10, 20, 30 …

The idea behind transformation partition by hashed value mod N buckets is the same as hash bucketing for Hive tables. A hashing algorithm calculates the bucketed column value (modulus). For example, for 10 buckets, data is stored in column value % 10, ranging from 0-9 (0 to n-1) buckets.

You use the PARTITIONED BY SPEC clause to partition a table by an identity transform.

Impala syntax

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name	  
  [(col_name data_type, ... )]
  [PARTITIONED BY SPEC([col_name][, spec(value)][, spec(value)]...)]
   STORED (AS | BY) ICEBERG
  [TBLPROPERTIES (property_name=property_value, ...)]

Where spec(value) represents one or more of the following transforms:

  • YEARS(col_name)
  • MONTHS(col_name)
  • DAYS(col_name)
  • BUCKET(bucket_num,col_name)
  • TRUNCATE(length, col_name)

Impala examples

CREATE TABLE ice_13 (i INT, t TIMESTAMP, j BIGINT) PARTITIONED BY SPEC (i, HOUR(t), TRUNCATE(1000, j)) STORED BY ICEBERG;

The following examples show how to use the PARTITION BY SPEC clause in a CREATE TABLE query from Impala.

CREATE TABLE ice_t(id INT, name STRING, dept STRING)
PARTITIONED BY SPEC (bucket(19, id), dept)
STORED BY ICEBERG
TBLPROPERTIES ('format-version'='2');
CREATE TABLE ice_ctas
PARTITIONED BY SPEC (truncate(1000, id))
STORED BY ICEBERG
TBLPROPERTIES ('format-version'='2')
AS SELECT id, int_col, string_col FROM source_table;