Range-specific hash schemas example: Using impala-shell
Review examples of using impala-shell to create and alter Kudu tables with range-specific hash schemas.
The following is a few examples of creating a new table with custom hash schemas for some of its range partitions.
The table-wide hash schema for table t1 has two hash dimensions (hash bucketing by the ‘id’ and the ‘name’ columns independently):
CREATE TABLE t1 (id INT, name STRING, PRIMARY KEY(id, name)) PARTITION BY HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 4 RANGE (id) ( // hash partition by “id” and “name” as in the table-wide hash schema, // but customize the number of hash buckets in each dimension PARTITION 15 <= VALUES < 20 HASH(id) PARTITIONS 5 HASH(name) PARTITIONS 3, // this range has the table-wide hash schema PARTITION 20 <= VALUES < 25 )
The table-wide hash schema for table t2 has one hash dimension (hash bucketing by the ‘name’ column):
CREATE TABLE t2 (id INT, name STRING, PRIMARY KEY(id, name)) PARTITION BY HASH(name) PARTITIONS 3 RANGE (id) ( // hash partition by “name” but use custom number of hash buckets PARTITION 5 <= VALUES < 10 HASH(name) PARTITIONS 5, // hash partition by “id” PARTITION 10 <= VALUES < 15 HASH(id) PARTITIONS 2, // hash partition by “id” and “name” PARTITION 15 <= VALUES < 20 HASH(id, name) PARTITIONS 8, // this range has the same schema as the range [15, 20) above: // hash partition by the primary key (i.e. by (id, name) pair) into 8 buckets PARTITION 20 <= VALUES < 25 PARTITIONS 8, // using the table-wide hash schema if no override is specified PARTITION 25 <= VALUES < 30 )
The following are examples of adding new range partitions with custom hash schemas to a table that already exists:
// t2: add new range partition by “id”, hashed by “id” and “name” together ALTER TABLE t2 ADD RANGE PARTITION -5 <= VALUES < 0 HASH PARTITIONS 10; // t1: add new range partition by “id”, additionally hashed by “id” and “name” separately // (defaults to the table-wide hash schema since no range-specific hash schema specified) ALTER TABLE t1 ADD RANGE PARTITION 25 <= VALUES < 30 // t2: add new range partition by “id”, additionally hashed by “name” ALTER TABLE t2 ADD RANGE PARTITION 30 <= VALUES < 40 HASH(name) PARTITIONS 5; // t1: add new range partition by “id”, additionally hashed by “id” and “name” separately ALTER TABLE t1 ADD RANGE PARTITION 40 <= VALUES < 50 HASH(id) PARTITIONS 10, HASH(name) PARTITIONS 5;