Range-specific hash schemas example: Using impala-shell

Review examples of using impala-shell to create and alter Kudu tables with range-specific hash schemas.

The following is a few examples of creating a new table with custom hash schemas for some of its range partitions.

The table-wide hash schema for table t1 has two hash dimensions (hash bucketing by the ‘id’ and the ‘name’ columns independently):

CREATE TABLE t1 (id INT, name STRING, PRIMARY KEY(id, name))
PARTITION BY HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 4
RANGE (id)
(
    // hash partition by “id” and “name” as in the table-wide hash schema,
    // but customize the number of hash buckets in each dimension 
    PARTITION 15 <= VALUES < 20 HASH(id) PARTITIONS 5 HASH(name) PARTITIONS 3,
    // this range has the table-wide hash schema
    PARTITION 20 <= VALUES < 25
)

The table-wide hash schema for table t2 has one hash dimension (hash bucketing by the ‘name’ column):

CREATE TABLE t2 (id INT, name STRING, PRIMARY KEY(id, name))
PARTITION BY HASH(name) PARTITIONS 3
RANGE (id)
(
    // hash partition by “name” but use custom number of hash buckets
    PARTITION  5 <= VALUES < 10 HASH(name) PARTITIONS 5,

    // hash partition by “id”
    PARTITION 10 <= VALUES < 15 HASH(id) PARTITIONS 2,

    // hash partition by “id” and “name”
    PARTITION 15 <= VALUES < 20 HASH(id, name) PARTITIONS 8,

    // this range has the same schema as the range [15, 20) above:
    // hash partition by the primary key (i.e. by (id, name) pair) into 8 buckets
    PARTITION 20 <= VALUES < 25 PARTITIONS 8,

    // using the table-wide hash schema if no override is specified
    PARTITION 25 <= VALUES < 30
)

The following are examples of adding new range partitions with custom hash schemas to a table that already exists:

// t2: add new range partition by “id”, hashed by “id” and “name” together
ALTER TABLE t2 ADD RANGE PARTITION -5 <= VALUES < 0 HASH PARTITIONS 10;

// t1: add new range partition by “id”, additionally hashed by “id” and “name” separately
// (defaults to the table-wide hash schema since no range-specific hash schema specified)
ALTER TABLE t1 ADD RANGE PARTITION 25 <= VALUES < 30

// t2: add new range partition by “id”, additionally hashed by “name”
ALTER TABLE t2 ADD RANGE PARTITION 30 <= VALUES < 40 HASH(name) PARTITIONS 5;

// t1: add new range partition by “id”, additionally hashed by “id” and “name” separately
ALTER TABLE t1 ADD RANGE PARTITION 40 <= VALUES < 50 HASH(id) PARTITIONS 10, HASH(name) PARTITIONS 5;