DDL Bucketed Tables
To enhance query performance, create bucketed tables for the purpose of dividing data into smaller, more manageable chunks based on specified columns using the CLUSTER BY clause.
Syntax
Example 1: Bucketed table based on user_id
In the first example, the data is bucketed solely based on the user_id column.
CREATE TABLE tbl(
user_id BIGINT,
firstname STRING,
lastname STRING
)
CLUSTERED BY(user_id) INTO 256 BUCKETS;
Example 2: Bucketed table based on user_id and sorted by firstname
In the second example, in addition to bucketing by user_id
, the data is also
sorted within each bucket based on the firstname
column. The bucket partitioning
algorithm employs a hash function to evenly distribute rows across the table, aiding in
optimizing query performance.
CREATE TABLE tbl2 (
user_id BIGINT,
firstname STRING,
lastname STRING
)
CLUSTERED BY (user_id) SORT BY (firstname) INTO 256 BUCKETS;
Limitations
CREATE
bucketed table statement is currently not supported on Kudu and Iceberg tables.- Alter operations such as
add/drop/change/replace
columns on bucketed tables are also not supported. - Impala currently does not leverage the bucket info to optimize queries.