Kudu

Review the partitioning guidelines and limitations before deploying the Kudu service on your cluster.

Partitioning guidelines🔗

Kudu supports partitioning tables by RANGE and HASH partitions. The RANGE and HASH partitions can be combined to create effective partitioning strategies. It is also possible to utilize non-covering RANGE partitions.

For large tables, such as fact tables, aim for as many tablets as the cores in the cluster.

For small tables such as dimension tables, aim for a large number of tablets, and ensure each tablet is at least 1 GB.

Limitations🔗

Current versions of Kudu come with a number of usage limitations. Cloudera recommends that you review the usage limitations of Kudu with respect to schema design, scaling, server management, cluster management, replication and backup, security, and integration with Spark and Impala.

Currently, Kudu does not support multi-row transactions. Operations that affect multiple rows do not roll back if the operation fails half way through. This should be mitigated by exploiting the primary key uniqueness constraint to make operations idempotent.