Apache Kudu Usage Limitations

Schema Design Limitations

Primary Key

The primary key cannot be changed after the table is created. You must drop and recreate a table to select a new primary key.
The columns which make up the primary key must be listed first in the schema.
The primary key of a row cannot be modified using the UPDATE functionality. To modify a row’s primary key, the row must be deleted and re-inserted with the modified key. Such a modification is non-atomic.
Columns with DOUBLE, FLOAT, or BOOL types are not allowed as part of a primary key definition. Additionally, all columns that are part of a primary key definition must be NOT NULL.
Auto-generated primary keys are not supported.
Cells making up a composite primary key are limited to a total of 16KB after internal composite-key encoding is done by Kudu.
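
The schema rules above can be checked before a table is created. The following is a minimal sketch, not part of any Kudu client: the `Column` class and `primary_key_problems` function are hypothetical names introduced for illustration, and the check covers only column order, the disallowed key types, and nullability.

```python
from dataclasses import dataclass

# Types the list above says cannot appear in a primary key.
FORBIDDEN_KEY_TYPES = {"DOUBLE", "FLOAT", "BOOL"}


@dataclass
class Column:
    """Hypothetical column description; not a Kudu client class."""
    name: str
    type: str
    nullable: bool = False
    is_key: bool = False


def primary_key_problems(columns):
    """Return human-readable violations of the primary key rules."""
    problems = []
    seen_non_key = False
    for col in columns:
        if not col.is_key:
            seen_non_key = True
            continue
        if seen_non_key:
            problems.append(f"{col.name}: key columns must be listed first")
        if col.type.upper() in FORBIDDEN_KEY_TYPES:
            problems.append(f"{col.name}: {col.type} is not allowed in a primary key")
        if col.nullable:
            problems.append(f"{col.name}: key columns must be NOT NULL")
    return problems
```

An empty list means none of these three rules was violated. The 16KB limit on the encoded composite key is not checked here, because it depends on Kudu's internal encoding.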

Cells

No individual cell may be larger than 64KB before encoding or compression. The cells making up a composite key are limited to a total of 16KB after the internal composite-key encoding done by Kudu. Inserting rows not conforming to these limitations will result in errors being returned to the client.
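
Because oversized cells are rejected with an error, a client may prefer to check sizes before sending a row. The sketch below assumes that 64KB means 65,536 bytes and that a row is a plain dictionary of column name to value; `oversized_cells` is a hypothetical helper, not a Kudu API.

```python
# Assumption: "64KB" is read as 64 * 1024 bytes, measured before
# encoding or compression.
MAX_CELL_BYTES = 64 * 1024


def oversized_cells(row):
    """Return the names of columns whose raw value exceeds the cell limit.

    Only str and bytes values are measured; fixed-width values such as
    integers cannot approach the limit and are skipped.
    """
    too_big = []
    for name, value in row.items():
        if isinstance(value, str):
            size = len(value.encode("utf-8"))
        elif isinstance(value, (bytes, bytearray)):
            size = len(value)
        else:
            continue
        if size > MAX_CELL_BYTES:
            too_big.append(name)
    return too_big
```

The 16KB composite-key limit applies after Kudu's internal key encoding, which this sketch does not reproduce, so it is left to the server to enforce.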

Columns

By default, Kudu will not permit the creation of tables with more than 300 columns. We recommend schema designs that use fewer columns for best performance.
DECIMAL, CHAR, VARCHAR, DATE, and complex types such as ARRAY are not supported.
Type and nullability of existing columns cannot be changed by altering the table.
Dropping a column does not immediately reclaim space. Compaction must run first.

Tables

Tables must have an odd number of replicas, with a maximum of 7.
Replication factor (set at table creation time) cannot be changed.
There is no way to run compaction manually, but dropping a table will reclaim the space immediately.
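
Since the replication factor cannot be changed after creation, it is worth validating the value up front. A minimal sketch of the rule stated above (odd, at most 7), using a hypothetical helper name:

```python
def is_valid_replication_factor(n):
    """True if n is an odd number of replicas between 1 and 7 inclusive."""
    return isinstance(n, int) and 1 <= n <= 7 and n % 2 == 1
```

Under this rule the accepted values are 1, 3, 5, and 7.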

Other Usage Limitations

Secondary indexes are not supported.
Multi-row transactions are not supported.
Relational features, such as foreign keys, are not supported.
Identifiers such as column and table names are restricted to be valid UTF-8 strings. Additionally, a maximum length of 256 characters is enforced.
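
The identifier rule can be approximated on the client side. This sketch assumes that "characters" means Unicode code points (what Python's `len` counts for a `str`); whether Kudu counts the same way is not stated here. `is_valid_identifier` is a hypothetical helper.

```python
MAX_IDENTIFIER_CHARS = 256


def is_valid_identifier(name):
    """True if name encodes as valid UTF-8 and is at most 256 characters.

    The text above does not say whether an empty name is allowed, so this
    sketch does not reject it.
    """
    if not isinstance(name, str) or len(name) > MAX_IDENTIFIER_CHARS:
        return False
    try:
        # Lone surrogates are the main way a Python str can fail to encode.
        name.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```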

If you are using Apache Impala to query Kudu tables, refer to the section on Impala Integration Limitations as well.

Partitioning Limitations

Tables must be manually pre-split into tablets using simple or compound primary keys. Automatic splitting is not yet possible. Kudu does not allow you to change how a table is partitioned after creation, with the exception of adding or dropping range partitions.
Data in existing tables cannot currently be automatically repartitioned. As a workaround, create a new table with the new partitioning and insert the contents of the old table.
Tablets that lose a majority of replicas (such as 1 left out of 3) require manual intervention to be repaired.
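
The majority condition in the last item is simple arithmetic: a tablet remains healthy only while more than half of its replicas are live. A small sketch, with a hypothetical function name:

```python
def has_majority(live_replicas, replication_factor):
    """True if strictly more than half of the replicas are still live."""
    return live_replicas > replication_factor // 2
```

With a replication factor of 3, two live replicas still form a majority, while one live replica (the example given above) does not and calls for manual repair.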

Scaling Recommendations and Limitations

Recommended maximum number of tablet servers is 100.
Recommended maximum number of masters is 3.
Recommended maximum amount of stored data, post-replication and post-compression, per tablet server is 8 TiB.
Recommended number of tablets per tablet server is 1000 (post-replication) with 2000 being the maximum number of tablets allowed per tablet server.
Maximum number of tablets per table for each tablet server is 60, post-replication (assuming the default replication factor of 3), at table-creation time.
Recommended maximum amount of data per tablet is 50 GiB. Going beyond this can cause issues such as reduced performance, compaction issues, and slow tablet startup times.

The recommended target size for tablets is under 10 GiB.
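
The figures above lend themselves to a quick capacity check. The sketch below is illustrative only: the function names are hypothetical, the average tablet size assumes data is spread evenly across tablets, and the creation-time limit is computed only for the default replication factor of 3.

```python
GIB_PER_TIB = 1024


def scaling_warnings(tablet_servers, tablets_per_server, data_per_server_gib):
    """Compare a planned deployment with the recommendations listed above.

    tablets_per_server and data_per_server_gib are post-replication figures;
    data_per_server_gib is also post-compression.
    """
    warnings = []
    if tablet_servers > 100:
        warnings.append("more than 100 tablet servers")
    if tablets_per_server > 2000:
        warnings.append("more than 2000 tablets per tablet server (maximum)")
    elif tablets_per_server > 1000:
        warnings.append("more than 1000 tablets per tablet server (recommended)")
    if data_per_server_gib > 8 * GIB_PER_TIB:
        warnings.append("more than 8 TiB stored per tablet server")
    if tablets_per_server > 0:
        # Assumes an even spread of data across tablets.
        avg_tablet_gib = data_per_server_gib / tablets_per_server
        if avg_tablet_gib > 50:
            warnings.append("average tablet larger than 50 GiB")
        elif avg_tablet_gib >= 10:
            warnings.append("average tablet at or above the 10 GiB target")
    return warnings


def max_tablets_at_creation(tablet_servers):
    """Tablets a new table may have at creation time.

    Based on 60 replicas per tablet server and a replication factor of 3.
    """
    return 60 * tablet_servers // 3
```

For example, 20 tablet servers with 800 tablets and 4 TiB each produce no warnings, and a table created on a 5-server cluster could be pre-split into at most 100 tablets.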

Server Management Limitations

Production deployments should configure at least 4 GiB of memory for tablet servers, and ideally more than 16 GiB when approaching the data and tablet scale limits.
Write ahead logs (WALs) can only be stored on one disk.
Disk failures are not tolerated, and tablet servers will crash as soon as one is detected.
Failed disks with unrecoverable data require formatting of all Kudu data for that tablet server before it can be started again.
Data directories cannot be added/removed; they must be reformatted to change the set of directories.
Tablet servers cannot be gracefully decommissioned.
Tablet servers cannot change their address or port.
Kudu has a hard requirement on up-to-date NTP synchronization. Kudu masters and tablet servers will crash when their clocks are out of sync.
Kudu releases have only been tested with NTP. Other time synchronization providers such as Chrony may not work.

Cluster Management Limitations

Rack awareness is not supported.
Multi-datacenter is not supported.
Rolling restart is not supported.
All masters must be started at the same time when the cluster is started for the very first time.

Replication and Backup Limitations

Kudu does not currently include any built-in features for backup and restore. Users are encouraged to use tools such as Spark or Impala to export or import tables as necessary.

Impala Integration Limitations

When creating a Kudu table, the CREATE TABLE statement must include the primary key columns before other columns, in primary key order.
Impala cannot update values in primary key columns.
Impala cannot create Kudu tables with DECIMAL, VARCHAR, or nested-typed columns.
Kudu tables with a name containing upper case or non-ASCII characters must be assigned an alternate name when used as an external table in Impala.
Kudu tables with a column name containing upper case or non-ASCII characters cannot be used as an external table in Impala. Columns can be renamed in Kudu to work around this issue.
!= and LIKE predicates are not pushed to Kudu, and instead will be evaluated by the Impala scan node. This may decrease performance relative to other types of predicates.
Updates, inserts, and deletes using Impala are non-transactional. If a query fails part of the way through, its partial effects will not be rolled back.
The maximum parallelism of a single query is limited to the number of tablets in a table. For good analytic performance, aim for 10 or more tablets per host or use large tables.

Impala Keywords Not Supported for Creating Kudu Tables

PARTITIONED
LOCATION
ROW FORMAT

Spark Integration Limitations

Spark 2.2 (and higher) requires Java 8 at runtime even though Kudu Spark 2.x integration is Java 7 compatible. Spark 2.2 is the default dependency version as of Kudu 1.5.0.
Kudu tables with a name containing upper case or non-ASCII characters must be assigned an alternate name when registered as a temporary table.
Kudu tables with a column name containing upper case or non-ASCII characters must not be used with SparkSQL. Columns can be renamed in Kudu to work around this issue.
<> and OR predicates are not pushed to Kudu, and instead will be evaluated by the Spark task. Only LIKE predicates with a suffix wildcard are pushed to Kudu. This means LIKE "FOO%" will be pushed, but LIKE "FOO%BAR" won't.
Kudu does not support all the types supported by Spark SQL. For example, Date, Decimal, and complex types are not supported on Kudu.
Kudu tables can only be registered as temporary tables in SparkSQL.
Kudu tables cannot be queried using HiveContext.
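
The LIKE pushdown rule above can be expressed as a small pattern test. This is a conservative sketch, not the logic used by the Kudu Spark integration: `is_prefix_like` is a hypothetical helper, it treats the single-character wildcard `_` as disqualifying, ignores escape sequences, and requires a non-empty literal prefix, all of which are assumptions.

```python
def is_prefix_like(pattern):
    """True if pattern is a non-empty literal prefix followed by one trailing %."""
    if not pattern.endswith("%"):
        return False
    prefix = pattern[:-1]
    # Any other wildcard in the prefix means the pattern is not a pure prefix match.
    return bool(prefix) and "%" not in prefix and "_" not in prefix
```

Under these assumptions `"FOO%"` qualifies, while `"FOO%BAR"`, `"%FOO%"`, and `"FOO"` do not.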

Security Limitations

Data encryption at rest is not directly built into Kudu. Encryption of Kudu data at rest can be achieved through the use of local block device encryption software such as dmcrypt.
Authorization is only available at a system-wide, coarse-grained level. Table-level, column-level, and row-level authorization features are not available.
Kudu does not support configuring a custom service principal for Kudu processes. The principal must follow the pattern kudu/<HOST>@<DEFAULT.REALM>.
Kudu integration with Apache Flume does not support writing to Kudu clusters that require authentication.
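
The required principal pattern, kudu/<HOST>@<DEFAULT.REALM>, can be checked with a regular expression. The sketch below only validates the shape of the string; it cannot tell whether the realm is actually the default realm of the Kerberos configuration. `parse_kudu_principal` is a hypothetical helper.

```python
import re

# Shape check only: service name "kudu", then a host, then a realm.
PRINCIPAL_RE = re.compile(r"kudu/(?P<host>[^/@\s]+)@(?P<realm>[^/@\s]+)")


def parse_kudu_principal(principal):
    """Return (host, realm) if the principal matches kudu/<HOST>@<REALM>, else None."""
    match = PRINCIPAL_RE.fullmatch(principal)
    if match is None:
        return None
    return match.group("host"), match.group("realm")
```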
