Apache Kudu Usage Limitations
Schema Design Limitations
- Primary Key
  - The primary key cannot be changed after the table is created. You must drop and recreate a table to select a new primary key.
  - The columns that make up the primary key must be listed first in the schema.
  - The primary key of a row cannot be modified using the UPDATE functionality. To modify a row’s primary key, the row must be deleted and re-inserted with the modified key. Such a modification is non-atomic.
  - Columns with DOUBLE, FLOAT, or BOOL types are not allowed as part of a primary key definition. Additionally, all columns that are part of a primary key definition must be NOT NULL.
  - Auto-generated primary keys are not supported.
  - Cells making up a composite primary key are limited to a total of 16KB after Kudu applies its internal composite-key encoding.
- Cells
  - No individual cell may be larger than 64KB before encoding or compression. The cells making up a composite key are limited to a total of 16KB after Kudu applies its internal composite-key encoding. Inserting rows that do not conform to these limitations results in errors being returned to the client.
- Columns
  - By default, Kudu tables can have a maximum of 300 columns. We recommend schema designs that use fewer columns for best performance.
  - CHAR, VARCHAR, DATE, and complex types such as ARRAY, MAP, and STRUCT are not supported.
  - The type and nullability of existing columns cannot be changed by altering the table.
  - Dropping a column does not immediately reclaim space. Compaction must run first.
  - The precision and scale of DECIMAL columns cannot be changed by altering the table.
- Tables
  - Tables must have an odd number of replicas, with a maximum of 7.
  - The replication factor (set at table creation time) cannot be changed.
  - There is no way to run compaction manually, but dropping a table will reclaim the space immediately.
- Other Usage Limitations
  - Secondary indexes are not supported.
  - Multi-row transactions are not supported.
  - Relational features, such as foreign keys, are not supported.
  - Identifiers such as column and table names must be valid UTF-8 strings, with a maximum length of 256 characters.
  - If you are using Apache Impala to query Kudu tables, refer to the section on Impala Integration Limitations as well.
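Several of the schema rules above are mechanical enough to check before a table is created. The following is a minimal sketch in plain Python, not part of any Kudu client API: the `Column` class, the `check_schema` function, and the type-name strings are hypothetical names chosen for illustration, and the numeric limits are simply the ones quoted in this section.

```python
from dataclasses import dataclass

# Limits quoted from the text above. All names here are hypothetical.
MAX_COLUMNS = 300
MAX_IDENTIFIER_LENGTH = 256
MAX_REPLICAS = 7
FORBIDDEN_KEY_TYPES = {"DOUBLE", "FLOAT", "BOOL"}
UNSUPPORTED_TYPES = {"CHAR", "VARCHAR", "DATE", "ARRAY", "MAP", "STRUCT"}


@dataclass
class Column:
    name: str
    type: str
    nullable: bool = True
    primary_key: bool = False


def check_schema(table_name, columns, num_replicas=3):
    """Return a list of human-readable violations (empty if none were found)."""
    problems = []
    if len(columns) > MAX_COLUMNS:
        problems.append(f"{len(columns)} columns exceeds the default maximum of {MAX_COLUMNS}")
    for ident in [table_name] + [c.name for c in columns]:
        if len(ident) > MAX_IDENTIFIER_LENGTH:
            problems.append(f"identifier starting {ident[:20]!r} exceeds {MAX_IDENTIFIER_LENGTH} characters")

    seen_non_key = False
    for col in columns:
        kind = col.type.upper()
        if kind in UNSUPPORTED_TYPES:
            problems.append(f"{col.name}: type {kind} is not supported")
        if not col.primary_key:
            seen_non_key = True
            continue
        # From here on, col is a primary key column.
        if seen_non_key:
            problems.append(f"{col.name}: primary key columns must be listed first")
        if kind in FORBIDDEN_KEY_TYPES:
            problems.append(f"{col.name}: {kind} is not allowed in a primary key")
        if col.nullable:
            problems.append(f"{col.name}: primary key columns must be NOT NULL")

    if num_replicas < 1 or num_replicas % 2 == 0 or num_replicas > MAX_REPLICAS:
        problems.append(f"replication factor {num_replicas} must be odd and at most {MAX_REPLICAS}")
    return problems


if __name__ == "__main__":
    schema = [
        Column("id", "INT64", nullable=False, primary_key=True),
        Column("payload", "STRING"),
    ]
    print(check_schema("metrics", schema))  # an empty list means no violations were found
```

The sketch does not cover the cell-size limits, because those depend on the encoded row contents rather than on the schema alone.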
Partitioning Limitations
- Tables must be manually pre-split into tablets using simple or compound primary keys. Automatic splitting is not yet possible. Kudu does not allow you to change how a table is partitioned after creation, with the exception of adding or dropping range partitions.
- Data in existing tables cannot currently be automatically repartitioned. As a workaround, create a new table with the new partitioning and insert the contents of the old table.
- Tablets that lose a majority of replicas (such as 1 left out of 3) require manual intervention to be repaired.
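To make the "majority of replicas" condition concrete, here is a small sketch of the arithmetic. The function name `has_majority` is hypothetical and is not part of any Kudu tool. It only encodes the rule that strictly more than half of a tablet's replicas must remain.

```python
def has_majority(healthy_replicas: int, replication_factor: int) -> bool:
    """True if strictly more than half of the replicas are still healthy."""
    return healthy_replicas > replication_factor // 2


if __name__ == "__main__":
    # The example from the text: 1 replica left out of 3 has lost its majority.
    print(has_majority(1, 3))
    # 2 replicas left out of 3 still form a majority.
    print(has_majority(2, 3))
```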
Scaling Recommendations and Limitations
Kudu can seamlessly run across a wide array of environments and workloads with minimal expertise and configuration at the following scale:
- Recommended maximum number of masters is 3.
- Recommended maximum number of tablet servers is 100.
- Recommended maximum amount of stored data, post-replication and post-compression, per tablet server is 8 TiB.
- Recommended number of tablets per tablet server is 1000 (post-replication), with 2000 being the maximum number of tablets allowed per tablet server.
- Maximum number of tablets per table for each tablet server is 60, post-replication (assuming the default replication factor of 3), at table-creation time.
- Recommended maximum amount of data per tablet is 50 GiB. Going beyond this can cause issues such as reduced performance, compaction issues, and slow tablet startup times. The recommended target size for tablets is under 10 GiB.
- Number of master servers: 3
- More than 300 tablet servers
- 10+ TiB of stored data per tablet server, post-replication and post-compression
- More than 4000 tablets per tablet server, post-replication
- 50 GiB of stored data per tablet. Going beyond this can cause issues such as reduced performance, compaction issues, and slower tablet startup time.
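The per-table and per-tablet figures above interact when you choose a partition count. The sketch below shows one way to do the arithmetic. It assumes that replicas are spread evenly across tablet servers and that the table size is given for a single copy of the data after compression. All function and constant names are hypothetical; only the numeric limits come from this section.

```python
# Limits quoted from the text above. All names here are hypothetical.
MAX_TABLETS_PER_TABLE_PER_SERVER = 60  # post-replication, at table-creation time
RECOMMENDED_MAX_TABLET_GIB = 50
RECOMMENDED_TARGET_TABLET_GIB = 10


def max_partitions_at_creation(num_tablet_servers, replication_factor=3):
    """Largest partition count that stays within 60 replicas of one table per server."""
    return (MAX_TABLETS_PER_TABLE_PER_SERVER * num_tablet_servers) // replication_factor


def replicas_per_server(num_partitions, num_tablet_servers, replication_factor=3):
    """Replicas of one table hosted per server, rounded up (assumes an even spread)."""
    total = num_partitions * replication_factor
    return (total + num_tablet_servers - 1) // num_tablet_servers


def estimated_tablet_size_gib(table_size_gib, num_partitions):
    """Average tablet size, where table_size_gib is the size of one copy of the data."""
    return table_size_gib / num_partitions


if __name__ == "__main__":
    servers, partitions = 10, 200
    print(max_partitions_at_creation(servers))       # 200
    print(replicas_per_server(partitions, servers))  # 60
    print(estimated_tablet_size_gib(4096, partitions) <= RECOMMENDED_MAX_TABLET_GIB)  # True
```

In this example a 4 TiB table split into 200 partitions averages about 20 GiB per tablet, which is within the 50 GiB recommendation but above the 10 GiB target, so a smaller table or more tablet servers would be needed to reach the target at creation time.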
Server Management Limitations
- Production deployments should configure at least 4 GiB of memory for tablet servers, and ideally more than 16 GiB when approaching the data and tablet scale limits.
- Write-ahead logs (WALs) can only be stored on one disk.
- Data directories cannot be removed. You must reformat the data directories to remove them.
- Tablet servers cannot be gracefully decommissioned.
- Tablet servers cannot change their address or port.
- Kudu has a hard requirement on up-to-date NTP synchronization. Kudu masters and tablet servers will crash when out of sync.
- Kudu releases have only been tested with NTP. Other time synchronization providers such as Chrony may not work.
Cluster Management Limitations
- Rolling restart is not supported.
- Recommended maximum point-to-point latency within a Kudu cluster is 20 milliseconds.
- Recommended minimum point-to-point bandwidth within a Kudu cluster is 10 Gbps.
- If you intend to use the location awareness feature to place tablet servers in different locations, it is recommended that you measure the bandwidth and latency between servers to ensure they fit within the above guidelines.
- All masters must be started at the same time when the cluster is started for the very first time.
Replication and Backup Limitations
- Kudu does not currently include any built-in features for backup and restore. Users are encouraged to use tools such as Spark or Impala to export or import tables as necessary.
Impala Integration Limitations
- When creating a Kudu table, the CREATE TABLE statement must include the primary key columns before other columns, in primary key order.
- Impala cannot update values in primary key columns.
- Impala cannot create Kudu tables with VARCHAR or nested-typed columns.
- Kudu tables with a name containing upper case or non-ASCII characters must be assigned an alternate name when used as an external table in Impala.
- Kudu tables with a column name containing upper case or non-ASCII characters cannot be used as an external table in Impala. Columns can be renamed in Kudu to work around this issue.
- != and LIKE predicates are not pushed to Kudu; they are evaluated by the Impala scan node instead. This may decrease performance relative to other types of predicates.
- Updates, inserts, and deletes using Impala are non-transactional. If a query fails part of the way through, its partial effects will not be rolled back.
- The maximum parallelism of a single query is limited to the number of tablets in a table. For good analytic performance, aim for 10 or more tablets per host or use large tables.
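The naming restrictions above can be checked ahead of time. The following sketch flags identifiers that contain upper case or non-ASCII characters. The function name `has_restricted_characters` is hypothetical, and the check is a literal reading of the rule as stated in this section rather than a reproduction of Impala's own validation.

```python
def has_restricted_characters(identifier: str) -> bool:
    """True if the identifier contains any upper case or non-ASCII character."""
    return any(ch.isupper() or not ch.isascii() for ch in identifier)


if __name__ == "__main__":
    for name in ["events", "Events", "caf\u00e9_orders"]:
        action = "rename or assign an alternate name" if has_restricted_characters(name) else "ok"
        print(f"{name}: {action}")
```

For table names a flagged identifier means an alternate name is needed. For column names it means the column has to be renamed in Kudu first.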
Spark Integration Limitations
- Spark 2.2 (and higher) requires Java 8 at runtime even though Kudu Spark 2.x integration is Java 7 compatible. Spark 2.2 is the default dependency version as of Kudu 1.5.0.
- Kudu tables with a name containing upper case or non-ASCII characters must be assigned an alternate name when registered as a temporary table.
- Kudu tables with a column name containing upper case or non-ASCII characters must not be used with SparkSQL. Columns can be renamed in Kudu to work around this issue.
- <> and OR predicates are not pushed to Kudu; they are evaluated by the Spark task instead. Only LIKE predicates with a suffix wildcard are pushed to Kudu. This means LIKE "FOO%" will be pushed, but LIKE "FOO%BAR" won't.
- Kudu does not support all the types supported by Spark SQL. For example, Date and complex types are not supported.
- Kudu tables can only be registered as temporary tables in SparkSQL.
- Kudu tables cannot be queried using HiveContext.
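The LIKE rule above amounts to a simple test on the pattern string. The sketch below encodes it for illustration only. The function name `like_is_pushed_down` is hypothetical and this is not the logic of the Kudu Spark integration itself. Two behaviors are assumptions that the text does not state: patterns containing the single-character wildcard `_` are treated as not pushed, and a bare `%` is treated as not pushed. Escape characters are ignored.

```python
def like_is_pushed_down(pattern: str) -> bool:
    """True for patterns of the form 'PREFIX%': exactly one '%', at the very end."""
    if "_" in pattern:
        # Assumption: any single-character wildcard prevents pushdown.
        return False
    # Assumption: a bare '%' (no prefix) is not treated as a pushable pattern.
    return len(pattern) > 1 and pattern.endswith("%") and "%" not in pattern[:-1]


if __name__ == "__main__":
    print(like_is_pushed_down("FOO%"))     # True, matches the example in the text
    print(like_is_pushed_down("FOO%BAR"))  # False, matches the example in the text
```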
Security Limitations
- Data encryption at rest is not directly built into Kudu. Encryption of Kudu data at rest can be achieved through the use of local block device encryption software such as dm-crypt.
- Row-level authorization is not available.
- Kudu does not support configuring a custom service principal for Kudu processes. The principal must follow the pattern kudu/<HOST>@<DEFAULT.REALM>.
- Server certificates generated by Kudu IPKI are incompatible with bouncycastle version 1.52 and earlier.
- The highest supported version of the TLS protocol is TLSv1.2.
Other Known Issues
The following are known bugs and issues with the current release of Kudu. They will be addressed in later releases. Note that this list is not exhaustive, and is meant to communicate only the most important known issues.
- If the Kudu master is configured with the -log_force_fsync_all option, the tablet servers and the clients will experience frequent timeouts, and the cluster may become unusable.
- If a tablet server has a very large number of tablets, it may take several minutes to start up. It is recommended to limit the number of tablets per server to 1000 or fewer. Consider this limitation when pre-splitting your tables. If you notice slow start-up times, you can monitor the number of tablets per server in the web UI.