What's New in Apache Kudu

Learn about the new features of Kudu in Cloudera Runtime 7.1.5.

Kudu FIPS compliant cryptography

Kudu can now be configured to use FIPS-compliant cryptography through FIPS 140-2 validated encryption modules when deployed on FIPS-mode-enabled Red Hat Enterprise Linux (RHEL) and CentOS operating systems.

For more information, see Installing and Configuring CDP with FIPS.

Table ownership

Table ownership support has been added. Every newly created table is automatically owned by the user who creates it. The owner can also be changed by altering the table, and you can assign privileges to table owners using Apache Ranger.

Bloom filter column predicate pushdown

Bloom filter column predicate pushdown has been added to allow optimized execution of filters that match on a set of column values with a controlled false-positive rate. Impala queries can use Bloom filter predicates, yielding performance improvements of 19% to 30% in TPC-H benchmarks and around 41% for distributed joins across large tables. Support for Spark is not yet available.
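
To illustrate the idea (a minimal sketch, not Kudu's implementation — the class, sizes, and hashing scheme here are hypothetical), a Bloom filter answers "might this value be in the set?" with no false negatives and a tunable false-positive rate, which lets a scan cheaply reject most non-matching rows:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, value):
        # Derive num_hashes independent bit positions from the value.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # True for every added value; occasionally True for others.
        return all(self.bits >> pos & 1 for pos in self._positions(value))

# A scan with a Bloom filter predicate keeps only rows whose column value
# might be in the filter; other rows are rejected during the scan.
wanted = BloomFilter()
for key in ("alice", "bob"):
    wanted.add(key)

rows = [("alice", 1), ("carol", 2), ("bob", 3)]
matched = [r for r in rows if wanted.might_contain(r[0])]
```

Typically only the "alice" and "bob" rows survive the filter, though a small false-positive chance means a non-matching row can occasionally slip through — which is why the predicate is a pre-filter rather than an exact one.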

Java client supports the columnar row format

The Java client now transparently supports the columnar row format returned from the server. Using this format can reduce server CPU usage and the amount of data transferred over the network for scans. The columnar format can be enabled via the setRowDataFormat() method on the KuduScanner.
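
The benefit of a columnar layout can be sketched in a few lines (a simplified model, not Kudu's wire format): when each column is stored contiguously, a fixed-width column can be encoded and decoded in one bulk operation instead of per-row work.

```python
import struct

# Row-oriented result: heterogeneous tuples, decoded one cell at a time.
rows = [(1, "alice"), (2, "bob"), (3, "carol")]

# Columnar result: each column stored contiguously.
id_col = [r[0] for r in rows]
name_col = [r[1] for r in rows]

# The whole int32 column packs into 12 bytes with a single call,
# with no per-row framing overhead.
packed = struct.pack(f"<{len(id_col)}i", *id_col)
unpacked = list(struct.unpack(f"<{len(id_col)}i", packed))
```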

IGNORE operations

Java and Python support are added for the following operations:

  • INSERT_IGNORE: behaves like a normal INSERT except that no duplicate-key error is raised if a row with the same primary key already exists.

  • UPDATE_IGNORE: behaves like a normal UPDATE except that no key-not-found error is raised if no row with the primary key exists.

  • DELETE_IGNORE: behaves like a normal DELETE except that no key-not-found error is raised if no row with the primary key exists. If a cluster supports IGNORE operations, the KuduRestore job uses DELETE_IGNORE instead of DELETE.

A master server feature flag has also been added to indicate that the cluster supports IGNORE operations. IGNORE operations are supported in the Kudu Spark integration as well.
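
The semantics of the IGNORE variants can be modeled with a dict-backed table (an illustrative sketch; the function and operation names below are hypothetical stand-ins, not the client API):

```python
def apply_op(table, op, key, value=None):
    """Apply an operation to a dict standing in for a Kudu table.

    The *_IGNORE variants succeed silently exactly where the plain
    variants would raise an error."""
    if op == "INSERT":
        if key in table:
            raise KeyError("duplicate key")
        table[key] = value
    elif op == "INSERT_IGNORE":
        if key not in table:
            table[key] = value       # duplicate key: silently ignored
    elif op == "UPDATE":
        if key not in table:
            raise KeyError("key not found")
        table[key] = value
    elif op == "UPDATE_IGNORE":
        if key in table:
            table[key] = value       # missing key: silently ignored
    elif op == "DELETE":
        if key not in table:
            raise KeyError("key not found")
        del table[key]
    elif op == "DELETE_IGNORE":
        table.pop(key, None)         # missing key: silently ignored

t = {}
apply_op(t, "INSERT_IGNORE", "a", 1)
apply_op(t, "INSERT_IGNORE", "a", 2)     # duplicate: no error, no change
apply_op(t, "DELETE_IGNORE", "missing")  # absent key: no error
```

This is why KuduRestore can safely use DELETE_IGNORE: re-deleting an already-absent row becomes a no-op instead of a failure.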

Unique cluster ID

A unique cluster ID is now assigned to each cluster. The ID is a UUID that is automatically generated and stored in the sys_catalog if missing (on fresh startup or on upgrade). The cluster ID is exposed through the master web UI and the kudu master list tool.
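
The store-if-missing behavior amounts to the following (a sketch, with a plain dict standing in for the system catalog; the function name is hypothetical):

```python
import uuid

def ensure_cluster_id(sys_catalog):
    """Generate and persist a UUID cluster ID only if one is absent,
    so the ID stays stable across restarts and upgrades."""
    if "cluster_id" not in sys_catalog:
        sys_catalog["cluster_id"] = str(uuid.uuid4())
    return sys_catalog["cluster_id"]

catalog = {}
first = ensure_cluster_id(catalog)   # fresh startup: a new UUID is generated
second = ensure_cluster_id(catalog)  # subsequent startups: the same ID is reused
```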

Optimizations and improvements

  • The Spark KuduContext accumulator metrics now track operation counts per table instead of cumulatively for all tables.

  • The kudu local_replica delete CLI tool now accepts multiple tablet identifiers. Along with the newly added --ignore_nonexistent flag, this helps in scripting scenarios where multiple tablet replicas are removed from a particular Tablet Server.

  • The web UIs of both the Master and the Tablet Server now display the name of a service thread pool group on the /threadz page.

  • Introduced queue_overflow_rejections_ metrics for both Masters and Tablet Servers: the number of RPC requests of a particular type dropped due to RPC service queue overflow.

  • Introduced a CoDel-like queue control mechanism for the apply queue. This helps avoid accumulating too many write requests and timing them out in seek-bound workloads (e.g., uniform random inserts). The newly introduced queue control mechanism is disabled by default. To enable it, set the Tablet Server’s --tablet_apply_pool_overload_threshold_ms flag to an appropriate value, for example 250.

  • The Java client’s error collector can now be resized.

  • Calls to the Kudu master server are now drastically reduced when using scan tokens. Previously, deserializing a scan token would result in a GetTableSchema request and potentially a GetTableLocations request. Now the table schema and location information are serialized into the scan token itself, avoiding the need for any requests to the master when processing tokens.

  • The default size of the Master’s RPC queue has been increased from 50 to 100. This optimizes for use cases where a Kudu cluster has many clients working concurrently.

  • Masters now have an option to cache table location responses. This is targeted at Kudu clusters that have many clients working concurrently. By default, the caching of table location responses is disabled. To enable it, set the capacity of the table location cache using the Master’s --table_locations_cache_capacity_mb flag; setting it to 0 disables the caching. An improvement of up to 17% in GetTableLocations request rate was observed with caching enabled.

  • Removed lock contention on Raft consensus lock in Tablet Servers while processing a write request. This helps to avoid RPC queue overflows when handling concurrent write requests to the same tablet from multiple clients.

  • Master’s performance for handling concurrent GetTableSchema requests has been improved. End-to-end tests indicated up to 15% improvement in sustained request rate for high concurrency scenarios.

  • Kudu servers now use protobuf Arena objects to perform all RPC request/response-related memory allocations. This boosts overall RPC performance, and with further optimizations the request rate increased significantly for certain methods; for example, up to 25% for the Master’s GetTabletLocations() RPC in highly concurrent scenarios.

  • Tablet Servers now use protobuf Arena for allocating Raft-related runtime structures. This results in substantial reduction of CPU cycles used and increases write throughput.

  • Tablet Servers now use protobuf Arena for allocating EncodedKeys to reduce allocator contention and improve memory locality.

  • Bloom filter predicate evaluation for scans can be computationally expensive. A heuristic has been added that measures the rejection rate of the supplied Bloom filter predicate and automatically disables the predicate when that rate falls below a threshold. This helped reduce the regression observed with Bloom filter predicates in TPC-H benchmark query #9.

  • Improved scan performance of dictionary and plain-encoded string columns by avoiding copying them.

  • Improved maintenance manager’s heuristics to prioritize larger memstores.

  • Spark client’s KuduReadOptions now supports setting a snapshot timestamp for repeatable reads with READ_AT_SNAPSHOT consistency mode.

  • Spark 3 is supported.

  • Block downloads in the tablet copy client are now parallelized.

  • Op registration no longer blocks on the maintenance manager’s mutex. This optimization buffers calls to RegisterOp() in a separate op map protected by its own spinlock, periodically merging that map into ‘opt_’.
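
The CoDel-like apply-queue control described above can be sketched as follows (a simplified model, not Kudu's implementation; the class and method names are hypothetical). The key idea is to watch how long the item at the head of the queue has been waiting and to start rejecting new work once that wait exceeds the configured threshold, rather than queueing requests that would only time out:

```python
import collections
import time

class OverloadControlledQueue:
    """Shed load once the head-of-queue wait exceeds a threshold,
    loosely in the spirit of CoDel."""

    def __init__(self, overload_threshold_ms):
        self.overload_threshold_ms = overload_threshold_ms
        self.items = collections.deque()  # (enqueue_time_sec, payload)

    def try_push(self, payload, now=None):
        now = time.monotonic() if now is None else now
        if self.items:
            head_wait_ms = (now - self.items[0][0]) * 1000
            if head_wait_ms > self.overload_threshold_ms:
                return False  # overloaded: reject instead of queueing
        self.items.append((now, payload))
        return True

    def pop(self):
        return self.items.popleft()[1]

q = OverloadControlledQueue(overload_threshold_ms=250)
assert q.try_push("write-1", now=0.0)
# 0.5 s later the head item has waited 500 ms > 250 ms, so new work is shed.
accepted = q.try_push("write-2", now=0.5)
```

Rejecting early keeps the queue short under seek-bound workloads, so the requests that are accepted still complete within their deadlines.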
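
The table-location cache described above behaves like a capacity-bounded LRU cache in front of the lookup path (an illustrative sketch; the class name is hypothetical, and the capacity here counts entries rather than megabytes to keep the model simple). As with the real flag, a capacity of 0 disables caching:

```python
import collections

class TableLocationsCache:
    """LRU cache for table-location responses; capacity 0 disables caching,
    mirroring the semantics of --table_locations_cache_capacity_mb."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = collections.OrderedDict()
        self.lookups = 0  # how many requests actually hit the lookup path

    def get_locations(self, table, lookup_fn):
        if self.capacity == 0:
            self.lookups += 1
            return lookup_fn(table)      # caching disabled: always look up
        if table in self.entries:
            self.entries.move_to_end(table)  # refresh LRU position
            return self.entries[table]
        self.lookups += 1
        value = lookup_fn(table)
        self.entries[table] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

cache = TableLocationsCache(capacity=2)
locations = lambda t: f"tablets-of-{t}"
cache.get_locations("orders", locations)
cache.get_locations("orders", locations)  # second call served from cache
```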
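
The scan-token improvement described above amounts to making tokens self-describing (a conceptual sketch only: the function names and JSON encoding are hypothetical — Kudu serializes tokens with protobuf). Because the schema and location information travel inside the token, deserializing one needs no round trip to the master:

```python
import json

def make_scan_token(table_schema, tablet_locations):
    """Embed the schema and tablet locations in the token itself."""
    return json.dumps({"schema": table_schema, "locations": tablet_locations})

def deserialize_scan_token(token):
    # Everything needed is already in the token: no GetTableSchema or
    # GetTableLocations request is required at this point.
    data = json.loads(token)
    return data["schema"], data["locations"]

token = make_scan_token({"id": "int32", "name": "string"}, ["ts-1", "ts-2"])
schema, locations = deserialize_scan_token(token)
```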