What's New in Apache Kudu

This topic lists new features for Apache Kudu in this release of Cloudera Runtime.

Multiple tablet IDs in 'local_replica delete'

The 'local_replica delete' tool now accepts multiple tablet identifiers and processes them in a single run. This reduces overall latency, because opening the tablet server's metadata takes significant time and no longer has to be repeated for each tablet.

Adding --ignore_nonexistent for 'local_replica delete'

The --ignore_nonexistent flag was added to the 'local_replica delete' tool. This simplifies real-world scripting scenarios that clean up particular tablet replicas from tablet servers.
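
The two 'local_replica delete' changes are meant to be combined in cleanup scripts. The following is a minimal sketch of how such a script might assemble a single invocation. It assumes that the tablet identifiers are passed as one comma-separated argument and that the tablet server's directories are supplied with --fs_wal_dir and --fs_data_dirs. The tablet IDs, the paths, and the helper name `build_delete_command` are hypothetical placeholders. The sketch only builds the argument list; it does not run the tool.

```python
def build_delete_command(tablet_ids, wal_dir, data_dirs, ignore_nonexistent=True):
    """Build the argv for one 'kudu local_replica delete' run covering many tablets."""
    if not tablet_ids:
        raise ValueError("at least one tablet id is required")
    argv = [
        "kudu", "local_replica", "delete",
        ",".join(tablet_ids),  # assumption: ids are given as one comma-separated argument
        f"--fs_wal_dir={wal_dir}",
        f"--fs_data_dirs={','.join(data_dirs)}",
    ]
    if ignore_nonexistent:
        # Flag named in these release notes; intended for cleanup scripts.
        argv.append("--ignore_nonexistent")
    return argv


# Hypothetical tablet ids and directories, for illustration only.
cmd = build_delete_command(
    ["tablet-a", "tablet-b"],
    "/data/kudu/wal",
    ["/data/kudu/d1", "/data/kudu/d2"],
)
print(" ".join(cmd))
```

Passing all identifiers in one invocation means the tablet server's metadata is opened once for the whole batch rather than once per tablet.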

KuduContext tracks operations per table

KuduContext can now track operation counts per table. A new MapAccumulator tracks these metrics in a single accumulator per operation type.
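
The idea of one map-valued accumulator per operation type can be illustrated with a small sketch. This is not the kudu-spark MapAccumulator. It is a self-contained Python stand-in, and the class name `MapAccumulatorSketch`, its methods, and the table names are all hypothetical. It only shows how per-table counts recorded by separate tasks could be merged into one result.

```python
from collections import Counter


class MapAccumulatorSketch:
    """Hypothetical stand-in: accumulates a count per table name."""

    def __init__(self):
        self._counts = Counter()

    def add(self, table, count=1):
        # Record `count` operations against `table`.
        self._counts[table] += count

    def merge(self, other):
        # Combine another accumulator's counts by summing per table.
        self._counts.update(other._counts)

    def value(self):
        return dict(self._counts)


# One accumulator per operation type; here only "upsert" is shown.
task1_upserts = MapAccumulatorSketch()
task1_upserts.add("metrics", 3)
task1_upserts.add("events", 2)

task2_upserts = MapAccumulatorSketch()
task2_upserts.add("metrics", 2)

task1_upserts.merge(task2_upserts)
upsert_counts = task1_upserts.value()
print(upsert_counts)
```

Keying the map by table name keeps the number of accumulators fixed at one per operation type, no matter how many tables are written.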

Support columnar row format in Java client

The Java client now supports the columnar RPC format. The setRowDataFormat() method is added to KuduScanner and AsyncKuduScanner, and the format can be set through this method.

Check range predicate first while evaluating Bloom filter predicate

Range predicates can be specified along with Bloom filter predicates for the same column. Checking the range predicate first and exiting early when the column value is out of bounds is cheaper than computing the hash and then looking up the value in the Bloom filter.
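
The evaluation order can be sketched as follows. This is an illustration of the technique, not Kudu's implementation: `TinyBloomFilter` and `matches` are hypothetical names, the filter uses SHA-256 purely for simplicity, and the range is treated as lower-inclusive and upper-exclusive as an assumption of the sketch. A lookup counter makes the saved work visible.

```python
import hashlib


class TinyBloomFilter:
    """Hypothetical minimal Bloom filter used only to illustrate evaluation order."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.lookups = 0  # counts how often hashing and probing were needed

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        self.lookups += 1
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def matches(value, lower, upper, bloom):
    # Cheap comparison first: exit early when the value is out of bounds.
    if value < lower or value >= upper:
        return False
    # Only in-range values pay for hashing and the Bloom filter probe.
    return bloom.might_contain(value)


bloom = TinyBloomFilter()
for v in (10, 20, 30):
    bloom.add(v)

column_values = [5, 10, 20, 30, 500]
matched = [v for v in column_values if matches(v, 15, 100, bloom)]
print(matched, bloom.lookups)
```

Of the five values, only the two that fall inside the range reach the Bloom filter, so three hash computations are avoided.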

Arenas for RPC request and response

The RPC server side allocates a protobuf Arena for each request. The RPC request and response are allocated from the Arena, ensuring that any sub-messages, strings, repeated fields, and so on, use that Arena for allocation as well. Everything is deleted en masse when the InboundCall object (which owns the Arena) is destroyed.

New metadata to avoid master when using scan tokens

New metadata is added to the scan token so that it contains everything required to construct a KuduTable and open a scanner in the clients. This means the GetTableSchema and GetTableLocations RPC calls to the master are no longer required when using the scan token.

New TableMetadataPB, TabletMetadataPB, and authorization token fields were added as optional fields on the token. Additionally, a `projected_column_idx` field was added that can be used in place of `projected_columns`. This significantly reduces the size of the scan token by not duplicating the ColumnSchemaPB entries that are already in the TableMetadataPB.
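
The size saving from projecting by index can be illustrated with a rough sketch. Plain dictionaries and JSON stand in for the protobuf messages here, so the byte counts are only indicative. The column definitions and the helper names `token_with_columns`, `token_with_indexes`, and `resolve_projection` are hypothetical.

```python
import json

# Hypothetical table schema carried once in the token's table metadata.
table_columns = [
    {"name": "host", "type": "STRING", "is_key": True},
    {"name": "ts", "type": "UNIXTIME_MICROS", "is_key": True},
    {"name": "value", "type": "DOUBLE", "is_key": False},
]


def token_with_columns(projection):
    # Projection repeats the full column schema already present in the metadata.
    by_name = {c["name"]: c for c in table_columns}
    return {
        "table_metadata": {"columns": table_columns},
        "projected_columns": [by_name[n] for n in projection],
    }


def token_with_indexes(projection):
    # Projection stores only positions into the metadata's column list.
    index_of = {c["name"]: i for i, c in enumerate(table_columns)}
    return {
        "table_metadata": {"columns": table_columns},
        "projected_column_idx": [index_of[n] for n in projection],
    }


def resolve_projection(token):
    # The reader rebuilds the projected schema from the embedded metadata.
    columns = token["table_metadata"]["columns"]
    return [columns[i] for i in token["projected_column_idx"]]


projection = ["ts", "value"]
duplicated = token_with_columns(projection)
indexed = token_with_indexes(projection)

size_duplicated = len(json.dumps(duplicated))
size_indexed = len(json.dumps(indexed))
print(size_duplicated, size_indexed)
```

Both tokens describe the same projection, but the indexed form carries each column schema only once.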

Adding the table metadata to the scan token is enabled by default. However, it can be disabled in rare cases where more resiliency to column renaming is desired. It can be disabled in the kudu-spark integration using the kudu.useDriverMetadata property.

RaftConsensus::DumpStatusHtml() does not block Raft consensus activity

kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm() needs to take the lock to check the term and the Raft role. When many RPCs come in for the same tablet, the contention can hog service threads and cause queue overflows on busy systems. With this improvement, RaftConsensus::DumpStatusHtml() no longer blocks Raft consensus activity and is not blocked by it either.