What's New in Apache Kudu
This topic lists new features for Apache Kudu in this release of Cloudera Runtime.
Multiple tablet ids in 'local_replica delete'
The 'local_replica delete' tool allows multiple tablet identifiers to be
specified and processed at once. This reduces overall latency, because the time-consuming
step of opening the tablet server’s metadata is no longer repeated for each tablet.
Adding --ignore_nonexistent for 'local_replica delete'
The --ignore_nonexistent flag was added to the 'local_replica delete' tool. It makes
real-world scripting scenarios easier when cleaning up particular tablet replicas from
tablet servers.
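As a rough illustration of the two changes above, a cleanup script could assemble a single invocation for a whole batch of tablets. The sketch below only builds the argument list and does not run anything. The tablet identifiers and directory paths are hypothetical, and passing the identifiers as one comma-separated argument is an assumption that should be checked against the tool's help output.

```python
from typing import List, Sequence


def build_delete_command(tablet_ids: Sequence[str],
                         fs_wal_dir: str,
                         fs_data_dirs: Sequence[str],
                         ignore_nonexistent: bool = True) -> List[str]:
    """Build one 'kudu local_replica delete' invocation covering many tablets."""
    if not tablet_ids:
        raise ValueError("at least one tablet id is required")
    cmd = [
        "kudu", "local_replica", "delete",
        # Assumption: multiple identifiers go in one comma-separated argument.
        ",".join(tablet_ids),
        f"--fs_wal_dir={fs_wal_dir}",
        f"--fs_data_dirs={','.join(fs_data_dirs)}",
    ]
    if ignore_nonexistent:
        # Do not fail the whole batch if a replica is already gone.
        cmd.append("--ignore_nonexistent")
    return cmd


# Hypothetical tablet ids and directories, for illustration only.
command = build_delete_command(
    ["tablet-aaa", "tablet-bbb"],
    fs_wal_dir="/data/kudu/wal",
    fs_data_dirs=["/data/kudu/d1", "/data/kudu/d2"],
)
print(" ".join(command))
```

Building the list separately from executing it keeps the script easy to dry-run before it touches any tablet server data.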
KuduContext tracks operations per table
Adds the ability to track operation counts per table. Introduces the MapAccumulator to track these metrics in a single accumulator per operation type.
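The idea of one map-valued accumulator per operation type can be sketched in a few lines. This is a toy model in plain Python, not the kudu-spark MapAccumulator itself. The class shape, the operation names, and the table names are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


class MapAccumulator:
    """Toy stand-in for a map-valued accumulator: table name -> running count."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add(self, table: str, amount: int = 1) -> None:
        self._counts[table] += amount

    def merge(self, other: "MapAccumulator") -> None:
        # Combine partial results (e.g. from different tasks) by summing per key.
        self._counts.update(other._counts)

    def value(self) -> Dict[str, int]:
        return dict(self._counts)


# One accumulator per operation type; each tracks counts for every table.
accumulators = {op: MapAccumulator() for op in ("insert", "upsert", "delete")}

# Two hypothetical tasks report their partial insert counts.
task_a = MapAccumulator()
task_a.add("metrics", 3)
task_a.add("events", 1)
task_b = MapAccumulator()
task_b.add("metrics", 2)

accumulators["insert"].merge(task_a)
accumulators["insert"].merge(task_b)
print(accumulators["insert"].value())
```

Keying the map by table name means the number of accumulators stays fixed at one per operation type, however many tables are written.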
Support columnar row format in Java client
The Java client now supports the columnar RPC format. The setRowDataFormat() method is
added to KuduScanner and AsyncKuduScanner, and is used to set the format.
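To make the difference between a row-wise and a columnar layout concrete, the following sketch transposes a small batch of rows into per-column arrays. It is a conceptual model only and does not reproduce Kudu's wire encoding. The column names and values are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def rows_to_columns(names: Sequence[str],
                    rows: Sequence[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Transpose row-wise data into one contiguous list per column."""
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row width does not match the column list")
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Row-wise: each tuple holds every column of one row.
rows = [("host-a", 1, 5), ("host-b", 2, 7), ("host-c", 3, 9)]

# Columnar: each list holds one column across all rows.
columns = rows_to_columns(["host", "ts", "value"], rows)

# Work on a single column touches only that column's data.
total = sum(columns["value"])
print(columns["host"], total)
```

A columnar batch lets a consumer that needs only a few columns read them without walking every row.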
Check range predicate first while evaluating Bloom filter predicate
Range predicates can be specified along with Bloom filter predicates for the same column. Checking the range predicate first and exiting early when the column value is out of bounds is cheaper than computing the hash and then looking up the value in the Bloom filter.
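The evaluation order described above can be sketched as follows. This is a simplified model, not Kudu's implementation: the Bloom filter is a minimal one built on SHA-256, the bounds are assumed to be inclusive-lower and exclusive-upper, and the lookups counter exists only to show how many hash computations were avoided.

```python
import hashlib
from typing import Iterator, Optional


class BloomFilter:
    """Minimal Bloom filter over integers, for illustration only."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)
        self.lookups = 0  # number of hash-and-probe operations performed

    def _positions(self, value: int) -> Iterator[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: int) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: int) -> bool:
        self.lookups += 1
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def matches(value: int, lower: Optional[int], upper: Optional[int],
            bloom: BloomFilter) -> bool:
    """Check the cheap range bounds first, then the Bloom filter."""
    if lower is not None and value < lower:
        return False  # out of bounds: no hashing needed
    if upper is not None and value >= upper:
        return False  # out of bounds: no hashing needed
    return bloom.might_contain(value)


bloom = BloomFilter()
for v in (10, 20, 30):
    bloom.add(v)

results = [matches(v, 15, 35, bloom) for v in (10, 20, 30, 40)]
print(results, bloom.lookups)
```

Of the four values tested, only the two inside the range reach the Bloom filter, so two hash computations are skipped.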
Arenas for RPC request and response
The RPC server side allocates a protobuf Arena for each request. The RPC request and
response are allocated from the Arena, ensuring that any sub-messages, strings, repeated
fields, and so on use that Arena for allocation as well. Everything is deleted en masse
when the InboundCall object (which owns the Arena) is destroyed.
New metadata to avoid master when using scan tokens
New metadata is added to the scan token so that it can contain all of the metadata
required to construct a KuduTable and open a scanner in the clients. This means the
GetTableSchema and GetTableLocations RPC calls to the master are no longer required when
using the scan token.
New TableMetadataPB, TabletMetadataPB, and authorization token fields were added as
optional fields on the token. Additionally, a `projected_column_idx` field was added that
can be used in place of `projected_columns`. This significantly reduces the size of the
scan token by not duplicating the ColumnSchemaPB that is already in the TableMetadataPB.
Adding the table metadata to the scan token is enabled by default. However, it can be
disabled in rare cases where more resiliency to column renaming is desired. It can be
disabled in the kudu-spark integration using the kudu.useDriverMetadata property.
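The size saving from `projected_column_idx` can be illustrated with plain dictionaries standing in for the protobuf messages. This is a sketch under stated assumptions: the dictionary keys inside each column entry, the table name, and the helper functions are hypothetical, and JSON length is used only as a rough proxy for serialized size.

```python
import json
from typing import Any, Dict, List, Sequence

# Stand-in for TableMetadataPB: it already carries every column schema.
table_metadata: Dict[str, Any] = {
    "table_name": "metrics",
    "columns": [
        {"name": "host", "type": "STRING", "is_key": True},
        {"name": "ts", "type": "INT64", "is_key": True},
        {"name": "value", "type": "DOUBLE", "is_key": False},
    ],
}


def token_with_column_schemas(meta: Dict[str, Any],
                              names: Sequence[str]) -> Dict[str, Any]:
    """Projection stored as full column schemas (duplicates the metadata)."""
    by_name = {col["name"]: col for col in meta["columns"]}
    return {"table_metadata": meta,
            "projected_columns": [by_name[n] for n in names]}


def token_with_column_indexes(meta: Dict[str, Any],
                              names: Sequence[str]) -> Dict[str, Any]:
    """Projection stored as indexes into the table metadata's column list."""
    index_of = {col["name"]: i for i, col in enumerate(meta["columns"])}
    return {"table_metadata": meta,
            "projected_column_idx": [index_of[n] for n in names]}


def resolve_projection(token: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Recover the projected column schemas from the indexes."""
    columns = token["table_metadata"]["columns"]
    return [columns[i] for i in token["projected_column_idx"]]


full = token_with_column_schemas(table_metadata, ["ts", "value"])
compact = token_with_column_indexes(table_metadata, ["ts", "value"])

print(len(json.dumps(full)), len(json.dumps(compact)))
print(resolve_projection(compact) == full["projected_columns"])
```

Both tokens describe the same projection, but the index-based one stores two small integers instead of two repeated column schemas.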
RaftConsensus::DumpStatusHtml() does not block Raft consensus activity
kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm() needs to take the lock to
check the term and the Raft role. When many RPCs come in for the same tablet, the
contention can hog service threads and cause queue overflows on busy systems. With this
improvement, RaftConsensus::DumpStatusHtml() no longer blocks Raft consensus activity and
is not blocked by it either.