What's New in Apache Kudu
This topic lists new features for Apache Kudu in this release of Cloudera Runtime.
Fine-grained authorization using Ranger
Kudu now supports native fine-grained authorization via integration with Apache Ranger (in addition to integration with Apache Sentry). Kudu can now enforce access control policies that are stored in Ranger and defined for Kudu tables and columns.
Proxy support using Knox
Kudu’s web UI now supports proxying via Apache Knox. Kudu can be deployed behind a firewall, with a Knox Gateway forwarding HTTP requests and responses between clients and the Kudu web UI.
Support for HTTP keep-alive
Kudu’s web UI now supports HTTP keep-alive. Operations that access multiple URLs will now reuse a single HTTP connection, improving their performance.
Rolling restart without stopping ongoing Kudu workloads
The new kudu tserver quiesce tool quiesces tablet servers. While a tablet server is quiescing, it stops hosting tablet leaders and stops serving new scan requests. This can be used to orchestrate a rolling restart without stopping ongoing Kudu workloads.
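The rolling restart described above can be sketched as a loop that quiesces one tablet server, waits for it to drain, restarts it, and moves on. This is a minimal Python sketch under stated assumptions: it assumes the quiesce tool exposes start and stop subcommands that take a tablet server address, and `restart_tserver` and `is_quiesced` are hypothetical hooks you would supply (for example, from your cluster manager or by inspecting the tool's status output). The command runner is injected so the orchestration logic can be exercised without a live cluster.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Sequence

# A runner executes an argv list and returns its stdout.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Default runner: execute a command and return stdout (raises on failure)."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def rolling_restart(
    tservers: Iterable[str],
    restart_tserver: Callable[[str], None],  # hypothetical hook
    is_quiesced: Callable[[str], bool],      # hypothetical hook
    runner: Runner = run_cli,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> List[str]:
    """Restart tablet servers one at a time, quiescing each one first.

    Returns the tablet servers that were restarted, in order.
    """
    restarted: List[str] = []
    for ts in tservers:
        # Ask the server to stop hosting leaders and serving new scans.
        runner(["kudu", "tserver", "quiesce", "start", ts])
        # Poll until leadership and in-flight scans have drained.
        for _ in range(max_polls):
            if is_quiesced(ts):
                break
            time.sleep(poll_seconds)
        else:
            raise TimeoutError(f"{ts} did not finish quiescing")
        restart_tserver(ts)
        # Assumption: clear any quiescing state that might survive the restart.
        runner(["kudu", "tserver", "quiesce", "stop", ts])
        restarted.append(ts)
    return restarted
```

Restarting strictly one server at a time keeps the rest of the cluster available to take over leadership and new scans while each server is down.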
Auto time source support for HybridClock timestamps
Introduced the auto time source for HybridClock timestamps. With --time_source=auto in AWS and GCE cloud environments, Kudu masters and tablet servers use the built-in NTP client, synchronized with dedicated NTP servers available via host-only networks. With --time_source=auto in environments other than AWS/GCE, Kudu masters and tablet servers rely on the local machine's clock synchronized by NTP. The default setting for the HybridClock time source (--time_source=system) is backward-compatible, requiring the local machine's clock to be synchronized by the kernel's NTP discipline.
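To make the selection rules concrete, here is a small Python model of the behavior described above. It is an illustration only, not Kudu's implementation: the `Cloud` enum and `effective_time_source` function are hypothetical, and the real servers detect their environment themselves. The returned strings mirror the --time_source values named in these notes.

```python
from enum import Enum


class Cloud(Enum):
    AWS = "aws"
    GCE = "gce"
    OTHER = "other"  # on-premises or any other environment


def effective_time_source(flag_value: str, environment: Cloud) -> str:
    """Model how --time_source maps to the clock source actually used.

    Only the behavior described in these release notes is modeled:
    'auto' picks the built-in NTP client on AWS/GCE and the
    NTP-synchronized local clock ('system') everywhere else.
    """
    if flag_value == "auto":
        return "builtin" if environment in (Cloud.AWS, Cloud.GCE) else "system"
    if flag_value in ("system", "builtin"):
        return flag_value
    raise ValueError(f"value not covered by this sketch: {flag_value!r}")
```

For example, `effective_time_source("auto", Cloud.OTHER)` returns `"system"`, which matches the default (--time_source=system) behavior.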
Ability to move replicas away from a tablet server
The kudu cluster rebalance tool now supports moving replicas away from specific tablet servers when given the --ignored_tservers and --move_replicas_from_ignored_tservers arguments.
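A minimal Python sketch of building that invocation follows. It assumes that --ignored_tservers takes a comma-separated list of tablet server UUIDs and that master addresses are passed as a comma-separated positional argument; the addresses and UUIDs in the usage note are placeholders, and `rebalance_away_argv` is a hypothetical helper.

```python
from typing import List, Sequence


def rebalance_away_argv(master_addresses: Sequence[str], tserver_uuids: Sequence[str]) -> List[str]:
    """Build a `kudu cluster rebalance` command that drains the given tablet servers."""
    if not master_addresses or not tserver_uuids:
        raise ValueError("need at least one master address and one tablet server UUID")
    return [
        "kudu", "cluster", "rebalance",
        ",".join(master_addresses),                       # comma-separated master addresses
        "--ignored_tservers=" + ",".join(tserver_uuids),  # servers to exclude from placement
        "--move_replicas_from_ignored_tservers",          # also move their existing replicas off
    ]
```

Usage: `subprocess.run(rebalance_away_argv(["master-1:7051"], ["<tserver-uuid>"]), check=True)`. Passing an argv list rather than a shell string avoids quoting problems with the comma-separated values.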
Ability to specify table creation options using JSON
The new kudu table create tool lets users specify table creation options using JSON.
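The sketch below builds the JSON argument in Python so that quoting and escaping are handled by `json.dumps` rather than by hand. The field names (table_name, schema, columns, column_name, column_type, is_nullable, key_column_names, partition, hash_partitions, num_buckets, num_replicas) follow the format used in the tool's documented example as far as is known here; treat them as assumptions and confirm them with `kudu table create --help` for your version. The table and column names are hypothetical.

```python
import json
from typing import List, Sequence


def create_table_argv(master_addresses: Sequence[str]) -> List[str]:
    """Build a `kudu table create` command with a JSON table specification."""
    spec = {
        "table_name": "metrics_demo",  # hypothetical table name
        "schema": {
            "columns": [
                # Primary key columns must be non-nullable.
                {"column_name": "id", "column_type": "INT64", "is_nullable": False},
                {"column_name": "host", "column_type": "STRING", "is_nullable": True},
            ],
            "key_column_names": ["id"],
        },
        "partition": {
            # Spread rows across 4 tablets by hashing the key column.
            "hash_partitions": [{"columns": ["id"], "num_buckets": 4}],
        },
        "num_replicas": 3,
    }
    return ["kudu", "table", "create", ",".join(master_addresses), json.dumps(spec)]
```

Because the JSON is a single argv element, it can be passed to `subprocess.run` without any shell quoting.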
Ability to automatically rebalance tablet replicas among tablet servers
An experimental feature lets Kudu automatically rebalance tablet replicas among tablet servers. The background task can be enabled by setting the --auto_rebalancing_enabled flag on the Kudu masters. Before enabling auto-rebalancing on an existing cluster, run the CLI rebalancer tool first.
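As a hedged illustration of what "rebalancing replicas" means, the Python sketch below greedily moves one replica at a time from the most-loaded to the least-loaded tablet server until the replica counts differ by at most one. Kudu's own rebalancing algorithm is more involved and is not reproduced here; `plan_moves` and the server names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_moves(replica_counts: Dict[str, int]) -> List[Tuple[str, str]]:
    """Greedy plan that evens out replica counts across tablet servers.

    Each step moves one replica from the most-loaded server to the
    least-loaded one, stopping once the skew (max - min) is at most 1.
    Returns the moves as (source, destination) pairs.
    """
    counts = dict(replica_counts)  # work on a copy
    moves: List[Tuple[str, str]] = []
    if len(counts) < 2:
        return moves
    while True:
        src = max(counts, key=counts.get)
        dst = min(counts, key=counts.get)
        if counts[src] - counts[dst] <= 1:
            return moves
        counts[src] -= 1
        counts[dst] += 1
        moves.append((src, dst))
```

Each move strictly reduces the spread of the counts, so the loop always terminates. Running an offline pass like this first and leaving only small corrections to a background task mirrors the advice above to run the CLI rebalancer before enabling auto-rebalancing.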
Support for DATE and VARCHAR data types
Kudu now supports DATE and VARCHAR data types.
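Kudu represents a DATE value as a 32-bit count of days since the Unix epoch (1970-01-01). The Python sketch below illustrates that encoding only; it is not a replacement for client-library support, and the helper names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Days since the Unix epoch (negative for dates before 1970-01-01)."""
    return (d - EPOCH).days


def days_to_date(days: int) -> date:
    """Inverse of date_to_days."""
    return EPOCH + timedelta(days=days)
```

For example, `date_to_days(date(2000, 1, 1))` is 10957 (30 years of 365 days plus 7 leap days), and `date_to_days(date(1969, 12, 31))` is -1.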
Optimizations and improvements
- The Write Ahead Log file segments and index chunks are now managed by Kudu’s file cache. With that, all the long-lived file descriptors used by Kudu are managed by the file cache, and there is no longer a need to plan capacity for file descriptor usage.
- Kudu no longer requires running kudu fs update_dirs to change a directory configuration or recover from a disk failure
- Kudu tablet servers and masters now expose a num_raft_leaders metric for the number of Raft leader replicas hosted on the server
- Kudu's maintenance operation scheduling has been updated to prioritize reducing WAL retention under memory pressure. Previously, Kudu prioritized operations that yielded high memory reduction, which could result in high WAL disk usage in workloads that contained updates
- A new maintenance operation removes rowsets that have had all of their rows deleted and whose newest delete operations are considered ancient
- The built-in NTP client is now fully supported as the time source for Kudu's HybridTime clock and is no longer marked as experimental. To switch from the default system time source to the built-in NTP client, use --time_source=builtin
- Introduced additional metrics for the built-in NTP client
- The /config page of the master and tablet server web UIs now displays the configured and effective time source, as well as the effective list of reference servers for the built-in NTP client, if applicable
- The processing of Raft consensus vote requests has been improved to be more robust during high contention scenarios like election storms.
- Added a validator to enforce consistency between the maximum size of an RPC and the maximum size of tablet transaction memory, controlled by the --rpc_max_message_size and --tablet_transaction_memory_limit_mb flags respectively. In prior releases, if the limit on the size of RPC requests was increased while the tablet transaction memory limit was left at its default setting, certain Raft transactions could be committed but not applied.
- The metrics endpoint now supports filtering metrics by a metric severity level.
- Many kudu local_replica tools no longer open the block manager, which significantly reduces the amount of I/O done when running them
- The Kudu Java client now exposes a way to get the resource metrics associated with a given scanner
- Scan predicates are pushed down to RLE decoders, improving predicate evaluation efficiency in some workloads
- The log block manager will now attempt to use multiple threads to open blocks in each data directory, in some tests reducing startup time by up to 20%
- The raft_term and time_since_last_leader_heartbeat aggregated table metrics now return the maximum reported value instead of the sum
- Kudu's tablet server web UI scans page now shows the number of round trips per scanner
- Kudu's master and tablet server web UIs are updated to show critical partition information, including tablet count and on-disk size
- Kudu servers now expose the last_read_elapsed_seconds and last_write_elapsed_seconds tablet-level metrics, which indicate how long ago the most recent read and write operations to a given tablet occurred
- Kudu servers now expose the transaction_memory_limit_rejections tablet-level metric, which tracks the number of transactions rejected because a given tablet's transaction memory limit was reached
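The new tablet-level recency metrics can be read from a server's metrics endpoint. The Python sketch below assumes the endpoint returns a JSON array of entities, each with a type, an id, and a metrics list of name/value pairs; verify that layout against your own servers. `tablet_recency` is a hypothetical helper that extracts last_read_elapsed_seconds and last_write_elapsed_seconds per tablet.

```python
import json
from typing import Dict

WANTED = {"last_read_elapsed_seconds", "last_write_elapsed_seconds"}


def tablet_recency(metrics_json: str) -> Dict[str, Dict[str, int]]:
    """Map tablet ID -> {metric name: value} for the read/write recency metrics."""
    result: Dict[str, Dict[str, int]] = {}
    for entity in json.loads(metrics_json):
        # Server-level and other entities are skipped; only tablets matter here.
        if entity.get("type") != "tablet":
            continue
        found = {
            m["name"]: m["value"]
            for m in entity.get("metrics", [])
            if m.get("name") in WANTED
        }
        if found:
            result[entity["id"]] = found
    return result
```

To use it against a live server, pass the body of the tablet server's /metrics page; the host and web UI port depend on your deployment. Sorting the result by last_write_elapsed_seconds is one way to spot tablets that have gone cold.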