What's New in Apache Kudu

Learn about the new features of Kudu in Cloudera Runtime 7.1.9.

Kudu JWT support and proxy support

Kudu now supports JSON Web Token (JWT) authentication as an alternative to Kerberos authentication. Use it where authentication is required but Kerberos is not a viable option. For more details, see JWT authentication for Kudu.

You can now separate internal and external traffic in a Kudu cluster. Kudu clients running in external networks connect through a proxy or load balancer endpoint, while internal traffic is never routed through that endpoint. Internal traffic (for example, traffic between tablet servers and masters) bypasses the advertised RPC addresses and uses alternative addresses for intra-cluster communication. For more details, see Proxied RPCs in Kudu.
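The routing rule can be pictured with a small Python sketch. Everything in it is hypothetical: ServerAddresses, address_for, and the example addresses are illustrations only, not Kudu configuration flags or client APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerAddresses:
    """Hypothetical model of the endpoints of one Kudu server."""
    internal: str    # address that masters and tablet servers use among themselves
    advertised: str  # proxy or load balancer endpoint given to external clients


def address_for(server: ServerAddresses, caller_is_internal: bool) -> str:
    # Intra-cluster traffic bypasses the advertised (proxied) address;
    # only clients in external networks go through the proxy endpoint.
    return server.internal if caller_is_internal else server.advertised


tserver = ServerAddresses(internal="10.0.0.5:7050", advertised="proxy.example.com:17050")
print(address_for(tserver, caller_is_internal=True))   # 10.0.0.5:7050
print(address_for(tserver, caller_is_internal=False))  # proxy.example.com:17050
```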

Auto-incrementing column

Kudu now supports auto-incrementing columns. The server populates these columns with values from a monotonically increasing counter. The counter is local to each tablet; that is, every tablet maintains its own auto-incrementing counter.
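As a rough illustration of a per-tablet counter, the following Python sketch models tablets that assign the next counter value on insert, without the client supplying it. The Tablet class and the start value of 1 are assumptions for illustration; in the sketch, two tablets can hand out the same value because their counters are independent.

```python
import itertools


class Tablet:
    """Toy tablet that fills the auto-incrementing column itself ("server side")."""

    def __init__(self, tablet_id: str):
        self.tablet_id = tablet_id
        # Hypothetical start value of 1; the counter belongs to this tablet only.
        self._counter = itertools.count(1)
        self.rows = []

    def insert(self, row: dict) -> dict:
        # The caller does not provide the value; the tablet assigns the next one.
        stored = dict(row, auto_incrementing_id=next(self._counter))
        self.rows.append(stored)
        return stored


t1, t2 = Tablet("t1"), Tablet("t2")
a = t1.insert({"host": "a"})["auto_incrementing_id"]
b = t1.insert({"host": "b"})["auto_incrementing_id"]
c = t2.insert({"host": "c"})["auto_incrementing_id"]  # t2 has its own counter
print(a, b, c)  # 1 2 1
```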

Kudu now supports non-unique primary keys as an experimental feature. When you create a table with a non-unique primary key, Kudu automatically adds an auto-incrementing key column named auto_incrementing_id to the table. The non-unique key columns and the auto-incrementing column together form the effective primary key (see KUDU-1945). For more details, see Non-unique primary key index.
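The following Python sketch shows why rows with identical values in the non-unique key columns do not collide: each row's effective primary key also includes auto_incrementing_id. NonUniqueKeyTablet is a hypothetical in-memory model of a single tablet, and the counter starting at 1 is an assumption.

```python
class NonUniqueKeyTablet:
    """Toy single tablet of a table whose primary key is non-unique."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)
        self._counter = 0  # local to this tablet
        self.rows = {}     # effective primary key -> stored row

    def insert(self, row: dict) -> tuple:
        self._counter += 1
        stored = dict(row, auto_incrementing_id=self._counter)
        # Effective primary key = non-unique key columns + auto-incrementing column.
        effective_key = tuple(row[c] for c in self.key_columns) + (self._counter,)
        self.rows[effective_key] = stored
        return effective_key


tablet = NonUniqueKeyTablet(key_columns=["host"])
k1 = tablet.insert({"host": "web01", "cpu": 0.4})
k2 = tablet.insert({"host": "web01", "cpu": 0.7})  # same "host" value, still stored
print(k1, k2)  # ('web01', 1) ('web01', 2)
```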

Auto-leader rebalancing

Kudu now includes an experimental feature that automatically rebalances tablet leader replicas among tablet servers. To enable this background task, set the --auto_leader_rebalancing_enabled flag to true on the Kudu masters (see KUDU-3390).
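To make the idea of leader rebalancing concrete, here is a generic greedy sketch in Python. It is not Kudu's implementation: the data layout and the move rule (shift leadership to another replica holder that has at least two fewer leaders) are assumptions. Because leadership only moves between servers that already host a replica, no replica is relocated.

```python
from collections import Counter


def rebalance_leaders(tablets):
    """Greedy leader-balancing sketch (not Kudu's actual algorithm).

    tablets maps tablet_id -> {"replicas": [tserver, ...], "leader": tserver}.
    Returns the list of (tablet_id, old_leader, new_leader) moves applied.
    """
    moves = []
    while True:
        # Leader count per tablet server, including servers with zero leaders.
        counts = Counter({ts: 0 for t in tablets.values() for ts in t["replicas"]})
        counts.update(t["leader"] for t in tablets.values())
        best = None
        for tablet_id, t in sorted(tablets.items()):
            src = t["leader"]
            for dst in t["replicas"]:
                gap = counts[src] - counts[dst]
                # A move only helps if it strictly narrows the gap between src and dst.
                if dst != src and gap >= 2 and (best is None or gap > best[0]):
                    best = (gap, tablet_id, src, dst)
        if best is None:
            return moves
        _, tablet_id, src, dst = best
        tablets[tablet_id]["leader"] = dst
        moves.append((tablet_id, src, dst))


# Three tablets replicated on three servers, all leaders initially on ts1.
tablets = {tid: {"replicas": ["ts1", "ts2", "ts3"], "leader": "ts1"}
           for tid in ("t1", "t2", "t3")}
moves = rebalance_leaders(tablets)
print(moves)  # [('t1', 'ts1', 'ts2'), ('t2', 'ts1', 'ts3')]
```

Each move lowers the sum of squared leader counts, so the loop always terminates.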

Immutable column

Kudu now supports immutable columns: once a row is inserted, the value of its immutable columns cannot be updated. Use an immutable column to represent a semantically constant entity (see KUDU-3353).
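A minimal Python sketch of these semantics, assuming that an update touching an immutable column is rejected once the row exists. The Table class and ImmutableColumnError are hypothetical; Kudu's actual client API and error handling (for example, how other operation types treat immutable columns) may differ.

```python
class ImmutableColumnError(Exception):
    """Hypothetical error type used only in this sketch."""


class Table:
    def __init__(self, key: str, immutable_columns):
        self.key = key
        self.immutable = set(immutable_columns)
        self.rows = {}

    def insert(self, row: dict) -> None:
        # Immutable columns get their value here, at insert time.
        self.rows[row[self.key]] = dict(row)

    def update(self, key_value, changes: dict) -> None:
        bad = self.immutable.intersection(changes)
        if bad:
            # Reject the whole update so a constant attribute never changes.
            raise ImmutableColumnError(f"cannot update immutable column(s): {sorted(bad)}")
        self.rows[key_value].update(changes)


users = Table(key="user_id", immutable_columns=["created_at"])
users.insert({"user_id": 1, "created_at": "2023-01-01", "email": "a@example.com"})
users.update(1, {"email": "b@example.com"})  # allowed: email is mutable
try:
    users.update(1, {"created_at": "2024-01-01"})
except ImmutableColumnError as e:
    print(e)  # cannot update immutable column(s): ['created_at']
```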

Added sanity check to detect wall clock jumps

Kudu now performs a sanity check to detect anomalous jumps in wall clock readings. Each wall clock reading is captured together with a reading from the CLOCK_MONOTONIC_RAW clock. A jump shows up as a large difference between the wall clock delta and the corresponding CLOCK_MONOTONIC_RAW delta. When this condition is detected, HybridClock::NowWithErrorUnlocked() dumps diagnostic information about the clock's NTP synchronization status and returns Status::ServiceUnavailable() with an appropriate error message.

This change introduces the following new flags:

--wall_clock_jump_detection
Controls the sanity check for wall clock readings. Acceptable values are auto, enabled, and disabled. The default is auto, which enables the check if the process detects that it is running on a VM in the Azure cloud.
--wall_clock_jump_threshold_sec
Controls the threshold, in seconds, for the difference between the deltas of the wall clock readings and the CLOCK_MONOTONIC_RAW clock readings. The default is 900 (15 minutes).
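The check and the two flags can be sketched in Python as follows. ClockReading, jump_detected, check_enabled, and the running_on_azure_vm parameter are hypothetical names, and the use of an absolute difference with a strict comparison are assumptions; the sketch only models the documented idea, not Kudu's HybridClock code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockReading:
    wall_sec: float  # wall clock reading
    raw_sec: float   # CLOCK_MONOTONIC_RAW reading captured together with it


def jump_detected(prev: ClockReading, cur: ClockReading, threshold_sec: float = 900) -> bool:
    # threshold_sec plays the role of --wall_clock_jump_threshold_sec (default 900).
    wall_delta = cur.wall_sec - prev.wall_sec
    raw_delta = cur.raw_sec - prev.raw_sec
    # CLOCK_MONOTONIC_RAW is not stepped or slewed by NTP or manual time changes,
    # so a large gap between the two deltas points to a wall clock jump.
    return abs(wall_delta - raw_delta) > threshold_sec


def check_enabled(mode: str, running_on_azure_vm: bool) -> bool:
    # Models the documented values of --wall_clock_jump_detection.
    if mode == "enabled":
        return True
    if mode == "disabled":
        return False
    if mode == "auto":
        return running_on_azure_vm
    raise ValueError(f"unexpected mode: {mode!r}")


prev = ClockReading(wall_sec=1_000.0, raw_sec=50.0)
normal = ClockReading(wall_sec=1_010.0, raw_sec=60.0)  # both clocks advanced 10 s
jumped = ClockReading(wall_sec=4_610.0, raw_sec=60.0)  # wall clock leapt an extra hour
print(jump_detected(prev, normal), jump_detected(prev, jumped))  # False True
```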

Kudu multi-master config change

You can now remove or decommission unwanted master role instances through Cloudera Manager. In a multi-master deployment, you can also recommission any decommissioned master role instance. For more information, see Remove Kudu masters through Cloudera Manager.

Kudu Range-aware Data Placement

Kudu now places new tablet replicas using an algorithm that is both range-aware and table-aware. The algorithm helps avoid hotspotting, which occurs when many replicas from the same range are placed on the same few tablet servers. Hotspotting overwhelms those tablet servers with read or write requests and can increase the latency of those requests. To avoid it, the algorithm does not target the same set of tablet servers for a set of replicas created in parallel; instead, it spreads the replicas across multiple tablet servers. For more information, see Range-aware replica placement in Kudu.
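The ordering of placement criteria can be illustrated with a simplified, deterministic Python sketch: for each replica, pick the tablet server with the fewest replicas of the same range, then of the same table, then overall. TServerLoad and place_tablet are hypothetical, and Kudu's real placement policy may add randomization and other criteria.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TServerLoad:
    by_range: Counter = field(default_factory=Counter)  # (table, range) -> replica count
    by_table: Counter = field(default_factory=Counter)  # table -> replica count
    total: int = 0


def place_tablet(loads, table, range_key, num_replicas):
    """Pick distinct tablet servers for the replicas of one new tablet."""
    if num_replicas > len(loads):
        raise ValueError("not enough tablet servers for the requested replicas")
    chosen = []
    for _ in range(num_replicas):
        candidates = [ts for ts in loads if ts not in chosen]
        best = min(
            candidates,
            key=lambda ts: (
                loads[ts].by_range[(table, range_key)],  # 1) fewest replicas of this range
                loads[ts].by_table[table],               # 2) then fewest of this table
                loads[ts].total,                         # 3) then fewest overall
                ts,                                      # deterministic tie-break
            ),
        )
        chosen.append(best)
        # Record the load right away so later replicas, and later tablets
        # created in the same batch, are steered elsewhere.
        loads[best].by_range[(table, range_key)] += 1
        loads[best].by_table[table] += 1
        loads[best].total += 1
    return chosen


loads = {f"ts{i}": TServerLoad() for i in range(1, 7)}
for ts in ("ts4", "ts5", "ts6"):
    loads[ts].total = 10  # busier with other tables' replicas
first = place_tablet(loads, "metrics", "2024-01", 3)
second = place_tablet(loads, "metrics", "2024-01", 3)  # same range, created right after
print(first, second)  # ['ts1', 'ts2', 'ts3'] ['ts4', 'ts5', 'ts6']
```

A policy that looked only at total load would send the second tablet to ts1 through ts3 again; ranking by range first spreads the range across all six servers.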

Kudu replication factor

A new tool, kudu table set_replication_factor, lets you change the replication factor of an existing table without recreating it, giving you more flexibility in managing how data is replicated across your cluster. The tool immediately updates the table metadata in the master, and the master then applies the new replication factor asynchronously. You can monitor the progress by running the kudu cluster ksck command.
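The following Python sketch builds the two command lines for scripted use. The argument order (master addresses, table name, replication factor) reflects my understanding of the kudu CLI; verify it against your CLI's usage output. The odd-number check is an assumption based on the master's default policy; adjust it if your cluster is configured differently. The helper names and master addresses are hypothetical.

```python
import shlex
import subprocess


def set_replication_factor_cmd(master_addresses, table, replication_factor):
    """Build (but do not run) the kudu CLI invocation."""
    # Assumption: by default the master accepts only positive odd replication factors.
    if replication_factor < 1 or replication_factor % 2 == 0:
        raise ValueError("replication factor must be a positive odd number")
    return ["kudu", "table", "set_replication_factor",
            ",".join(master_addresses), table, str(replication_factor)]


def ksck_cmd(master_addresses):
    # Cluster health check, used here to watch the new replication factor take effect.
    return ["kudu", "cluster", "ksck", ",".join(master_addresses)]


def run(cmd):
    # Requires the kudu CLI on PATH; check=True raises CalledProcessError on failure.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


masters = ["master-1:7051", "master-2:7051", "master-3:7051"]
print(shlex.join(set_replication_factor_cmd(masters, "metrics", 5)))
# kudu table set_replication_factor master-1:7051,master-2:7051,master-3:7051 metrics 5
# On a host with the kudu CLI: run(set_replication_factor_cmd(...)), then run(ksck_cmd(masters)).
```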