What's New in Apache Kudu
Learn about the new features of Kudu in Cloudera Runtime 7.1.6.
Optimizations and improvements
- Downloading the WAL data and data blocks when copying tablets to another tablet server is now parallelized, resulting in much faster tablet copy operations. These operations occur when recovering from a down tablet server or when running the cluster rebalancer. See KUDU-1728 and KUDU-3214 for more details.
- The HMS integration now supports multiple Kudu clusters associated with a single HMS, including Kudu clusters that do not have HMS synchronization enabled. This is possible because the Kudu master now uses the cluster ID to ignore notifications from tables in a different cluster. Additionally, the HMS plugin checks whether the Kudu cluster associated with a table has HMS synchronization enabled. See KUDU-3192 and KUDU-3187 for more details.
- Kudu will now fail tablet replicas that have been corrupted due to KUDU-2233 instead of crashing the tablet server. If a healthy majority still exists, a new replica will be created and the failed replica will be evicted and deleted. See KUDU-3191 and KUDU-2233 for more details.
- DeltaMemStores will now be flushed as long as any DMS in a tablet is older than the point defined by `--flush_threshold_secs`, rather than flushing once every `--flush_threshold_secs` period. This can reduce memory pressure under update- or delete-heavy workloads, and lower tablet server restart times following such workloads. See KUDU-3195 for more details.
- The `kudu perf loadgen` CLI tool now supports `UPSERT` for storing the generated data into the table. To switch to `UPSERT` for row operations (instead of the default `INSERT`), add the
- Users can now specify the level of parallelization when copying a tablet using the `kudu local_replica copy_from_remote` CLI tool by passing the
- The Kudu masters now distinguish between overlapping and exact duplicate key ranges when adding new partitions, returning `Status::AlreadyPresent()` for exact range duplicates and `Status::InvalidArgument()` for otherwise overlapping ones. In prior releases, the master returned `Status::InvalidArgument()` for both duplicate and otherwise overlapping ranges.
- The handling of an empty list of master addresses in the Kudu C++ client has improved. In prior releases, `KuduClientBuilder::Build()` would hang in `ConnectToCluster()` if no master addresses were provided. Now, it returns `Status::InvalidArgument()` in such a case.
- The connection negotiation timeout for the Kudu C++ client is now programmatically configurable. To customize it, use the newly introduced `KuduClientBuilder::connection_negotiation_timeout()` method in the Kudu C++ client API.
- All RPC-related `kudu` CLI tools now have a `--negotiation_timeout_ms` command line flag to control the client-side connection negotiation timeout. The default value for the new flag is set to 3000 milliseconds for backward compatibility. Keep in mind that the total RPC timeout includes the connection negotiation time, so when raising `--negotiation_timeout_ms` it generally makes sense to raise the total RPC timeout by the same delta.
- Kudu now reports slow SASL calls (that is, calls taking more than 250 milliseconds to complete) when connecting to a server. This helps diagnose issues like those described in KUDU-3217.
- MaintenanceManager now has a new histogram-based `maintenance_op_find_best_candidate_duration` metric that captures how long it takes (in microseconds) to find the best maintenance operation among the available candidates. The new metric can help diagnose conditions where MaintenanceManager seems to lag behind the rate of write operations in a busy Kudu cluster with many replicas per tablet server.
- The KuduScanToken Java API has been extended with a `deserializeIntoScannerBuilder()` method that can be used to further customize generated tokens.
- Logging of the error message produced when applying an op while a Java KuduSession is closed has been throttled. See KUDU-3012 for more details.
- Added a new `uptime` metric for Kudu servers. The metric's value is the time elapsed since the server started, in microseconds. Knowing a server's uptime makes it easier to interpret and compare metrics reported by different Kudu servers.
- When pruning in-list predicate values, range partitions can now also be taken into consideration. See KUDU-1644 for more details.
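
The distinction between exact duplicate and otherwise overlapping key ranges noted above can be illustrated with a small, self-contained Python sketch. It does not call the Kudu API: the `classify_new_range` function, the half-open integer bounds, and the string results are assumptions made for illustration, and they only mirror the status names mentioned in the release note.

```python
from typing import List, Tuple

# Hypothetical model: a range partition is a half-open interval [lower, upper).
Range = Tuple[int, int]


def classify_new_range(existing: List[Range], new: Range) -> str:
    """Name the status a master-like check might report for a new range."""
    new_lo, new_hi = new
    if new_lo >= new_hi:
        raise ValueError("lower bound must be strictly below upper bound")
    # An exact duplicate is reported separately from other overlaps.
    if new in existing:
        return "AlreadyPresent"
    # Two half-open intervals overlap when each starts before the other ends.
    for lo, hi in existing:
        if new_lo < hi and lo < new_hi:
            return "InvalidArgument"
    return "OK"


existing_ranges = [(0, 10), (10, 20)]
duplicate = classify_new_range(existing_ranges, (10, 20))
overlap = classify_new_range(existing_ranges, (5, 15))
fresh = classify_new_range(existing_ranges, (20, 30))
```

Under this model a caller can treat `AlreadyPresent` as a harmless retry of an earlier request, while `InvalidArgument` still signals a genuinely conflicting range.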
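
The interaction between the negotiation timeout and the total RPC timeout can be sketched with simple arithmetic. The Python below is an illustrative model rather than Kudu code: the function names are hypothetical, the 10000 ms total timeout is an assumed example value, and the only behavior taken from the note is that negotiation time counts against the total RPC timeout.

```python
def remaining_call_budget_ms(total_timeout_ms: int, negotiation_ms: int) -> int:
    """Time left for the RPC itself once connection negotiation has finished."""
    return max(0, total_timeout_ms - negotiation_ms)


def bump_both(total_timeout_ms: int, negotiation_timeout_ms: int, delta_ms: int):
    """Raise both timeouts by the same delta, returning (total, negotiation)."""
    return total_timeout_ms + delta_ms, negotiation_timeout_ms + delta_ms


# Assumed example: 10000 ms total timeout with the 3000 ms negotiation default.
baseline = remaining_call_budget_ms(10000, 3000)

# Raising only the negotiation timeout shrinks the worst-case call budget.
negotiation_only = remaining_call_budget_ms(10000, 8000)

# Raising both by the same delta preserves the worst-case call budget.
new_total, new_negotiation = bump_both(10000, 3000, 5000)
both_bumped = remaining_call_budget_ms(new_total, new_negotiation)
```

In this model, the worst-case time left for the call stays constant only when both values move together.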
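
The idea of taking range partitions into account when pruning in-list predicate values can be illustrated as follows. This is a simplified Python sketch, not Kudu's implementation: it assumes a single integer range-partition column with half-open bounds, and `prune_in_list` is a hypothetical helper.

```python
from typing import Iterable, List, Optional


def prune_in_list(
    values: Iterable[int],
    lower: Optional[int],
    upper: Optional[int],
) -> List[int]:
    """Keep only the in-list values that can fall inside [lower, upper).

    A bound of None means the partition is unbounded on that side.
    """
    kept = []
    for value in values:
        if lower is not None and value < lower:
            continue  # below the partition's lower bound
        if upper is not None and value >= upper:
            continue  # at or above the exclusive upper bound
        kept.append(value)
    return kept


in_list = [1, 5, 12, 25]
within_partition = prune_in_list(in_list, 10, 20)
unbounded_above = prune_in_list(in_list, 10, None)
no_match = prune_in_list([1, 5], 10, 20)
```

When the pruned list comes back empty, as with `no_match`, a scan over that partition cannot return rows for the predicate and could be skipped in this model.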