What's New in Apache Kudu

Learn about the new features of Kudu in Cloudera Runtime 7.1.7.

Simplified multi-master management

Kudu now supports Raft configuration changes for Kudu masters and provides CLI tools for orchestrating the addition, removal, and recovery of masters in a Kudu cluster. These tools substantially simplify migrating to multiple masters, recovering a dead master, and removing masters from a Kudu cluster. See KUDU-2181 for details.

Hive Metastore integration

Kudu can integrate its own catalog with the Hive Metastore (HMS). The HMS is the de facto standard catalog and metadata provider in the Hadoop ecosystem. When the HMS integration is enabled, Kudu tables can be discovered and used by external HMS-aware tools, even if those tools are not otherwise aware of, or integrated with, Kudu. Kudu also supports table comments directly on Kudu tables; the comments are automatically synchronized when the Hive Metastore integration is enabled. Comments can be added at table creation time and changed via table alteration.

For more information, see Using Hive Metastore with Apache Kudu.

Optimizations and improvements

  • It is now possible to change the Kerberos Service Principal Name (SPN) using the --principal flag. The default SPN is still kudu/_HOST. Clients connecting to a cluster that uses a non-default SPN must set sasl_protocol_name or saslProtocolName in the client builder or the Kudu CLI to match the SPN base (for example, “kudu” if the SPN is “kudu/_HOST”). For more information, see KUDU-1884 and Configuring custom Kerberos principal for Kudu.

  • Kudu RPC now supports TLSv1.3. Kudu servers and clients automatically negotiate TLSv1.3 for Kudu RPC if OpenSSL (or, for Java clients, the Java runtime) on each side supports TLSv1.3. If necessary, use the newly introduced --rpc_tls_ciphersuites flag to customize TLSv1.3-specific cipher suites on the server side. See KUDU-2871 for details.
  • TLS cipher renegotiation for TLSv1.2 and earlier protocol versions is now explicitly disabled. See KUDU-1926 for details.
  • Location assignment for Kudu clients is now disabled by default because it provides little benefit while putting extra load on Kudu masters. Disabling it reduces the load on the masters, which matters when many clients run in a cluster. To enable location assignment for clients, override the default by setting --master_client_location_assignment_enabled=true for Kudu masters.
  • The C++ client's closest-replica selection, which is the default selection mode, now matches the behavior of the Java client. Instead of picking a random replica each time, the client uses a static value for each process, so the selection remains deterministic and can benefit from better caching. See KUDU-3248 for details.
  • The Web UI /rpcz endpoint now displays information on whether an RPC connection is protected by TLS, and if so, provides information on the negotiated TLS cipher suite.
  • Tooling requests and C++ client requests bound for leader masters are now retried if the masters cannot be reached.
  • Cluster tooling now validates that the master argument contains no duplicate values. See KUDU-3226 for details.
  • The error message that the Kudu Java client outputs on an attempt to write to a non-existent table partition now contains the table’s name. See KUDU-3267 for details.
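
The replica selection change described above (KUDU-3248) can be illustrated with a minimal sketch. This is not Kudu's client code: the function name, the host names, and the choice to sort the candidates are assumptions made for illustration only. It shows the general idea of drawing one value per process and reusing it, so that repeated selections return the same replica instead of a fresh random one.

```python
import random

# Drawn once when the process starts; every later selection reuses it.
# (Hypothetical name, for illustration only.)
_PROCESS_SELECTION_VALUE = random.randrange(2**31)


def pick_replica(replicas, selection_value=_PROCESS_SELECTION_VALUE):
    """Return one replica deterministically for a given selection value."""
    if not replicas:
        raise ValueError("no replicas to choose from")
    # Sorting is an assumption of this sketch: it makes the choice
    # independent of the order in which the replicas are listed.
    ordered = sorted(replicas)
    return ordered[selection_value % len(ordered)]


# Hypothetical tablet server addresses.
replicas = ["tserver-a:7050", "tserver-b:7050", "tserver-c:7050"]
first = pick_replica(replicas)
repeat = pick_replica(list(reversed(replicas)))
print(first == repeat)  # the same replica is chosen on every call
```

Because the value is fixed for the lifetime of the process, a client keeps returning to the same replica, which is what allows it to benefit from caching. Different processes draw different values, so load can still spread across replicas.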