What's New in Apache Kudu

Learn about the new features of Kudu in Cloudera Runtime 7.2.2.

Table ownership

Kudu now supports table ownership, which can be used to enforce owner authorization policies through Ranger. Ranger supports the ownership privilege through a default policy that allows the {OWNER} of a resource to access it without any additional policy being created manually. A new access type, “delegate admin”, grants a user permission to change a table's ownership and to create a table with a different owner.

KuduScanner.GetKuduTable method

The KuduTable instance is now available from the KuduScanner, so the user can use the KuduTable instance populated by the scan token instead of making a GetTableSchema call to the master. The complete KuduTable is passed to the KuduScanner in its constructor, so the consumer of the scanner can reuse it. This eliminates an additional round trip from the client to the master, as well as the spikes of GetTableSchema requests to the master whenever an Impala query or a Spark job starts on a large cluster.

Optimizations and improvements

  • kudu.snapshotTimestampMicros is added as an optional property to the Kudu Spark readOptions. It allows consistent scans when the timestamp is set before the first DataFrame read.
  • Overall RPC performance is improved by allocating larger chunks of memory from protobuf::Arena for all incoming calls. The boost in request rate is noticeable for RPCs with larger requests and responses (for example, synthetic benchmarks show an increase of up to 25% for GetTabletLocations), but much less so for RPCs with smaller requests and responses (for example, there is no measurable gain in request rate for GetTableSchema).
  • A load meter is introduced for ThreadPool, with the aim of using active queue management (AQM) techniques such as CoDel in scenarios where thread pool queue load metrics are applicable.
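
As an illustration of the snapshot timestamp property in the first item above, the following minimal sketch builds one set of read options that several DataFrame reads can share, so that they all scan at the same point in time. Only the kudu.snapshotTimestampMicros key is taken from this note. The helper name snapshot_read_options, the master address, and the table name are hypothetical, and the sketch assumes the value is expressed as microseconds since the Unix epoch.

```python
import time

# Property name as given in the release note above.
SNAPSHOT_KEY = "kudu.snapshotTimestampMicros"


def snapshot_read_options(master_addresses, table_name, snapshot_micros=None):
    """Build Kudu Spark read options pinned to a single snapshot timestamp.

    Hypothetical helper for illustration; it is not part of kudu-spark.
    """
    if snapshot_micros is None:
        # Assumption: the timestamp is microseconds since the Unix epoch.
        snapshot_micros = time.time_ns() // 1_000
    return {
        "kudu.master": master_addresses,
        "kudu.table": table_name,
        SNAPSHOT_KEY: str(snapshot_micros),
    }


# Choose the timestamp once, before the first DataFrame read.
options = snapshot_read_options(
    "master-1.example.com:7051",  # placeholder master address
    "default.orders",             # placeholder table name
)

# With PySpark and the kudu-spark package available (not executed here),
# every read that reuses `options` would request the same snapshot:
#   df_first = spark.read.format("kudu").options(**options).load()
#   df_second = spark.read.format("kudu").options(**options).load()
```

Fixing the timestamp once and reusing the same options is what keeps the reads consistent with each other; generating a new timestamp for each read would defeat the purpose.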