Fixed Issues in Apache Kudu

Review the list of fixed issues for Kudu in Cloudera Runtime 7.1.7 SP2.

CDPD-47068: Updated default value for --tablet_history_max_age_sec to avoid OOM for kudu-master
Fixed an issue where the kudu-master process could consume excessive memory in very large clusters, in clusters with many thousands of tables, or in clusters that run a very large number of DDL operations per day.
CDPD-46131: Fixed table creation with HMS Integration
The issue occurred when Kudu HMS integration was enabled and a table was created in Impala with a "STORED AS KUDU" query. Any subsequent query through Hive failed with a ClassNotFoundError because the Kudu HMS client did not send some fields that Hive requires in the create table request.
CDPD-45355: Fixed multiple DNS-related issues
One issue involved addresses changing at runtime; it is fixed by refreshing DNS entries when proxies hit a network error. Another issue is fixed by allowing outbound request buffers to be reused when a request is retried.
CDPD-44917: Fix UB in TxnSystemClient when adding max timeout to now
Fixed undefined behavior (UB) in TxnSystemClient by passing deadlines instead of timeouts.
The issue occurred when the maximum timeout value was added to the current time.
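To illustrate why passing deadlines avoids the problem, here is a minimal sketch; it is not the TxnSystemClient code (which is C++), and the class and method names are hypothetical. Adding a maximal 64-bit timeout to a non-negative "now" overflows: Java silently wraps the result to a negative number, while the same signed overflow in C++ is undefined behavior. A deadline-style API computes the absolute deadline once (saturating here, assuming non-negative inputs) and afterwards only compares against it.

```java
// Hypothetical sketch, not Kudu code: contrasts timeout-based and
// deadline-based arithmetic on 64-bit nanosecond timestamps.
final class DeadlineSketch {
  private DeadlineSketch() {}

  // Naive conversion of a relative timeout to an absolute deadline.
  // With timeoutNanos == Long.MAX_VALUE the sum overflows: Java wraps it
  // to a negative number, while the same signed overflow in C++ is UB.
  static long naiveDeadline(long nowNanos, long timeoutNanos) {
    return nowNanos + timeoutNanos;
  }

  // Saturating conversion, assuming nowNanos >= 0 and timeoutNanos >= 0:
  // an effectively infinite timeout maps to Long.MAX_VALUE instead of wrapping.
  static long saturatingDeadline(long nowNanos, long timeoutNanos) {
    if (timeoutNanos > Long.MAX_VALUE - nowNanos) {
      return Long.MAX_VALUE;
    }
    return nowNanos + timeoutNanos;
  }

  // Deadline-style check: the caller already holds an absolute deadline,
  // so nothing is added to "now" here and no overflow is possible.
  static boolean expired(long nowNanos, long deadlineNanos) {
    return nowNanos >= deadlineNanos;
  }
}
```

Converting once at the API boundary keeps every later check a plain comparison, which is the property a deadline-passing design relies on.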
CDPD-44835: Fix thirdparty build issues on Ubuntu 21.10
Fixed third-party build issues on Ubuntu 21.10.
Multiple issues led to LLVM build failures, and new patch files were needed to fix them. For example, the Linux kernel removed the interface to Cyclades, which caused the LLVM build to fail.
CDPD-44833: Fix a scan bug that reads repetitive rows
Fixed a scanner bug that could read duplicate rows.
The bug occurred when isFaultTolerant was true, because lastPrimaryKey was not updated as part of the second scan request. In a common scenario, when the tablet server hosting the leader replica restarted, the scanner resumed reading from the lastPrimaryKey of the first ScanResponse and therefore returned rows it had already read.
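The effect of a stale resume key can be shown with a small hypothetical simulation; it is not the Kudu client code, integer keys stand in for primary keys, and all names are illustrative. After a simulated tablet server restart, the scan reopens just after the saved lastPrimaryKey. If that key is taken only from the first response, rows read in later batches are read again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Kudu client code. Integer keys stand in for
// primary keys; a "restart" reopens the scan from the saved resume key.
final class ResumeKeySketch {
  private ResumeKeySketch() {}

  // Returns up to batchSize keys strictly greater than `after` (null = from start).
  static List<Integer> fetchBatch(List<Integer> sortedKeys, Integer after, int batchSize) {
    List<Integer> batch = new ArrayList<>();
    for (int k : sortedKeys) {
      if (batch.size() == batchSize) {
        break;
      }
      if (after == null || k > after) {
        batch.add(k);
      }
    }
    return batch;
  }

  // Scans every key. After `restartAfter` batches the server "restarts" once and
  // the scan resumes after lastPrimaryKey. If updateEveryBatch is false, the
  // resume key is only taken from the first response, mimicking the bug.
  static List<Integer> scan(List<Integer> sortedKeys, int batchSize,
                            int restartAfter, boolean updateEveryBatch) {
    List<Integer> rows = new ArrayList<>();
    Integer cursor = null;          // position of the live server-side scanner
    Integer lastPrimaryKey = null;  // client-side resume key
    int batches = 0;
    boolean restarted = false;
    while (true) {
      List<Integer> batch = fetchBatch(sortedKeys, cursor, batchSize);
      if (batch.isEmpty()) {
        break;
      }
      rows.addAll(batch);
      batches++;
      cursor = batch.get(batch.size() - 1);
      if (updateEveryBatch || batches == 1) {
        lastPrimaryKey = cursor;
      }
      if (!restarted && batches == restartAfter) {
        restarted = true;
        cursor = lastPrimaryKey;  // the new scanner starts after the saved key
      }
    }
    return rows;
  }
}
```

With keys 1 to 6, batches of two and a restart after the second batch, updating the resume key on every response returns each key once, while keeping only the first response's key re-reads keys 3 and 4.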
CDPD-44826: Fix prefetching bug in Java scanner
Fixed a prefetching bug in the Java scanner.
The bug occurred when the scanner prefetched a value too early and overwrote the value it had already cached. The fix uses an atomic value to cache the prefetched data so that it is not overwritten.
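The general pattern of caching a prefetched value in an atomic slot can be sketched as follows. This is a hypothetical illustration using java.util.concurrent.atomic.AtomicReference, not the actual Kudu scanner code; the class and method names are assumptions. A prefetch only fills an empty slot, and the consumer atomically takes and clears it, so an unconsumed value is never silently replaced.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Kudu scanner code: a single-slot prefetch cache.
final class PrefetchCacheSketch<T> {
  private final AtomicReference<T> prefetched = new AtomicReference<>();

  // Stores a prefetched value only if the slot is empty. Returns false
  // instead of overwriting a value the consumer has not taken yet.
  boolean offerPrefetched(T value) {
    Objects.requireNonNull(value, "null would be indistinguishable from an empty slot");
    return prefetched.compareAndSet(null, value);
  }

  // Atomically takes the cached value (or null if none) and clears the slot.
  T takePrefetched() {
    return prefetched.getAndSet(null);
  }
}
```

Because both operations are single atomic steps, a prefetch that arrives early fails visibly instead of racing with the consumer.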
CDPD-44793: Java client does not properly update master locations cache
Fixed a bug in the Kudu Java client where it could not invalidate stale locations of a former leader master.
The bug occurred when a master node became unreachable because of network issues and the client did not receive a TCP RST on its connection to that master. The client kept trying to reach the unreachable leader master and received no response until the RPC timed out. Even after the master node became reachable again, the client kept sending RPCs over the old TCP connection and could not connect to that server or to the new leader master. The only workaround was to restart the client application.
CDPD-44788: Stop sending DeleteTablet RPC to wrong tablet server
The Kudu master no longer retries a DeleteTablet RPC on a tablet server once the server has responded to it with WRONG_SERVER_UUID.
CDPD-42695: Back-port range-aware kudu cluster rebalance tool into 7.1.7 SP2
The kudu cluster rebalance CLI tool has been improved to detect and fix hot-spotting of tablet replicas in particular tables.

The earlier algorithm used by the Kudu catalog manager to place tablet replicas for a newly created table is prone to hot-spotting when the table is partitioned by both range and hash. This is because the algorithm does not distinguish tablets by their key range: all tablets of a table look the same to it, which can lead to hot-spotting if many tablet replicas from the same range (but different hash buckets) are placed on the same tablet server.

Before range-aware rebalancing was introduced, the kudu cluster rebalance tool could not detect or fix this hot-spotting either, because it also did not distinguish tablet replicas by their tablet's key range. As a result, even when replicas were ideally balanced across tablet servers, hot-spotting caused by the placement described above could remain after running earlier versions of the tool.
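The difference between overall balance and per-range balance can be made concrete with a small hypothetical sketch; it is not the kudu cluster rebalance implementation, and the names and the skew definition (maximum minus minimum replica count across tablet servers) are assumptions for illustration. In the example, ts1 hosts both hash buckets of range r1 and ts2 hosts both of r2: the per-server totals are equal, yet each range is concentrated on one server.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the kudu cluster rebalance implementation.
final class RangeSkewSketch {
  private RangeSkewSketch() {}

  // One tablet replica: the tablet server that hosts it and its tablet's range.
  record Replica(String tabletServer, String range) {}

  // Skew = max minus min replica count across `servers` (assumed non-empty),
  // counting only replicas of `range`, or all replicas when range is null.
  static int skew(List<Replica> replicas, List<String> servers, String range) {
    Map<String, Integer> counts = new HashMap<>();
    for (String ts : servers) {
      counts.put(ts, 0);  // servers hosting no matching replica count as 0
    }
    for (Replica r : replicas) {
      if (range == null || range.equals(r.range())) {
        counts.merge(r.tabletServer(), 1, Integer::sum);
      }
    }
    return Collections.max(counts.values()) - Collections.min(counts.values());
  }

  // Skew computed separately for every range that has replicas.
  static Map<String, Integer> perRangeSkew(List<Replica> replicas, List<String> servers) {
    Map<String, Integer> result = new TreeMap<>();
    for (Replica r : replicas) {
      result.computeIfAbsent(r.range(), rg -> skew(replicas, servers, rg));
    }
    return result;
  }
}
```

For that placement, the overall skew is 0 while the skew of each range is 2, which is the kind of imbalance that a rebalancer looking only at per-server totals cannot see.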

Apache patch information

  • None