Troubleshooting Apache Kudu

This section covers basic Apache Kudu troubleshooting information. For more details, see the official Kudu documentation for troubleshooting.

Issues starting or restarting the master or the tablet server
The master or the tablet server may fail to start or restart if the hole punching test fails, if the FS layout already exists, or if the clocks of the master and tablet servers are not synchronized using NTP.

Disk space usage
When using the log block manager (the default on Linux), Kudu uses sparse files to store data. The apparent size of a sparse file differs from the amount of disk space it actually uses, so some tools may report the disk space used by Kudu inaccurately. For example, the size listed by ls -l does not accurately reflect the disk space used by Kudu data files.

Reporting Kudu crashes using breakpad
Kudu uses the Google breakpad library to generate a minidump whenever Kudu experiences a crash. A minidump file contains important debugging information about the process that crashed, including the shared libraries loaded and their versions, a list of threads running at the time of the crash, the state of the processor registers and a copy of the stack memory for each thread, and CPU and operating system version information. These minidumps are typically only a few MB in size and are generated even if core dump generation is disabled.
Currently, generating minidumps is only possible on Linux deployments.

Troubleshooting performance issues
This topic helps you troubleshoot issues and improve performance using Kudu tracing, memory limits, block cache size, heap sampling, and the name service cache daemon (nscd).

Usability issues
This topic lists some common exceptions and errors that you may encounter while using Kudu and helps you resolve usability-related issues.

Tombstoned or STOPPED tablet replicas
You may notice that some replicas on a tablet server are in a STOPPED state and remain on the server indefinitely. These replicas are tombstones. A tombstone indicates that the tablet server once held a bona fide replica of its tablet.

Corruption: checksum error on CFile block
In versions prior to Kudu 1.8.0, if the data on disk becomes corrupt, you will encounter warnings containing "Corruption: checksum error on CFile block" in the tablet server logs, and client-side errors when trying to scan tablets with corrupt CFile blocks. Fixing this corruption is a manual process.

Generating a table list
To generate a list of tables to back up, it can be useful to run the kudu table list tool along with grep.

Spark tuning
In general, the Spark jobs were designed to run with minimal tuning and configuration. You can adjust the number of executors and resources to increase parallelism and performance using Spark's configuration options.

Symbolizing stack traces
This topic helps you identify whether there is high contention among threads to acquire a lock, and describes a way to symbolize stack addresses.
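
Example: apparent size versus allocated size of a sparse file
The gap described under "Disk space usage" between the size listed by ls -l and the space a sparse file really occupies can be observed with a few lines of Python. This is a minimal sketch and not part of Kudu: the helper names make_sparse_file and size_report are hypothetical, the file it creates is a stand-in rather than a real Kudu data file, and it assumes a Unix-like system where st_blocks is reported in 512-byte units.

```python
import os
import tempfile


def make_sparse_file(path, apparent_size):
    """Create a file of the given apparent size without writing any data.

    Hypothetical helper for illustration; on filesystems that support
    sparse files, no data blocks are allocated for the hole.
    """
    with open(path, "wb") as f:
        f.truncate(apparent_size)


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    st_size is the apparent size, the figure that ls -l lists.
    st_blocks counts 512-byte units actually allocated on disk,
    which is the figure that du is based on (Unix only).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.dat")
        make_sparse_file(path, 10 * 1024 * 1024)
        apparent, allocated = size_report(path)
        print(f"apparent size:  {apparent} bytes")
        print(f"allocated size: {allocated} bytes")
```

On a filesystem with sparse file support, the allocated size is expected to be far smaller than the apparent size, which is why tools that report only the apparent size can overstate the disk space in use.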