Apache Kudu Administration
This topic describes how to perform common administrative tasks and workflows with Apache Kudu.
Continue reading:
- Starting and Stopping Kudu Processes
- Kudu Web Interfaces
- Kudu Metrics
- Common Kudu Workflows
- Migrating to Multiple Kudu Masters
- Recovering from a Dead Kudu Master in a Multi-Master Deployment
- Removing Kudu Masters from a Multi-Master Deployment
- Changing Master Hostnames
- Monitoring Cluster Health with ksck
- Changing Directory Configuration
- Recovering from Disk Failure
- Recovering from Full Disks
- Bringing a Tablet That Has Lost a Majority of Replicas Back Online
- Rebuilding a Kudu Filesystem Layout
- Physical Backups of an Entire Node
- Scaling Storage on Kudu Master and Tablet Servers in the Cloud
- Migrating Kudu Data from One Directory to Another on the Same Host
- Minimizing Cluster Disruption During Temporary Planned Downtime of a Single Tablet Server
- Running Tablet Rebalancing Tool
- Decommissioning or Permanently Removing a Tablet Server From a Cluster
Starting and Stopping Kudu Processes
To start the Kudu master and tablet server processes:
sudo service kudu-master start
sudo service kudu-tserver start
To stop them:
sudo service kudu-master stop
sudo service kudu-tserver stop
To configure the processes to start automatically at boot:
sudo chkconfig kudu-master on          # RHEL / CentOS
sudo chkconfig kudu-tserver on         # RHEL / CentOS
sudo update-rc.d kudu-master defaults  # Ubuntu
sudo update-rc.d kudu-tserver defaults # Ubuntu
Kudu Web Interfaces
Kudu tablet servers and masters expose useful operational information on a built-in web interface.
Kudu Master Web Interface
Kudu master processes serve their web interface on port 8051. The interface exposes several pages with information about the state of the cluster.
- A list of tablet servers, their host names, and the time of their last heartbeat.
- A list of tables, including schema and tablet location information for each.
- SQL code which you can paste into Impala Shell to add an existing table to Impala’s list of known data sources.
Kudu Tablet Server Web Interface
Each tablet server serves a web interface on port 8050. The interface exposes information about each tablet hosted on the server, its current state, and debugging information about maintenance background operations.
Common Web Interface Pages
Both Kudu masters and tablet servers expose the following information via their web interfaces:
- HTTP access to server logs.
- An /rpcz endpoint which lists currently running RPCs via JSON.
- Details about the memory usage of different components of the process.
- The current set of configuration flags.
- Currently running threads and their resource consumption.
- A JSON endpoint exposing metrics about the server.
- The version number of the daemon deployed on the cluster.
These interfaces are linked from the landing page of each daemon’s web UI.
Kudu Metrics
Kudu daemons expose a large number of metrics. Some metrics are associated with an entire server process, whereas others are associated with a particular tablet replica.
Listing Available Metrics
The full set of available metrics for a Kudu server can be dumped using a special command line flag:
$ kudu-tserver --dump_metrics_json
$ kudu-master --dump_metrics_json
This will output a large JSON document. Each metric indicates its name, label, description, units, and type. Because the output is JSON-formatted, this information can easily be parsed and fed into other tooling which collects metrics from Kudu servers.
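As a quick sketch, the metric names can be pulled out of the dump with standard text tools. The grep pattern below is a rough filter, not a real JSON parser, and the sample document stands in for actual server output:

```shell
# Extract the unique metric names from a dump_metrics_json-style document.
# The sample JSON here is illustrative, not real server output.
sample='[{"metrics":[{"name":"rpc_connections_accepted"},{"name":"log_append_latency"}]}]'
echo "$sample" | grep -o '"name": *"[^"]*"' | cut -d'"' -f4 | sort -u

# Against a real server, pipe the dump through the same filter:
#   kudu-tserver --dump_metrics_json | grep -o '"name": *"[^"]*"' | cut -d'"' -f4 | sort -u
```

A JSON-aware tool such as jq would be more robust for production use; the pipeline above is only meant to show that the dump is easy to post-process.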
For the complete list of metrics collected by Cloudera Manager for a Kudu service, look for the Kudu metrics listed under Cloudera Manager Metrics .
Collecting Metrics via HTTP
Metrics can be collected from a server process via its HTTP interface by visiting /metrics. The output of this page is JSON for easy parsing by monitoring services. This endpoint accepts several GET parameters in its query string:
- /metrics?metrics=<substring1>,<substring2>,… - Limits the returned metrics to those which contain at least one of the provided substrings. The substrings also match entity names, so this may be used to collect metrics for a specific tablet.
- /metrics?include_schema=1 - Includes metrics schema information such as unit, description, and label in the JSON output. This information is typically omitted to save space.
- /metrics?compact=1 - Eliminates unnecessary whitespace from the resulting JSON, which can decrease bandwidth when fetching this page from a remote host.
- /metrics?include_raw_histograms=1 - Includes the raw buckets and values for histogram metrics, enabling accurate aggregation of percentile metrics over time and across hosts.
For example:
$ curl -s 'http://example-ts:8050/metrics?include_schema=1&metrics=connections_accepted'
[
    {
        "type": "server",
        "id": "kudu.tabletserver",
        "attributes": {},
        "metrics": [
            {
                "name": "rpc_connections_accepted",
                "label": "RPC Connections Accepted",
                "type": "counter",
                "unit": "connections",
                "description": "Number of incoming TCP connections made to the RPC server",
                "value": 92
            }
        ]
    }
]
$ curl -s 'http://example-ts:8050/metrics?metrics=log_append_latency'
[
    {
        "type": "tablet",
        "id": "c0ebf9fef1b847e2a83c7bd35c2056b1",
        "attributes": {
            "table_name": "lineitem",
            "partition": "hash buckets: (55), range: [(<start>), (<end>))",
            "table_id": ""
        },
        "metrics": [
            {
                "name": "log_append_latency",
                "total_count": 7498,
                "min": 4,
                "mean": 69.3649,
                "percentile_75": 29,
                "percentile_95": 38,
                "percentile_99": 45,
                "percentile_99_9": 95,
                "percentile_99_99": 167,
                "max": 367244,
                "total_sum": 520098
            }
        ]
    }
]
Diagnostics Logging
Kudu may be configured to periodically dump all of its metrics to a local log file using the --metrics_log_interval_ms flag. Set this flag to the interval, in milliseconds, at which metrics should be written to the diagnostics log file.
The diagnostics log will be written to the same directory as the other Kudu log files, with a similar naming format, substituting diagnostics instead of a log level like INFO. After any diagnostics log file reaches 64MB uncompressed, the log will be rolled and the previous file will be gzip-compressed.
The log file generated has three space-separated fields. The first field is the word metrics. The second field is the current timestamp in microseconds since the Unix epoch. The third is the current value of all metrics on the server, using a compact JSON encoding. The encoding is the same as the metrics fetched via HTTP described above.
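A line in this format can be split with standard shell tools. The sample line below is illustrative; real lines carry the full metrics JSON:

```shell
# Split a diagnostics log line into its three space-separated fields.
# The sample line is a stand-in for a real entry.
line='metrics 1515711561817449 [{"type":"server","metrics":[]}]'
ts_us=$(echo "$line" | cut -d' ' -f2)   # microseconds since the Unix epoch
json=$(echo "$line" | cut -d' ' -f3-)   # compact JSON metrics snapshot
echo "captured at $((ts_us / 1000000)) seconds since the epoch"
```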
Common Kudu Workflows
Migrating to Multiple Kudu Masters
To provide high availability and to avoid a single point of failure, Kudu clusters should be created with multiple masters. Many Kudu clusters were created with just a single master, either for simplicity or because Kudu multi-master support was still experimental at the time. This workflow demonstrates how to migrate to a multi-master configuration. It can also be used to migrate from two masters to three with straightforward modifications.
Prepare for the migration
- Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
- Decide how many masters to use. The number of masters should be odd. Three- or five-master configurations are recommended; they can tolerate one or two failures, respectively.
- Perform the following preparatory steps for the existing master:
- Identify and record the directories where the master’s write-ahead log (WAL) and data live. If using Kudu system packages, their default location is /var/lib/kudu/master, but they may be customized using the fs_wal_dir and fs_data_dirs configuration parameters. The commands below assume that fs_wal_dir is /data/kudu/master/wal and fs_data_dirs is /data/kudu/master/data. Your configuration may differ. For more information on configuring these directories, see the Kudu Configuration docs.
- Identify and record the port the master is using for RPCs. The default port value is 7051, but it may have been customized using the rpc_bind_addresses configuration parameter.
- Identify the master’s UUID. It can be fetched using the following command:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null
- master_data_dir: The location of the existing master’s previously recorded data directory.
For example:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/var/lib/kudu/master 2>/dev/null
4aab798a69e94fab8d77069edff28ce0
- (Optional) Configure a DNS alias for the master. The alias could be a DNS cname (if the machine already has an A record in DNS), an A record (if the machine is only known by its IP address), or an alias in /etc/hosts. The alias should be an abstract representation of the master (e.g. master-1).
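As an illustration, /etc/hosts-based aliases might look like the following; the IP addresses are placeholders, and master-2 and master-3 become relevant once the new master machines are chosen:

```shell
# Hypothetical /etc/hosts entries mapping abstract aliases to masters.
192.0.2.11  master-1
192.0.2.12  master-2
192.0.2.13  master-3
```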
- If you have Kudu tables that are accessed from Impala, you must update the master addresses in the Apache Hive Metastore (HMS) database:
- If you set up the DNS aliases, run the following statement in impala-shell, replacing master-1, master-2, and master-3 with your actual aliases:
ALTER TABLE table_name SET TBLPROPERTIES ('kudu.master_addresses' = 'master-1,master-2,master-3');
- If you do not have DNS aliases set up, see Step 11 in the Perform the migration section for updating HMS.
- Perform the following preparatory steps for each new master:
- Choose an unused machine in the cluster. The master generates very little load, so it can be collocated with other data services or load-generating processes, though not with another Kudu master from the same configuration.
- Ensure Kudu is installed on the machine, either using system packages (in which case the kudu and kudu-master packages should be installed), or some other means.
- Choose and record the directory where the master’s data will live.
- Choose and record the port the master should use for RPCs.
- (Optional) Configure a DNS alias for the master (e.g. master-2, master-3, etc.).
Perform the migration
- Stop all the Kudu processes in the entire cluster.
- Format the data directory on each new master machine, and record the generated UUID. Use the following commands:
$ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>]
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null
- master_data_dir: The new master’s previously recorded data directory.
For example:
$ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null
f5624e05f40649b79a757629a69d061e
- If you are using Cloudera Manager, add the new Kudu master roles now, but do not start them.
- If using DNS aliases, override the empty value of the Master Address parameter for each role (including the existing master role) with that master’s alias.
- Add the port number (separated by a colon) if using a non-default RPC port value.
- Rewrite the master’s Raft configuration with the following command, executed on the existing master:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <all_masters>
- master_data_dir: The existing master’s previously recorded data directory.
- tablet_id: Must be set to the string 00000000000000000000000000000000.
- all_masters: A space-separated list of masters, both new and existing. Each entry in the list must be a string of the form <uuid>:<hostname>:<port>, where:
- uuid: The master’s previously recorded UUID.
- hostname: The master’s previously recorded hostname or alias.
- port: The master’s previously recorded RPC port number.
For example:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 f5624e05f40649b79a757629a69d061e:master-2:7051 988d8ac6530f426cbe180be5ba52033d:master-3:7051
- Modify the value of the master_addresses configuration parameter for both the existing master and the new masters. The new value must be a comma-separated list of all of the masters. Each entry is a string of the form <hostname>:<port>, where:
- hostname: The master's previously recorded hostname or alias.
- port: The master's previously recorded RPC port number.
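As a sketch, each master’s gflagfile would then carry a line like the following; the hostnames and the default port 7051 are taken from the examples in this workflow:

```shell
# Hypothetical gflagfile entry for each master.
--master_addresses=master-1:7051,master-2:7051,master-3:7051
```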
- Start the existing master.
- Copy the master data to each new master with the following command, executed on each new master machine.
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <existing_master>
- master_data_dir: The new master's previously recorded data directory.
- tablet_id: Must be set to the string 00000000000000000000000000000000.
- existing_master: The RPC address of the existing master. It must be a string of the form <hostname>:<port>, where:
- hostname: The existing master's previously recorded hostname or alias.
- port: The existing master's previously recorded RPC port number.
For example:
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-1:7051
- Start all the new masters.
- Modify the value of the tserver_master_addrs configuration parameter for each tablet server. The new value must be a comma-separated list of masters, where each entry is a string of the form <hostname>:<port>:
- hostname: The master's previously recorded hostname or alias.
- port: The master's previously recorded RPC port number.
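As a sketch, each tablet server’s gflagfile would then carry a line like the following; the hostnames and port are taken from the examples in this workflow:

```shell
# Hypothetical gflagfile entry for each tablet server.
--tserver_master_addrs=master-1:7051,master-2:7051,master-3:7051
```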
- Start all the tablet servers.
- If you have Kudu tables that are accessed from Impala and you didn’t set up DNS aliases, update the HMS database manually in the underlying database that provides the storage for HMS.
- The following is an example SQL statement you would run in the HMS database:
UPDATE TABLE_PARAMS
SET PARAM_VALUE = 'master-1.example.com,master-2.example.com,master-3.example.com'
WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'old-master';
- Invalidate the metadata by running the command in impala-shell:
INVALIDATE METADATA;
- Using a browser, visit each master’s web UI and navigate to the /masters page. All the masters should now be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.
- Run a Kudu system check (ksck) on the cluster using the kudu command line tool. For more details, see Monitoring Cluster Health with ksck.
Recovering from a Dead Kudu Master in a Multi-Master Deployment
Kudu multi-master deployments function normally in the event of a master loss. However, it is important to replace the dead master. Otherwise a second failure may lead to a loss of availability, depending on the number of available masters. This workflow describes how to replace the dead master.
Due to KUDU-1620, it is not possible to perform this workflow without also restarting the live masters. As such, the workflow requires a maintenance window, albeit a potentially brief one if the cluster was set up with DNS aliases.
Prepare for the recovery
- If the cluster was configured without DNS aliases, perform the following steps; otherwise, move on to step 2:
- Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
- Shut down all Kudu tablet server processes in the cluster.
- Ensure that the dead master is well and truly dead. Take whatever steps are needed to prevent it from accidentally restarting; a dead master that comes back to life can be quite dangerous for the cluster post-recovery.
- Choose one of the remaining live masters to serve as a basis for recovery. The rest of this workflow will refer to this master as the "reference" master.
- Choose an unused machine in the cluster where the new master will live. The master generates very little load so it can be co-located with other data services or load-generating processes, though not with another Kudu master from the same configuration. The rest of this workflow will refer to this master as the "replacement" master.
- Perform the following preparatory steps for the replacement master:
- Ensure Kudu is installed on the machine, either via system packages (in which case the kudu and kudu-master packages should be installed), or via some other means.
- Choose and record the directory where the master’s data will live.
- Perform the following preparatory steps for each live master:
- Identify and record the directory where the master’s data lives. If using Kudu system packages, the default value is /var/lib/kudu/master, but it may be customized via the fs_wal_dir and fs_data_dirs configuration parameters. Note that if you have set fs_data_dirs to directories other than the value of fs_wal_dir, it must be explicitly included in every command below where fs_wal_dir is also included. For more information on configuring these directories, see the Kudu Configuration docs.
- Identify and record the master’s UUID. It can be fetched using the following command:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null
- master_data_dir: The live master’s previously recorded data directory.
For example:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null
80a82c4b8a9f4c819bab744927ad765c
- Perform the following preparatory steps for the reference master:
- Identify and record the directory where the master’s data lives. If using Kudu system packages, the default value is /var/lib/kudu/master, but it may be customized using the fs_wal_dir and fs_data_dirs configuration parameters. If you have set fs_data_dirs to directories other than the value of fs_wal_dir, it must be explicitly included in every command below where fs_wal_dir is also included. For more information on configuring these directories, see the Kudu Configuration docs.
- Identify and record the UUIDs of every master in the cluster, using the following command:
$ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 2>/dev/null
- master_data_dir: The reference master’s previously recorded data directory.
- tablet_id: Must be set to the string 00000000000000000000000000000000.
For example:
$ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 2>/dev/null
80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 1c3f3094256347528d02ec107466aef3
- Using the two previously-recorded lists of UUIDs (one for all live masters and one for all masters), determine and record (by process of elimination) the UUID of the dead master.
Perform the recovery
- Format the data directory on the replacement master machine using the previously recorded UUID of the dead master. Use the following command:
$ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] --uuid=<uuid>
- master_data_dir: The replacement master’s previously recorded data directory.
- uuid: The dead master’s previously recorded UUID.
For example:
$ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data --uuid=80a82c4b8a9f4c819bab744927ad765c
- Copy the master data to the replacement master with the following command.
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <reference_master>
- master_data_dir: The replacement master’s previously recorded data directory.
- tablet_id: Must be set to the string 00000000000000000000000000000000.
- reference_master: The RPC address of the reference master. It must be a string of the form <hostname>:<port>, where:
- hostname: The reference master’s previously recorded hostname or alias.
- port: The reference master’s previously recorded RPC port number.
For example:
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-2:7051
- If you are using Cloudera Manager, add the replacement Kudu master role now, but do not start it.
- Override the empty value of the Master Address parameter for the new role with the replacement master’s alias.
- If you are using a non-default RPC port, add the port number (separated by a colon) as well.
- If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead master to point at the replacement master.
- If the cluster was set up without DNS aliases, perform the following steps:
- Stop the remaining live masters.
- Rewrite the Raft configurations on these masters to include the replacement master. See Step 4 of Perform the Migration for more details.
- Start the replacement master.
- Restart the remaining masters in the new multi-master deployment. While the masters are shut down, there will be an availability outage, but it should last only as long as it takes for the masters to come back up.
- Using a browser, visit each master’s web UI and navigate to the /masters page. All the masters should now be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.
- Run a Kudu system check (ksck) on the cluster using the kudu command line tool. For more details, see Monitoring Cluster Health with ksck.
Removing Kudu Masters from a Multi-Master Deployment
Prepare for removal
- Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
- Identify the UUID and RPC address of the current leader of the multi-master deployment by visiting the /masters page of any master’s web UI. This master must not be removed during this process; its removal may result in severe data loss.
- Stop all the Kudu processes in the entire cluster.
- If you are using Cloudera Manager, remove the unwanted Kudu master from your cluster's Kudu service.
Perform the removal
- Rewrite the Raft configuration on the remaining masters to include only the remaining masters. See Step 4 of Perform the Migration for more details.
- Remove the data directories and WAL directory on the unwanted masters. This is a precaution to ensure that they cannot start up again and interfere with the new multi-master deployment.
- Modify the value of the master_addresses configuration parameter for the masters of the new multi-master deployment. See Kudu Configuration docs for the steps to modify a configuration parameter. If migrating to a single-master deployment, the master_addresses flag should be omitted entirely.
- Start all of the masters that were not removed.
- Modify the value of the tserver_master_addrs configuration parameter for the tablet servers to remove any unwanted masters. See Kudu Configuration docs for the steps to modify a configuration parameter.
- Start all of the tablet servers.
- Using a browser, visit each master’s web UI and navigate to the /masters page. All the masters should now be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.
- Run a Kudu system check (ksck) on the cluster using the kudu command line tool. For more details, see Monitoring Cluster Health with ksck.
Changing Master Hostnames
When replacing dead masters, use DNS aliases to prevent long maintenance windows. If the cluster was set up without aliases, change the host names as described in this section.
Prepare for Hostname Changes
To prepare to change a hostname:
- Establish a maintenance window during which the Kudu cluster will be unavailable. One hour should be sufficient.
- On the Masters page in Kudu Web UI, note the UUID and RPC address of each master.
- Stop all the Kudu processes in the cluster.
- Set up the new hostnames to point to the masters and verify all servers and clients properly resolve them.
Perform Hostname Changes
To change hostnames:
- Rewrite each master’s Raft configuration with the following command, executed on each master host:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 00000000000000000000000000000000 <all_masters>
For example:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:new-master-name-1:7051 f5624e05f40649b79a757629a69d061e:new-master-name-2:7051 988d8ac6530f426cbe180be5ba52033d:new-master-name-3:7051
- Update the master address:
- In an environment not managed by Cloudera Manager, change the gflag file of the masters so the master_addresses parameter reflects the new hostnames.
- In an environment managed by Cloudera Manager, specify the new hostname in the Master Address (server.address) field on each Kudu role.
- Change the gflag file of the tablet servers to update the tserver_master_addrs parameter with the new hostnames. In an environment managed by Cloudera Manager, this step is not needed.
- Start the masters.
- To verify that all masters are working properly, perform the following sanity checks:
- In each master’s Web UI, click Masters on the Status Pages. All of the masters should be listed there with one master in the LEADER role field and the others in the FOLLOWER role field. The contents of Masters on all masters should be the same.
- Run the command below to verify that all masters are up and listening. The UUIDs should be the same, and should belong to the same masters, as before the hostname change:
$ sudo -u kudu kudu master list new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051
- Start all of the tablet servers.
- Run a Kudu system check (ksck) on the cluster using the kudu command line tool. See Monitoring Cluster Health with ksck for more details. After startup, some tablets may be unavailable as it takes some time to initialize all of them.
- If you have Kudu tables that are accessed from Impala, update the HMS database manually in the underlying database that provides the storage for HMS.
- The following is an example SQL statement you run in the HMS database:
UPDATE TABLE_PARAMS
SET PARAM_VALUE = 'new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051'
WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'master-1:7051,master-2:7051,master-3:7051';
- In impala-shell, run:
INVALIDATE METADATA;
- Verify updating the metadata worked by running a simple SELECT query on a Kudu-backed Impala table.
Monitoring Cluster Health with ksck
The kudu CLI includes a tool called ksck that can be used for gathering information about the state of a Kudu cluster, including checking its health. ksck will identify issues such as under-replicated tablets, unreachable tablet servers, or tablets without a leader.
ksck should be run from the command line as the Kudu admin user, and requires the full list of master addresses to be specified:
$ sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com
To see a full list of the options available with ksck, use the --help flag. If the cluster is healthy, ksck will print information about the cluster, a success message, and return a zero (success) exit status.
Master Summary
               UUID               |        Address        | Status
----------------------------------+-----------------------+---------
 a811c07b99394df799e6650e7310f282 | master-01.example.com | HEALTHY
 b579355eeeea446e998606bcb7e87844 | master-02.example.com | HEALTHY
 cfdcc8592711485fad32ec4eea4fbfcd | master-03.example.com | HEALTHY

Tablet Server Summary
               UUID               |        Address         | Status
----------------------------------+------------------------+---------
 a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY
 e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY
 e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | HEALTHY

Version Summary
 Version |         Servers
---------+-------------------------
  1.7.1  | all 6 server(s) checked

Summary by table
   Name   | RF | Status  | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
----------+----+---------+---------------+---------+------------+------------------+-------------
 my_table | 3  | HEALTHY | 8             | 8       | 0          | 0                | 0

                | Total Count
----------------+-------------
 Masters        | 3
 Tablet Servers | 3
 Tables         | 1
 Tablets        | 8
 Replicas       | 24
OK
If the cluster is unhealthy, for instance if a tablet server process has stopped, ksck will report the issue(s) and return a non-zero exit status, as shown in the abbreviated snippet of ksck output below:
Tablet Server Summary
               UUID               |        Address         |   Status
----------------------------------+------------------------+-------------
 a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY
 e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY
 e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | UNAVAILABLE
Error from 127.0.0.1:7150: Network error: could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7150: connect: Connection refused (error 61) (UNAVAILABLE)
... (full output elided)
------------------
Errors:
------------------
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors
Corruption: table consistency check error: 1 out of 1 table(s) are not healthy
FAILED
Runtime error: ksck discovered errors
To verify data integrity, the optional --checksum_scan flag can be set, which will ensure the cluster has consistent data by scanning each tablet replica and comparing results. The --tables or --tablets flags can be used to limit the scope of the checksum scan to specific tables or tablets, respectively.
For example, checking data integrity on the my_table table can be done with the following command:
$ sudo -u kudu kudu cluster ksck --checksum_scan --tables my_table master-01.example.com,master-02.example.com,master-03.example.com
By default, ksck will attempt to use a snapshot scan of the table, so the checksum scan can be done while writes continue.
Finally, ksck also supports output in JSON format using the --ksck_format flag. JSON output contains the same information as the plain text output, but in a format that can be used by other tools. See kudu cluster ksck --help for more information.
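Because ksck’s exit status distinguishes a healthy cluster from an unhealthy one, it is easy to script around. The wrapper below is a minimal sketch: check_health runs whatever command it is given, and in practice that command would be the ksck invocation shown in the trailing comment (master addresses there are the examples used above):

```shell
# Report HEALTHY/UNHEALTHY based on a command's exit status.
check_health() {
  if "$@" > /dev/null 2>&1; then
    echo "HEALTHY"
  else
    echo "UNHEALTHY"
    return 1
  fi
}

check_health true   # stand-in for a passing check; prints HEALTHY

# Real usage:
#   check_health sudo -u kudu kudu cluster ksck \
#       master-01.example.com,master-02.example.com,master-03.example.com
```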
Changing Directory Configuration
For higher read parallelism and larger volumes of storage per server, you may want to configure servers to store data in multiple directories on different devices. Once a server is started, you must go through the following steps to change the directory configuration.
You can add data directories to, or remove them from, an existing master or tablet server via the kudu fs update_dirs tool. Data is striped across data directories, and when a new data directory is added, new data will be striped across the union of the old and new directories.
- The tool can only run while the server is offline, so establish a maintenance window to update the server. The tool itself runs quickly, so this offline window should be brief, and only the server being updated needs to be offline.
However, if the server is offline for too long (see the follower_unavailable_considered_failed_sec flag), the tablet replicas on it may be evicted from their Raft groups. To avoid this, it may be desirable to bring the entire cluster offline while performing the update.
- Run the tool with the desired directory configuration flags. For example, if a cluster was set up with --fs_wal_dir=/wals, --fs_metadata_dir=/meta, and --fs_data_dirs=/data/1,/data/2,/data/3, and /data/3 is to be removed (e.g. due to a disk error), run the command:
$ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_metadata_dir=/meta --fs_data_dirs=/data/1,/data/2
- Modify the values of the fs_data_dirs flags for the updated server. If using Cloudera Manager, make sure to only update the configurations of the updated server, rather than of the entire Kudu service.
- Once complete, the server process can be started. When Kudu is installed using system packages, service is typically used:
$ sudo service kudu-tserver start
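To make editing the flag value less error-prone, the directory-removal step can be rehearsed locally. The sketch below simulates building a new --fs_data_dirs value that drops a failed directory; it uses temporary directories rather than real Kudu paths, and the update_dirs invocation itself is only echoed for review.

```shell
# Simulation: given a list of configured data dirs, drop the failed one and
# print the update_dirs command that would apply the new configuration.
# Temp dirs stand in for real Kudu paths; nothing touches a real server.
base=$(mktemp -d)
mkdir -p "$base/data1" "$base/data2" "$base/data3"
failed="$base/data3"                      # pretend this disk failed

new_dirs=""
for d in "$base/data1" "$base/data2" "$base/data3"; do
  [ "$d" = "$failed" ] && continue        # skip the failed directory
  new_dirs="${new_dirs:+$new_dirs,}$d"    # comma-join the survivors
done

echo "kudu fs update_dirs --force --fs_wal_dir=/wals --fs_metadata_dir=/meta --fs_data_dirs=$new_dirs"
```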
Recovering from Disk Failure
Kudu nodes can only survive failures of disks on which certain Kudu directories are mounted. For more information about the different Kudu directory types, see the section on Directory Configurations.
Node Type | Kudu Directory Type | Kudu Releases that Crash on Disk Failure
---|---|---
Master | All | All |
Tablet Server | Directory containing WALs | All |
Tablet Server | Directory containing tablet metadata | All |
Tablet Server | Directory containing data blocks only | Pre-1.6.0 |
When a disk failure occurs that does not lead to a crash, Kudu will stop using the affected directory, shut down tablets with blocks on the affected directories, and automatically re-replicate the affected tablets to other tablet servers. The affected server will remain alive and print messages to the log indicating the disk failure, for example:
E1205 19:06:24.163748 27115 data_dirs.cc:1011] Directory /data/8/kudu/data marked as failed
E1205 19:06:30.324795 27064 log_block_manager.cc:1822] Not using report from /data/8/kudu/data: IO error: Could not open container 0a6283cab82d4e75848f49772d2638fe: /data/8/kudu/data/0a6283cab82d4e75848f49772d2638fe.metadata: Read-only file system (error 30)
E1205 19:06:33.564638 27220 ts_tablet_manager.cc:946] T 4957808439314e0d97795c1394348d80 P 70f7ee61ead54b1885d819f354eb3405: aborting tablet bootstrap: tablet has data in a failed directory
While in this state, the affected node will avoid using the failed disk, leading to lower storage volume and reduced read parallelism. The administrator should schedule a brief window to update the node's directory configuration to exclude the failed disk, as described in Changing Directory Configuration.
When the disk is repaired, remounted, and ready to be reused by Kudu, take the following steps:
- Make sure that the Kudu portion of the disk is completely empty.
- Stop the tablet server.
- Run the update_dirs tool. For example, to add /data/3, run the following:
$ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_data_dirs=/data/1,/data/2,/data/3
- Start the tablet server.
- Run ksck to verify cluster health. For example:
$ sudo -u kudu kudu cluster ksck master-01.example.com
Recovering from Full Disks
By default, Kudu reserves a small amount of space, 1% of capacity, in its directories. Kudu considers a disk full if there is less free space available than the reservation. Kudu nodes can only tolerate running out of space on disks on which certain Kudu directories are mounted. For more information about the different Kudu directory types, see Directory Configurations. The table below describes this behavior for each type of directory. The behavior is uniform across masters and tablet servers.
Kudu Directory Type | Crash on Full Disk?
---|---
Directory containing WALs | Yes |
Directory containing tablet metadata | Yes |
Directory containing data blocks only | No (see below) |
Prior to Kudu 1.7.0, Kudu striped tablet data across all directories and would avoid writing data to full directories. Kudu would crash only if all data directories were full.
In 1.7.0 and later, new tablets are assigned a disk group consisting of data directories. The number of data directories in a group is specified by the --fs_target_data_dirs_per_tablet flag, with a default of 3. If Kudu is not configured with enough data directories for a full disk group, all data directories are used. When a data directory is full, Kudu will stop writing new data to it, and each tablet that uses that data directory will write new data to the other data directories within its group. If all data directories for a tablet are full, Kudu will crash. Periodically, Kudu will check whether full data directories are still full, and will resume writing to those data directories if space has become available.
If Kudu does crash because its data directories are full, freeing space on the full directories will allow the affected daemon to restart and resume writing. Note that it may be possible for Kudu to free some space by running:
$ sudo -u kudu kudu fs check --repair
However, the above command may also fail if there is too little space left.
It’s also possible to allocate additional data directories to Kudu in order to increase the overall amount of storage available. See the documentation on updating a node’s directory configuration for more information. Note that existing tablets will not use new data directories, so adding a new data directory does not resolve issues with full disks.
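The 1% reservation described above can be monitored from the shell. This sketch reports, for each listed directory, the percentage of free space on the underlying filesystem and whether it falls below an assumed 1% threshold; the directory list is a placeholder to be replaced with your --fs_data_dirs values.

```shell
# Sketch: warn when a directory's filesystem has less than ~1% free space,
# mirroring Kudu's default reservation. The directory list is a placeholder.
THRESHOLD_PCT=1
for d in /tmp; do                          # substitute your data directories
  # df -P gives portable output; field 5 is "Use%" for the filesystem
  used_pct=$(df -P "$d" | awk 'NR==2 {gsub("%","",$5); print $5}')
  free_pct=$((100 - used_pct))
  if [ "$free_pct" -lt "$THRESHOLD_PCT" ]; then
    echo "FULL $d (${free_pct}% free)"
  else
    echo "OK $d (${free_pct}% free)"
  fi
done
```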
Bringing a Tablet That Has Lost a Majority of Replicas Back Online
Suppose a tablet has lost a majority of its replicas. The first step in diagnosing and fixing the problem is to examine the tablet's state using ksck:
$ sudo -u kudu kudu cluster ksck --tablets=e822cab6c0584bc0858219d1539a17e6 master-00,master-01,master-02
Connected to the Master
Fetched info from all 5 Tablet Servers
Tablet e822cab6c0584bc0858219d1539a17e6 of table 'my_table' is unavailable: 2 replica(s) not RUNNING
  638a20403e3e4ae3b55d4d07d920e6de (tserver-00:7150): RUNNING
  9a56fa85a38a4edc99c6229cba68aeaa (tserver-01:7150): bad state
    State:       FAILED
    Data state:  TABLET_DATA_READY
    Last status: <failure message>
  c311fef7708a4cf9bb11a3e4cbcaab8c (tserver-02:7150): bad state
    State:       FAILED
    Data state:  TABLET_DATA_READY
    Last status: <failure message>
This output shows that, for tablet e822cab6c0584bc0858219d1539a17e6, the two tablet replicas on tserver-01 and tserver-02 failed. The remaining replica is not the leader, so the leader replica failed as well. This means the chance of data loss is higher since the remaining replica on tserver-00 may have been lagging. In general, to accept the potential data loss and restore the tablet from the remaining replicas, divide the tablet replicas into two groups:
- Healthy replicas: Those in RUNNING state as reported by ksck
- Unhealthy replicas
For example, in the above ksck output, the replica on tablet server tserver-00 is healthy while the replicas on tserver-01 and tserver-02 are unhealthy. On each tablet server with a healthy replica, alter the consensus configuration to remove unhealthy replicas. In the typical case of 1 out of 3 surviving replicas, there will be only one healthy replica, so the consensus configuration will be rewritten to include only the healthy replica.
$ sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 <tablet-id> <tserver-00-uuid>
where <tablet-id> is e822cab6c0584bc0858219d1539a17e6 and <tserver-00-uuid> is the uuid of tserver-00, 638a20403e3e4ae3b55d4d07d920e6de.
Once the healthy replicas' consensus configurations have been forced to exclude the unhealthy replicas, the healthy replicas will be able to elect a leader. The tablet will become available for writes though it will still be under-replicated. Shortly after the tablet becomes available, the leader master will notice that it is under-replicated, and will cause the tablet to re-replicate until the proper replication factor is restored. The unhealthy replicas will be tombstoned by the master, causing their remaining data to be deleted.
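A dry-run sketch can make this procedure less risky: assemble the unsafe_change_config invocation from the example ksck output above and review the echoed command before running it for real. The address and UUIDs below are taken from the example; substitute your own.

```shell
# Dry run: build the unsafe_change_config command for the single healthy
# replica from the example above. Review the output before executing it.
TABLET_ID="e822cab6c0584bc0858219d1539a17e6"
ADDR="tserver-00:7150"                      # tserver hosting the healthy replica
UUID="638a20403e3e4ae3b55d4d07d920e6de"     # that replica's UUID
CMD="kudu remote_replica unsafe_change_config $ADDR $TABLET_ID $UUID"
echo "$CMD"   # then run as: sudo -u kudu $CMD
```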
Rebuilding a Kudu Filesystem Layout
In the event that critical files are lost, such as WALs or tablet-specific metadata, all Kudu directories on the server must be deleted and rebuilt to ensure correctness. Doing so will destroy the copy of the data for each tablet replica hosted on the local server. Kudu will automatically re-replicate tablet replicas removed in this way, provided the replication factor is at least three and all other servers are online and healthy.
- The first step to rebuilding a server with a new directory configuration is emptying all of the server's existing directories. For example, if a tablet server is configured with --fs_wal_dir=/data/0/kudu-tserver-wal, --fs_metadata_dir=/data/0/kudu-tserver-meta, and --fs_data_dirs=/data/1/kudu-tserver,/data/2/kudu-tserver, the following commands will remove the WAL, metadata, and data directories' contents:
# Note: this will delete all of the data from the local tablet server.
$ rm -rf /data/0/kudu-tserver-wal/* /data/0/kudu-tserver-meta/* /data/1/kudu-tserver/* /data/2/kudu-tserver/*
- If using Cloudera Manager, update the configurations for the rebuilt server to include only the desired directories. Make sure to only update the configurations of servers to which changes were applied, rather than of the entire Kudu service.
- After directories are deleted, the server process can be started with the new directory configuration. The appropriate sub-directories will be created by Kudu upon starting up.
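The emptying step can be rehearsed safely before touching a real server. This sketch uses temporary stand-ins for the WAL, metadata, and data directories, removes their contents while keeping the directories themselves (so ownership and mount points would survive on a real host), and verifies they are empty.

```shell
# Simulation: empty a server's Kudu directories without deleting the
# directories themselves. Temp dirs stand in for the real Kudu paths.
root=$(mktemp -d)
wal="$root/kudu-tserver-wal"; meta="$root/kudu-tserver-meta"
data1="$root/kudu-tserver-1"; data2="$root/kudu-tserver-2"
mkdir -p "$wal" "$meta" "$data1" "$data2"
touch "$wal/wal-000001" "$meta/instance" "$data1/block.data" "$data2/block.data"

# On a real server this destroys the local copy of every tablet replica.
rm -rf "$wal"/* "$meta"/* "$data1"/* "$data2"/*

for d in "$wal" "$meta" "$data1" "$data2"; do
  [ -z "$(ls -A "$d")" ] && echo "emptied: $d"
done
```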
Physical Backups of an Entire Node
- Stop all Kudu processes in the cluster. This prevents the tablets on the backed-up node from being re-replicated elsewhere unnecessarily.
- If creating a backup, make a copy of the WAL, metadata, and data directories on each node to be backed up. It is important that this copy preserve all file attributes as well as sparseness.
- If restoring from a backup, delete the existing WAL, metadata, and data directories, then restore the backup via move or copy. As with creating a backup, it is important that the restore preserve all file attributes and sparseness.
- Start all Kudu processes in the cluster.
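Preserving attributes and sparseness is the error-prone part of the copy. Assuming GNU coreutils, the sketch below creates a sparse stand-in file, copies it with `cp -a --sparse=always`, and confirms the apparent size survived while few blocks were allocated; on a real node the source would be the WAL, metadata, and data directories.

```shell
# Sketch: attribute- and sparseness-preserving copy (GNU cp assumed).
# A 100 MiB sparse file stands in for a Kudu data directory's contents.
src=$(mktemp -d); dst=$(mktemp -d)
truncate -s 100M "$src/container.data"        # sparse: no blocks allocated

cp -a --sparse=always "$src/." "$dst/"        # -a keeps ownership/mode/times

apparent=$(stat -c %s "$dst/container.data")  # logical size in bytes
blocks=$(stat -c %b "$dst/container.data")    # 512-byte blocks actually used
echo "apparent=$apparent blocks=$blocks"
```

rsync -aS is a reasonable alternative when copying to another host.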
Scaling Storage on Kudu Master and Tablet Servers in the Cloud
If you find that the size of your Kudu cloud deployment has exceeded previous expectations, or you simply wish to allocate more storage to Kudu, use the following set of high-level steps as a guide to increasing storage on your Kudu master or tablet server hosts. You must work with your cluster's Hadoop administrators and the system administrators to complete this process. Replace the file paths in the following steps with those relevant to your setup.
- Run a consistency check on the cluster hosts. For instructions, see Monitoring Cluster Health with ksck.
- On all Kudu hosts, create a new file system with the storage capacity you require. For example, /new/data/dir.
- Shut down cluster services. For a cluster managed by Cloudera Manager, see Starting and Stopping a Cluster.
- Copy the contents of your existing data directory, /current/data/dir, to the new filesystem at /new/data/dir.
- Move your existing data directory, /current/data/dir, to a separate temporary location such as /tmp/data/dir.
- Create a new /current/data/dir directory.
mkdir /current/data/dir
- Mount /new/data/dir as /current/data/dir. Make changes to fstab as needed.
- Perform steps 4-7 on all Kudu hosts.
- Start up cluster services. For a cluster managed by Cloudera Manager, see Starting and Stopping a Cluster.
- Run a consistency check on the cluster hosts. For instructions, see Monitoring Cluster Health with ksck.
- After 10 days, if everything is in working order on all the hosts, get approval from the Hadoop administrators to remove the old data at /tmp/data/dir.
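Steps 4 through 7 can be rehearsed with temporary directories. The sketch below copies the current data onto the new filesystem, sets the old directory aside, and recreates the mount point; the actual mount and fstab edit are left as comments because they need root and a real device, and all paths are stand-ins.

```shell
# Simulation of steps 4-7, with temp dirs standing in for the real paths.
root=$(mktemp -d)
current="$root/current-data"; newfs="$root/new-data"; aside="$root/old-data"
mkdir -p "$current" "$newfs"
echo "tablet-data" > "$current/block.data"

cp -a "$current/." "$newfs/"      # step 4: copy onto the new filesystem
mv "$current" "$aside"            # step 5: set the old directory aside
mkdir "$current"                  # step 6: recreate the mount point
# step 7 (real host, as root): mount the new device at the mount point
#   mount /dev/newdisk /current/data/dir   # plus a matching fstab entry

ls "$newfs"
```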
Migrating Kudu Data from One Directory to Another on the Same Host
Take the following steps to move all Kudu data from one directory to another on the same host.
- Stop the Kudu service.
- Modify the directory configurations for the master and tablet server instances.
- Move the existing data from the old directory to the new one.
- Make sure the file and directory ownership is set to the kudu user.
- Restart the Kudu service.
- Run ksck and verify that the cluster is healthy.
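The move itself can be sketched as follows. This simulation uses temporary directories in place of the real Kudu paths; the ownership change is shown only as a comment because it requires root and the kudu service account.

```shell
# Simulation: move Kudu data from one directory to another on the same host.
# Temp dirs stand in for real paths; nothing touches a real Kudu install.
root=$(mktemp -d)
old="$root/old-dir"; new="$root/new-dir"
mkdir -p "$old" "$new"
echo "instance" > "$old/instance"

mv "$old"/* "$new"/               # move the existing data
# On a real host, restore ownership for the service account (as root):
#   chown -R kudu:kudu "$new"
ls "$new"
```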
Minimizing Cluster Disruption During Temporary Planned Downtime of a Single Tablet Server
If a single tablet server is brought down temporarily in a healthy cluster, all tablets will remain available and clients will function as normal, after potential short delays due to leader elections. However, if the downtime lasts for more than --follower_unavailable_considered_failed_sec (default 300) seconds, the tablet replicas on the down tablet server will be replaced by new replicas on available tablet servers. This will cause stress on the cluster as tablets re-replicate and, if the downtime lasts long enough, will leave the down tablet server with significantly fewer replicas, which may require the rebalancer to fix.
To work around this, increase --follower_unavailable_considered_failed_sec on all tablet servers so the amount of time before re-replication will start is longer than the expected downtime of the tablet server, including the time it takes the tablet server to restart and bootstrap its tablet replicas. To do this, run the following command on each tablet server:
$ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <num_seconds>
where <num_seconds> is the number of seconds that will encompass the downtime. Once the downtime is finished, reset the flag to its original value.
$ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <original_value>
In Kudu versions 1.7 and lower, the --force flag must be provided in the above commands.
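On a large cluster the flag change must reach every tablet server, so a reviewable dry run helps. The sketch below only prints the set_flag command for each address; the addresses and downtime window are placeholders, and nothing contacts a real cluster.

```shell
# Dry run: print the set_flag command for every tablet server. Addresses and
# the downtime window are placeholders; on Kudu 1.7 and lower, append --force.
NUM_SECONDS=7200    # expected downtime in seconds, with some margin
CMDS=""
for ts in tserver-01.example.com:7050 tserver-02.example.com:7050; do
  CMDS="${CMDS}sudo -u kudu kudu tserver set_flag $ts follower_unavailable_considered_failed_sec $NUM_SECONDS
"
done
printf '%s' "$CMDS"
```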
Running Tablet Rebalancing Tool
The kudu CLI contains a rebalancing tool that can be used to rebalance tablet replicas among tablet servers. For each table, the tool attempts to balance the number of replicas per tablet server. It will also, without unbalancing any table, attempt to even out the number of replicas per tablet server across the cluster as a whole.
The rebalancing tool should be run as the Kudu admin user, specifying all master addresses:
sudo -u kudu kudu cluster rebalance master-01.example.com,master-02.example.com,master-03.example.com
When run, the rebalancer will report on the initial tablet replica distribution in the cluster, log the replicas it moves, and print a final summary of the distribution when it terminates:
Per-server replica distribution summary:
       Statistic       |  Value
-----------------------+-----------
 Minimum Replica Count | 0
 Maximum Replica Count | 24
 Average Replica Count | 14.400000

Per-table replica distribution summary:
 Replica Skew |  Value
--------------+----------
 Minimum      | 8
 Maximum      | 8
 Average      | 8.000000

I0613 14:18:49.905897 3002065792 rebalancer.cc:779] tablet e7ee9ade95b342a7a94649b7862b345d: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled
I0613 14:18:49.917578 3002065792 rebalancer.cc:779] tablet 5f03944529f44626a0d6ec8b1edc566e: 6e64c4165b864cbab0e67ccd82091d60 -> ba8c22ab030346b4baa289d6d11d0809 move scheduled
I0613 14:18:49.928683 3002065792 rebalancer.cc:779] tablet 9373fee3bfe74cec9054737371a3b15d: fab382adf72c480984c6cc868fdd5f0e -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled
... (full output elided)
I0613 14:19:01.162802 3002065792 rebalancer.cc:842] tablet f4c046f18b174cc2974c65ac0bf52767: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move completed: OK

rebalancing is complete: cluster is balanced (moved 28 replicas)

Per-server replica distribution summary:
       Statistic       |  Value
-----------------------+-----------
 Minimum Replica Count | 14
 Maximum Replica Count | 15
 Average Replica Count | 14.400000

Per-table replica distribution summary:
 Replica Skew |  Value
--------------+----------
 Minimum      | 1
 Maximum      | 1
 Average      | 1.000000
If more details are needed in addition to the replica distribution summary, use the --output_replica_distribution_details flag. If added, the flag makes the tool print per-table and per-tablet server replica distribution statistics as well.
Use the --report_only flag to get a report on table-wide and cluster-wide replica distribution statistics without starting any rebalancing activity.
The rebalancer can also be restricted to run on a subset of the tables by supplying the --tables flag. Note that, when running on a subset of tables, the tool will not attempt to balance the cluster as a whole.
The length of time rebalancing is run for can be controlled with the flag --max_run_time_sec. By default, the rebalancer will run until the cluster is balanced. To control the amount of resources devoted to rebalancing, modify the flag --max_moves_per_server. See kudu cluster rebalance --help for more.
It's safe to stop the rebalancer tool at any time. When restarted, the rebalancer will continue rebalancing the cluster.
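A cautious workflow is to take a report first and only rebalance afterwards, optionally time-boxed. The sketch below just prints that two-step sequence using the --report_only and --max_run_time_sec flags described above; the master list is a placeholder, and nothing runs against a real cluster.

```shell
# Dry run: print a report-first rebalancing sequence. Masters are placeholders.
MASTERS="master-01.example.com,master-02.example.com,master-03.example.com"
REPORT="kudu cluster rebalance --report_only $MASTERS"
REBALANCE="kudu cluster rebalance --max_run_time_sec 3600 $MASTERS"
echo "sudo -u kudu $REPORT"      # inspect the skew first
echo "sudo -u kudu $REBALANCE"   # then rebalance, capped at one hour
```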
The rebalancer tool requires all registered tablet servers to be up and running. This avoids possible conflicts and races with automatic re-replication, and keeps replica placement optimal for the current configuration of the cluster. If a tablet server becomes unavailable during a rebalancing session, the rebalancer will exit. As noted above, it is safe to restart the rebalancer after resolving the issue with the unavailable tablet server.
The rebalancing tool can rebalance Kudu clusters running older versions as well, with some restrictions. Consult the following table for more information. In the table, "RF" stands for "replication factor".
Version Range | Rebalances RF = 1 Tables? | Rebalances RF > 1 Tables?
---|---|---
v < 1.4.0 | No | No |
1.4.0 <= v < 1.7.1 | No | Yes |
v >= 1.7.1 | Yes | Yes |
If the rebalancer is running against a cluster where rebalancing replication factor one tables is not supported, it will rebalance all the other tables and the cluster as if those singly-replicated tables did not exist.
Decommissioning or Permanently Removing a Tablet Server From a Cluster
Kudu does not currently support an automated way to remove a tablet server from a cluster permanently. Use the following steps to manually remove a tablet server:
- Ensure the cluster is in good health using ksck. See Monitoring Cluster Health with ksck.
- If the tablet server contains any replicas of tables with replication factor 1, these replicas must be manually moved off the tablet server prior to shutting it down. Use the kudu tablet change_config move_replica tool for this.
- Shut down the tablet server. After --follower_unavailable_considered_failed_sec, which defaults to 5 minutes, Kudu will begin to re-replicate the tablet server's replicas to other servers. Wait until the process is finished. Progress can be monitored using ksck.
- Once all the copies are complete, ksck will continue to report the tablet server as unavailable. The cluster will otherwise operate fine without the tablet server. To completely remove it from the cluster so ksck shows the cluster as completely healthy, restart the masters. In the case of a single master, this will cause cluster downtime. With multi-master, restart the masters in sequence to avoid cluster downtime.