Restore tables from backups
You can use the KuduRestore Spark job to restore one or more Kudu
tables. For each backed up table, the KuduRestore job restores the full backup
and each associated incremental backup until the full table state is restored.
Restoring the full series of full and incremental backups is possible because the backups are
linked via the from_ms and to_ms fields in the backup metadata.
By default, the restore job creates tables with the same names as the tables that were backed up.
If you want to side-load the tables without affecting the existing tables, you can pass the
--tableSuffix flag to append a suffix to each restored table name.
Following are the common flags that are used when restoring the tables:
--rootPath: The root path to the backup data. Accepts any Spark-compatible path. See Backup directory structure for the directory structure used in the rootPath.
--kuduMasterAddresses: Comma-separated addresses of Kudu masters. The default value is localhost.
--createTables: If set to true, the restore process creates the tables. Set it to false if the target tables already exist. The default value is true.
--tableSuffix: If set, adds a suffix to the restored table names. Only used when createTables is true.
--timestampMs: A UNIX timestamp in milliseconds that defines the latest time to use when selecting restore candidates. The default is System.currentTimeMillis().
<table>…: A list of tables to restore.
Following is an example of a KuduRestore job execution which restores the
tables foo and bar from the HDFS directory kudu-backups:

spark-submit --class org.apache.kudu.backup.KuduRestore kudu-backup2_2.11-1.12.0.jar \
  --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
  --rootPath hdfs:///kudu-backups \
  foo bar
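To side-load the same tables without touching the originals, you could combine the flags described above with --tableSuffix. The following is a sketch, not a verified invocation: the suffix value "-restore", the jar version, and the master addresses are illustrative and should be adjusted to your deployment.

```shell
# Restore foo and bar as foo-restore and bar-restore, leaving the
# original tables untouched. Because --createTables defaults to true,
# the job creates the suffixed tables before loading data into them.
spark-submit --class org.apache.kudu.backup.KuduRestore kudu-backup2_2.11-1.12.0.jar \
  --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
  --rootPath hdfs:///kudu-backups \
  --tableSuffix -restore \
  foo bar
```

After the job completes, the restored copies can be validated (for example, with row-count comparisons) before swapping them in for the originals.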