Backing up tables

You can use the KuduBackup Spark job to back up one or more Kudu tables. The first time you run the job for a table, a full backup is taken. Subsequent runs perform incremental backups, which contain only the rows that have changed since the initial full backup. A new set of full backups can be forced at any time by passing the --forceFull flag to the backup job, as shown in the second example below.

The following are some of the common flags that you can use when taking a backup:
  • --rootPath: The root path under which backup data is written. It accepts any Spark-compatible path.
  • --kuduMasterAddresses: Specifies a comma-separated list of Kudu master addresses. The default value is localhost.
  • <table>...: The list of tables that you want to back up.
The following is an example of a KuduBackup job execution that backs up the tables foo and bar to the HDFS directory kudu-backups:
spark-submit --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.12.0.jar \
  --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
  --rootPath hdfs:///kudu-backups \
  foo bar
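
To force a new set of full backups of the same tables, add the --forceFull flag to the same command. The following is a sketch based on the example above; depending on the job's option parsing, the flag may require an explicit boolean value (for example, --forceFull true), so check the job's help output for the exact form:
spark-submit --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.12.0.jar \
  --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
  --rootPath hdfs:///kudu-backups \
  --forceFull \
  foo bar
Subsequent runs without --forceFull against the same --rootPath resume incremental backups.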