Backing up and Recovering Apache Kudu

Back up tables

You can use the KuduBackup Spark job to back up one or more Kudu tables.

When you first run the job for a table, a full backup is run. Additional runs will perform incremental backups, which will contain only the rows that have changed since the initial full backup. A new set of full backups can be forced at any time by passing the --forceFull flag to the backup job (see the example after the command below).

The following are some common flags that you can use while taking a backup:
  • --kuduMasterAddresses: Specifies a comma-separated list of Kudu master addresses. The default value is localhost.
  • --rootPath: The root path under which backup data is written. It accepts any Spark-compatible path.
The following command executes a KuduBackup job that backs up the tables foo and bar to an HDFS directory:
spark-submit --class org.apache.kudu.backup.KuduBackup [***FULL PATH TO kudu-backup2_2.11-1.12.0.jar***] \
  --kuduMasterAddresses [***KUDU MASTER HOSTNAME 1***]:7051,[***KUDU MASTER HOSTNAME 2***]:7051 \
  --rootPath hdfs:///[***DIRECTORY TO USE FOR BACKUP***] \
  impala::[***DATABASE NAME***].foo impala::[***DATABASE NAME***].bar
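
For example, to force a new set of full backups for the same tables, you can add the --forceFull flag to the same job. The following is a minimal sketch that assumes the same jar, master addresses, and backup directory as above:

spark-submit --class org.apache.kudu.backup.KuduBackup [***FULL PATH TO kudu-backup2_2.11-1.12.0.jar***] \
  --kuduMasterAddresses [***KUDU MASTER HOSTNAME 1***]:7051,[***KUDU MASTER HOSTNAME 2***]:7051 \
  --rootPath hdfs:///[***DIRECTORY TO USE FOR BACKUP***] \
  --forceFull \
  impala::[***DATABASE NAME***].foo impala::[***DATABASE NAME***].bar

Forcing an occasional full backup can help keep the backup history for a table from growing into a long chain of incrementals.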