Back up tables
You can use the KuduBackup
Spark job to backup one or more Kudu
tables.
When you first run the job for a table, a full backup is run. Additional runs will perform
incremental backups which will only contain the rows that have changed since the initial full
backup. A new set of full backups can be forced at anytime by passing the
--forceFull
flag to the backup job.
Following are some of the common flags that you can use while taking a backup:
--kuduMasterAddresses
: This is used to specify a comma-separated addresses of Kudu masters. The default value islocalhost
.--rootPath
: The root path is used to output backup data. It accepts any Spark-compatible path.
The following code execute a
KuduBackup
job which backs up the tables
foo
and bar
to an HDFS
directory:spark-submit --class org.apache.kudu.backup.KuduBackup [***FULL PATH TO kudu-backup2_2.11-1.12.0.jar***] \ --kuduMasterAddresses [***KUDU MASTER HOSTNAME 1***]:7051,[***KUDU MASTER HOSTNAME 2***]:7051 \ --rootPath hdfs:///[***DIRECTORY TO USE FOR BACKUP***] \ impala::[***DATABASE NAME***].foo impala::[***DATABASE NAME***].bar