Backing up data in Kudu
You can back up all your data in Kudu using the kudu-backup-tools.jar Kudu backup tool.
The Kudu backup tool runs a Spark job that builds the backup data file and writes it to HDFS or AWS S3, depending on the path you specify. Note that if you are backing up to S3, you have to provide S3 credentials to spark-submit as described in Specifying Credentials to Access S3 from Spark.
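For example, with the S3A connector one way to supply the credentials is to add Hadoop configuration properties to the spark-submit command shown below. The following is only a sketch: the key values are placeholders, and the linked topic describes the recommended way to manage credentials in your environment.
--conf spark.hadoop.fs.s3a.access.key=<your_access_key> \
--conf spark.hadoop.fs.s3a.secret.key=<your_secret_key>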
The Kudu backup tool creates a full backup of your data on the first run. On subsequent runs against the same root path, the tool creates incremental backups.
Run the following command to start the backup process:
spark-submit --class org.apache.kudu.backup.KuduBackup <path to kudu-backup2_2.11-1.12.0.jar> \
--kuduMasterAddresses <addresses of Kudu masters> \
--rootPath <path to store the backed up data> \
<table_name>
where
--kuduMasterAddresses
is used to specify the addresses of the Kudu masters as a comma-separated list. For example, master1-host,master-2-host,master-3-host, which are the actual hostnames of the Kudu masters.
--rootPath
is used to specify the path to store the backed up data. It accepts any Spark-compatible path.
- Example for HDFS: hdfs:///kudu-backups
- Example for AWS S3: s3a://kudu-backup/
If you are backing up to S3 and see the "Exception in thread "main" java.lang.IllegalArgumentException: path must be absolute" error, ensure that the S3 path ends with a forward slash (/).
<table_name>
can be a table or a list of tables to be backed up.
Example:
spark-submit --class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH-7.2.1-1.cdh7.2.1.p0.4041380/lib/kudu/kudu-backup2_2.11.jar \
--kuduMasterAddresses cluster-1.cluster_name.root.hwx.site,cluster-2.cluster_name.root.hwx.site \
--rootPath hdfs:///kudu-backups \
my_table
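Because <table_name> accepts a list of tables, you can back up several tables in a single run by listing them at the end of the command. The following variant reuses the cluster and root path from the example above; my_second_table is a placeholder table name:
spark-submit --class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH-7.2.1-1.cdh7.2.1.p0.4041380/lib/kudu/kudu-backup2_2.11.jar \
--kuduMasterAddresses cluster-1.cluster_name.root.hwx.site,cluster-2.cluster_name.root.hwx.site \
--rootPath hdfs:///kudu-backups \
my_table my_second_table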