Restoring Kudu data into the target CDP cluster

Once you have backed up your data in Kudu, you can copy the data to the target CDP cluster and then restore it using the Kudu backup tool.

  • If you applied any custom Kudu configurations in your old clusters, then you manually have to apply those configurations in your target cluster.
  • Copy the backed up Kudu data to the target CDP cluster.
  1. Run the following command to restore the backup on the target cluster:
    spark-submit --class org.apache.kudu.backup.KuduRestore <path to kudu-backup2_2.11-1.12.0.jar> \
    --kuduMasterAddresses <addresses of Kudu masters> \
    --rootPath <path to the stored backed up data> \
    • --kuduMasterAddresses is used to specify the addresses of the Kudu masters as a comma-separated list. For example, master1-host,master-2-host,master-3-host which are the actual hostnames of Kudu masters.
    • --rootPath is used to specify the path at which you stored the backed up data.. It accepts any Spark-compatible path.
      • Example for HDFS: hdfs:///kudu-backups
      • Example for AWS S3: s3a://kudu-backup/

      If you are backed up to S3 and see the “Exception in thread "main" java.lang.IllegalArgumentException: path must be absolute” error, ensure that S3 path ends with a forward slash (/).

    • <table_name> can be a table or a list of tables to be backed up.
    • Optional: --tableSuffix, if set, adds suffices to the restored table names. It can only be used when the createTables property is true.
    • Optional: --timestampMs is a UNIX timestamp in milliseconds that defined the latest time to use when selecting restore candidates. Its default value is System.currentTimeMillis().
    sudo -u hdfs spark-submit --class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/lib/kudu/kudu-backup2_2.11.jar \
    --kuduMasterAddresses \
    --rootPath hdfs:///kudu/kudu-backups \
  2. Restart the Kudu service in Cloudera Manager.