Restoring Kudu data into the new cluster

Once you have backed up your data in Kudu, you can copy the data to the target CDP cluster and then restore it using the Kudu backup tool.

If you applied any custom Kudu configurations in your old clusters, you must manually apply those configurations in your target cluster.

If you have changed the value of tablet_history_max_age_sec and you plan to run incremental backups of Kudu on the target cluster, we recommend resetting tablet_history_max_age_sec to the default value of 1 week (see https://issues.apache.org/jira/browse/KUDU-2677).
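
For example, if you manage Kudu with Cloudera Manager, the reset can be applied through the Kudu tablet server's gflagfile advanced configuration snippet. This is a minimal sketch; 604800 seconds corresponds to the one-week default:

  --tablet_history_max_age_sec=604800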

Examples of commonly modified configuration flags:
  • rpc_max_message_size
  • tablet_transaction_memory_limit_mb
  • rpc_service_queue_length
  • raft_heartbeat_interval_ms
  • heartbeat_interval_ms
  • memory_limit_hard_bytes
  • block_cache_capacity_mb

Once you have manually applied the custom configurations, restart the Kudu cluster.
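
To confirm that the restarted daemons picked up your custom flags, you can query them with the Kudu command line tool; by default, get_flags lists only the flags with non-default values. The hostnames below are placeholders:

  kudu master get_flags master-1-host
  kudu tserver get_flags tserver-1-host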

  1. Copy your backed up data to the target CDP cluster in one of the following ways:
    • Using distcp:
      sudo -u hdfs hadoop distcp hdfs:///kudu/kudu-backups/* hdfs://cluster-2.cluster_name.root.hwx.site/kudu/kudu-backups/
    • Using Replication Manager. For more information, see HDFS Replication.
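    Optionally, verify that the copied data is present on the target cluster before restoring. This sketch assumes the target path from the distcp example above:
      sudo -u hdfs hadoop fs -ls /kudu/kudu-backups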
  2. Run the following command to restore the backup on the target cluster:
    spark-submit --class org.apache.kudu.backup.KuduRestore <path to kudu-backup2_2.11-1.12.0.jar> \
    --kuduMasterAddresses <addresses of Kudu masters> \
    --rootPath <path to the stored backed up data> \
    <table_name>
    
    where
    • --kuduMasterAddresses is used to specify the addresses of the Kudu masters as a comma-separated list. For example: master-1-host,master-2-host,master-3-host, where each entry is the actual hostname of a Kudu master.
    • --rootPath is used to specify the path at which you stored the backed-up data. It accepts any Spark-compatible path.
      • Example for HDFS: hdfs:///kudu-backups
      • Example for AWS S3: s3a://kudu-backup/

      If you backed up to S3 and see the “Exception in thread "main" java.lang.IllegalArgumentException: path must be absolute” error, ensure that the S3 path ends with a forward slash (/).

    • <table_name> can be a table or a list of tables to be restored.
    • Optional: --tableSuffix, if set, adds a suffix to the restored table names. It can only be used when the createTables property is true.
    • Optional: --timestampMs is a UNIX timestamp in milliseconds that defines the latest time to use when selecting restore candidates. Its default value is System.currentTimeMillis().
    For example:
    sudo -u hdfs spark-submit --class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/lib/kudu/kudu-backup2_2.11.jar \
    --kuduMasterAddresses cluster-1.cluster_name.root.hwx.site \
    --rootPath hdfs:///kudu/kudu-backups \
    my_table
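
    After the restore job completes, you can check that the restored tables are present on the target cluster. As a sketch, using the Kudu command line tool with the master address from the example above:

    kudu table list cluster-1.cluster_name.root.hwx.site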
    
  3. Restart the Kudu service in Cloudera Manager.
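
    Alternatively, the restart can be scripted through the Cloudera Manager REST API restart command. This is a sketch only; the host, credentials, API version, cluster name, and service name are placeholders that depend on your deployment:

    curl -u admin:admin -X POST \
      "http://cm-host.example.com:7180/api/v41/clusters/Cluster1/services/kudu/commands/restart"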