Backing up CDW using the backup-cluster command

Use the backup-cluster command to back up the configuration and settings of all Database Catalogs, Virtual Warehouses, and Data Visualization instances within your Cloudera Data Warehouse (CDW) environment.

  1. SSH into a host on your cluster from which you can access the CDP Private Cloud Data Services cluster.
  2. Run the following command to back up the cluster:
    cdp dw backup-cluster --cluster-id [***CDW-CLUSTER-ID***] [--cli-input-json <value>] [--generate-cli-skeleton]

    Replace [***CDW-CLUSTER-ID***] with the actual cluster ID of your environment. The cluster ID is a unique CDW environment identifier.

    [--cli-input-json <value>] and [--generate-cli-skeleton] are optional parameters.

    To specify the --cli-input-json parameter, first obtain the skeleton of the JSON input by running the following command:
    cdp dw backup-cluster --generate-cli-skeleton
    The output of this command is a JSON object as follows:
    {
        "clusterId": ""
    }
    Fill in the cluster ID and pass the JSON string as the value of the --cli-input-json option as follows:
    cdp dw backup-cluster --cli-input-json '{"clusterId":"[***CDW-CLUSTER-ID***]"}'
    The output of the backup-cluster command contains the following information:
    • clusterId: The ID of the cluster, a unique identifier of the CDW environment.
    • operationId: The ID of the backup operation. You can use the operation ID to query the backup execution details using the CLI.
    • timestamp: The date on which the backup was created.
    • data: The backup data and configuration.
    • md5: The MD5 hash of the encoded data. If the data and its hash are lost, the cluster objects cannot be restored automatically.
  3. Save the output in a file.
    You need this information during the restoration process.
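Steps 2 and 3 can be scripted. The following Python sketch is one possible way to do that, not an official tool. It assumes that the cdp CLI is installed and configured on the host, and that the command prints a JSON object whose top-level keys match the field names listed above. The helper names (build_cli_input, missing_fields, save_backup_output, run_backup) are hypothetical.

```python
import json
import subprocess
from pathlib import Path

# Field names taken from the output description in step 2.
# Assumption: they appear as top-level keys of the JSON output.
EXPECTED_FIELDS = ("clusterId", "operationId", "timestamp", "data", "md5")


def build_cli_input(cluster_id: str) -> str:
    """Fill in the skeleton and return the string for --cli-input-json."""
    return json.dumps({"clusterId": cluster_id})


def missing_fields(backup: dict) -> list:
    """Return the expected fields that are absent or empty in the output."""
    return [name for name in EXPECTED_FIELDS if not backup.get(name)]


def save_backup_output(raw_output: str, destination: Path) -> dict:
    """Parse the command output and write it to a file (step 3).

    Raises ValueError when an expected field is missing, because the
    data and md5 values are needed during restoration.
    """
    backup = json.loads(raw_output)
    missing = missing_fields(backup)
    if missing:
        raise ValueError(f"backup output is missing fields: {missing}")
    destination.write_text(json.dumps(backup, indent=2), encoding="utf-8")
    return backup


def run_backup(cluster_id: str, destination: Path) -> dict:
    """Run the backup command from step 2 and save its output."""
    result = subprocess.run(
        ["cdp", "dw", "backup-cluster",
         "--cli-input-json", build_cli_input(cluster_id)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return save_backup_output(result.stdout, destination)
```

For example, run_backup("[***CDW-CLUSTER-ID***]", Path("cdw-backup.json")) with the placeholder replaced by your cluster ID writes the output to cdw-backup.json. Store the file outside the cluster that you are backing up.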
The Hue backup is stored in the following location:
hdfs://cdw-backups/[***TIMESTAMP***]_[***JOB-ID***]/[***ENVIRONMENT-NAME***]/hue-backup
The Data Visualization (CDV) backup is stored in the following location:
hdfs://cdw-backups/[***TIMESTAMP***]_[***JOB-ID***]/[***DATAVIZ-INSTANCE-NAME***]/viz-backup
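To see how the two location templates expand, the following minimal Python sketch fills in the placeholders. The function name backup_location and the example values are hypothetical. The actual format of the timestamp and job ID is determined by the backup process and is not specified here.

```python
# Directory names at the end of each path, as given in the templates above.
BACKUP_SUFFIX = {"hue": "hue-backup", "viz": "viz-backup"}


def backup_location(timestamp: str, job_id: str, name: str, kind: str) -> str:
    """Expand the HDFS location template for a Hue or Data Visualization backup.

    name is the environment name for Hue ("hue") and the
    Data Visualization instance name for CDV ("viz").
    """
    suffix = BACKUP_SUFFIX[kind]  # KeyError for an unknown kind
    return f"hdfs://cdw-backups/{timestamp}_{job_id}/{name}/{suffix}"


# Hypothetical values, for illustration only.
print(backup_location("TIMESTAMP", "JOB-ID", "my-environment", "hue"))
print(backup_location("TIMESTAMP", "JOB-ID", "my-dataviz", "viz"))
```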
Monitor the database backup jobs. The backup process automatically starts the Hue and Data Visualization database backup jobs. Make sure that these jobs complete before you delete the cluster. If you delete the cluster before the jobs complete, you cannot recover the application contents.