Understanding what is restored

What the restore operation restores, and how, depends on the method you used to back up the Data Warehouse service.

If you have backed up Cloudera Data Warehouse (CDW) using the Data Recovery Service (DRS) CDP CLI command dw create-backup, then the dw restore-backup command restores the state of the Data Warehouse service from the backup.

If you have backed up CDW using CDW’s CDP CLI cluster management commands (dw backup-cluster), the restore process first creates a workflow plan based on the existing state of the cluster. For each component, the plan decides whether to create, update, or skip its restore. The plan is returned as the response of the restore command, so you can see which components will be created, updated, or skipped during the restore.

About the restore command

CDW's restore command is as follows:
cdp dw restore-cluster --cluster-id <value> --data <value> [--cli-input-json <value>] [--generate-cli-skeleton]
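The following minimal sketch shows how the two required arguments fit together when you script a restore. It assumes you saved the JSON printed by dw backup-cluster and that the backup payload is its top-level "data" field; the cluster identifier "env-abc123" used below is hypothetical. It only builds the argument list and does not call the CLI.

```python
import json


def build_restore_command(cluster_id: str, backup_output_json: str) -> list[str]:
    """Build argv for `cdp dw restore-cluster` from saved backup-cluster output.

    Assumption: backup_output_json is the JSON printed by `cdp dw backup-cluster`,
    with the base64-encoded backup stored in its top-level "data" field.
    """
    backup = json.loads(backup_output_json)
    data = backup["data"]  # base64-encoded zip produced by the backup
    return [
        "cdp", "dw", "restore-cluster",
        "--cluster-id", cluster_id,
        "--data", data,
    ]


argv = build_restore_command("env-abc123", '{"data": "UEsFBg=="}')
print(argv)
```

To actually run the restore, you could pass the list to subprocess.run(argv, check=True). Building a list instead of a single string avoids shell quoting problems with the long base64 value.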

The “data” field in the output of the dw backup-cluster command contains a base64-encoded zip file with the backup data of the cluster, and it is the value you pass to the --data parameter of dw restore-cluster. The backup data includes the environment activation settings, the Virtual Warehouse and Cloudera Data Visualization (CDV) settings and configuration, and the locations of the CDV and Hue database backups on HDFS.
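Because the “data” value is simply a base64-encoded zip file, you can inspect a backup locally before restoring it. The sketch below decodes the value and lists the files in the archive. The entry names in the usage example ("settings.json", "vw/default.json") are hypothetical; the actual file layout inside a CDW backup is not described here.

```python
import base64
import io
import zipfile


def list_backup_entries(data_b64: str) -> list[str]:
    """Decode the base64 "data" value and return the file names in the zip."""
    raw = base64.b64decode(data_b64)
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        return archive.namelist()


# Build a small stand-in archive with hypothetical entry names.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("settings.json", "{}")
    zf.writestr("vw/default.json", "{}")
encoded = base64.b64encode(buf.getvalue()).decode("ascii")
print(list_backup_entries(encoded))
```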

You can use the CDW's dw restore-cluster command in one of the following ways:
  • By passing the environment’s Cloudera resource name (CRN), to activate the cluster from the backup file and restore all the entities and database contents.
  • By passing the identifier of an already activated environment, to restore all the entities and database contents to the running environment. This method is useful when you need to change activation parameters, but you must reactivate the environment manually.
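The two usage modes above differ only in the kind of identifier you pass. The following sketch illustrates that distinction under one assumption: an environment CRN begins with the "crn:" prefix, while an activated environment's identifier does not. The identifiers and the mode labels returned are hypothetical and are not CLI values.

```python
def restore_mode(identifier: str) -> str:
    """Describe which restore path an identifier selects (illustrative only).

    Assumption: environment CRNs start with "crn:"; other values are treated
    as identifiers of an already activated environment.
    """
    if identifier.startswith("crn:"):
        # Activate the cluster from the backup, then restore everything.
        return "activate-from-backup"
    # Restore entities and database contents into the running environment.
    return "restore-into-running-environment"


print(restore_mode("crn:cdp:environments:us-west-1:tenant:environment:abc"))
print(restore_mode("env-abc123"))
```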
When you run the dw restore-cluster command, CDW:
  1. Activates the environment using the settings from the backup and waits for the infrastructure to be created.
  2. Creates a default Database Catalog.
  3. Updates the Database Catalog configuration to apply custom configurations.
  4. Starts the Hue database restore job asynchronously in the Database Catalog namespace.
  5. Deploys the Virtual Warehouse instances.
  6. Deploys the Data Visualization instances.
  7. Starts the CDV restore job asynchronously in each CDV instance's namespace. This job also restores the database associated with the CDV instance.
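The ordering above mixes blocking steps with jobs that are only started, not awaited. The sketch below models that pattern with a thread pool: steps 1 to 3, 5, and 6 are recorded in sequence, while the Hue and CDV restore jobs are submitted and left to finish in the background. It is a simplified model of the step order, not CDW's implementation; the log strings and job results are illustrative.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def run_restore(log: list[str]) -> list[Future]:
    """Model the restore order: blocking steps run in sequence, while the
    Hue and CDV restore jobs are submitted without waiting for them."""
    pool = ThreadPoolExecutor(max_workers=2)
    jobs = []

    log.append("activate environment and wait for infrastructure")
    log.append("create default Database Catalog")
    log.append("update Database Catalog configuration")
    log.append("start Hue database restore job")
    jobs.append(pool.submit(lambda: "hue database restored"))  # not awaited
    log.append("deploy Virtual Warehouses")
    log.append("deploy Data Visualization instances")
    log.append("start CDV restore jobs")
    jobs.append(pool.submit(lambda: "cdv instances and databases restored"))  # not awaited

    # Return immediately; already submitted jobs still run to completion.
    pool.shutdown(wait=False)
    return jobs


steps: list[str] = []
pending = run_restore(steps)
print(steps)
print([job.result(timeout=5) for job in pending])
```

Because the restore jobs are asynchronous, a successful return from the command does not by itself mean the Hue and CDV data is already restored.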

The restore process is designed to be idempotent, so you can run it multiple times if needed. If the environment is activated and healthy, rerunning the restore operation restores the Virtual Warehouse and Data Visualization objects as follows: if the backup file contains an entity that is not present on the cluster, the entity is restored to the cluster; if the entity is already deployed, no changes or configuration updates occur. The Hue database restore, however, runs on every restore operation and overwrites the Hue database contents each time.
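The create-or-skip rule and the always-run Hue restore can be modeled in a few lines. The sketch below plans each run by comparing the entities in the backup with those already deployed, then shows that a second run skips everything while the Hue restore counter still increases. Entity names such as "vw-1" and "cdv-1" are hypothetical, and the sketch covers only the create and skip decisions, not configuration updates such as the Database Catalog update.

```python
def plan_restore(backup: set[str], deployed: set[str]) -> dict[str, str]:
    """For each entity in the backup: create it if missing, otherwise skip it."""
    return {name: ("skip" if name in deployed else "create") for name in sorted(backup)}


def restore(backup: set[str], deployed: set[str], state: dict) -> dict[str, str]:
    """Apply one restore run and return its plan."""
    plan = plan_restore(backup, deployed)
    for name, action in plan.items():
        if action == "create":
            deployed.add(name)  # only missing entities are deployed
    # The Hue database restore runs (and overwrites) on every run.
    state["hue_restores"] = state.get("hue_restores", 0) + 1
    return plan


cluster = {"vw-1"}
hue_state: dict = {}
print(restore({"vw-1", "cdv-1"}, cluster, hue_state))  # first run
print(restore({"vw-1", "cdv-1"}, cluster, hue_state))  # second run
print(hue_state)
```

Running the sketch twice shows why the process is idempotent for Virtual Warehouse and Data Visualization objects but not for Hue data, which is replaced on every run.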