Prerequisites

Before proceeding to back up and restore CDW, you must meet a number of prerequisites.

The following prerequisites are mandatory for a successful backup and restore of CDW.
  • You have not enabled the MULTI_DEFAULT_DBC entitlement.
  • Your Database Catalogs are not custom (non-default) ones.
  • CDP CLI 0.9.99 or later is installed and configured.
  • You have Cluster Administrator privileges and can access the CDW web UI.
  • You must use the same Cloudera Data Warehouse version to restore files that you used to back up those files.
    For example, you cannot use a backup file from 1.6.2-b197 (released Feb 13, 2023) to restore an environment that runs a different version. The Cloudera Data Warehouse (CDW) application version, for example 1.7.3-b12, appears in the UI.

The CDW application version is not the same as your cluster, Database Catalog, or Virtual Warehouse versions.
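
The CLI and version prerequisites above can be checked with a short script. The following is a minimal sketch, not an official tool: it assumes GNU sort (for the -V version sort) is available, that "cdp --version" prints a dotted version number, and that you have recorded the CDW application versions yourself from the UI. The helper name version_ge and the two version variables are hypothetical.

```shell
# version_ge A B: succeeds when version A is the same as or later than B.
# Assumption: GNU sort with -V (version sort) is available on this host.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required_cli="0.9.99"
if command -v cdp >/dev/null 2>&1; then
  # Assumption: "cdp --version" prints a dotted version number in its output.
  installed_cli="$(cdp --version 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
  if version_ge "$installed_cli" "$required_cli"; then
    echo "CDP CLI $installed_cli meets the $required_cli minimum"
  else
    echo "CDP CLI $installed_cli is older than $required_cli"
  fi
else
  echo "cdp not found on PATH"
fi

# Hypothetical values: replace with the versions you noted in the CDW UI.
backup_version="1.7.3-b12"
restore_version="1.7.3-b12"
if [ "$backup_version" = "$restore_version" ]; then
  echo "CDW versions match"
else
  echo "CDW versions differ: do not restore this backup"
fi
```

The CDW version comparison is a plain string match because the backup and restore versions must be identical, not merely compatible.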

The following prerequisite is necessary if you have an Azure cluster and you choose to automatically activate the environment:

  • Your Azure cluster runs CDW application version 1.6.3-b319 (released May 5, 2023) or later.

    You cannot automatically activate an Azure cluster that runs CDW application version 1.6.2-b197 (released Feb 13, 2023) or earlier.

The following prerequisites are necessary if you choose to manually activate the environment:

  • The AWS CLI or Azure CLI is installed and configured.
  • kubectl (or an equivalent tool such as k9s) is installed and configured.
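
Before a manual activation, you may want to confirm that the required command-line tools are on your PATH. The sketch below is illustrative only: check_tools is a hypothetical helper, and it verifies only that each tool is installed, not that it is configured with valid credentials.

```shell
# check_tools TOOL...: report each tool as found or missing.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# For an AWS environment; use "az" instead of "aws" for Azure.
check_tools aws kubectl || echo "Install the missing tools before manual activation."
```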

A CDW cluster is up and running with one Database Catalog and one or more Hive or Impala Virtual Warehouses.

Finding the version of your CDW environment

In Cloudera Data Warehouse, select your environment and click Edit. The Environment Details include the version.

Importance of bringing down the cluster

Backing up and restoring CDW requires bringing down the cluster to ensure successful cluster restoration. During the CDW downtime, you must prevent end users from accessing the cluster. If downtime is not feasible due to your operational model, you can use a workaround that disables end-user access instead of bringing down the cluster.

You lose any manual modification of the Kubernetes objects or configurations when you bring down the cluster. Modifications applied using the CDW UI and settings defined during creation are preserved.

Cleaning up old Hue history

A large amount of Hue history can accumulate in the database of a long-running cluster. Using the restore tool on such a large database can consume excessive memory and cause the restore to fail. If your users work heavily with Hue, clean up the old history from the database before backing up Hue, as follows:

  1. Navigate to one of the Virtual Warehouses and click Edit.
  2. In Configurations > Hue > Configuration files select hue-safety-valve.
  3. Change the configuration as follows:
    [desktop]
    app_blacklist=search,hbase,security,pig,sqoop,spark,impala
  4. In the huebackend pod, start a shell session in the Hue container.
    kubectl exec -it huebackend-0 -c hue -n <virtual warehouse id> -- /bin/bash
  5. Run the cleanup tool using the following command:
    cd /opt/hive
    DESKTOP_DEBUG=True ./build/env/bin/hue desktop_document_cleanup --keep-days 60

    The command should succeed without an error. The Hue cleanup affects only the history. No saved queries will be lost.
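
If you need to repeat the cleanup on several Virtual Warehouses, steps 4 and 5 can be combined into one non-interactive call. The following is a sketch under stated assumptions, not a supported script: hue_cleanup and the DRY_RUN variable are hypothetical names, the namespace shown is a placeholder for your Virtual Warehouse ID, and the pod name, container name, path, and cleanup command are copied from the steps above. It uses the "--" separator that current kubectl versions expect before the command.

```shell
# hue_cleanup NAMESPACE [KEEP_DAYS]
# Runs the Hue document cleanup in the hue container of the huebackend-0 pod.
# With DRY_RUN=1, the kubectl command is printed instead of executed.
hue_cleanup() {
  ns="$1"
  keep_days="${2:-60}"
  remote_cmd="cd /opt/hive && DESKTOP_DEBUG=True ./build/env/bin/hue desktop_document_cleanup --keep-days ${keep_days}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'kubectl exec huebackend-0 -c hue -n %s -- /bin/bash -c "%s"\n' "$ns" "$remote_cmd"
  else
    kubectl exec huebackend-0 -c hue -n "$ns" -- /bin/bash -c "$remote_cmd"
  fi
}

# Preview the command for a placeholder Virtual Warehouse ID.
DRY_RUN=1 hue_cleanup "my-virtual-warehouse-id" 60
```

Previewing with DRY_RUN=1 first lets you confirm the namespace and retention period before anything is deleted.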