Resizing the Data Lake through the CDP CLI

Through the CDP CLI, you can resize a light duty or medium duty Data Lake to medium duty or enterprise. As part of the same resizing operation, you can also resize from single-AZ to multi-AZ.

Required role: EnvironmentAdmin or Owner of the environment

Cloudera Manager configurations are not retained when a Data Lake is resized (they are lost when a new Data Lake cluster is created as part of the backup and restore operation). Before you resize, note all custom Cloudera Manager configurations of your Data Lake, and reapply them once the resizing operation is complete.
  1. Stop all of the attached Data Hub clusters that can be stopped, to make sure that there are no changes to HMS metadata during the resizing operation. For any cluster that cannot be stopped, stop all of the services on the Data Hub through the Cloudera Manager UI.
  2. Verify that the DATALAKE_ADMIN_ROLE, RANGER_AUDIT_ROLE, and LOG_ROLE have read/write permissions to the backup location. See the Data Lake backup and restore documentation for more information on these permissions. LOG_ROLE is specific to Data Lake restore.
  3. To trigger resizing from the CDP CLI, run the cdp datalake resize-datalake command. For example:
    cdp datalake resize-datalake --datalake-name <mydatalake> --target-size MEDIUM_DUTY_HA
    Option            Description
    --datalake-name   Name or CRN of the Data Lake that you want to upscale.
    --target-size     MEDIUM_DUTY_HA or ENTERPRISE
    To resize your Data Lake from single-AZ medium duty to multi-AZ enterprise, add the --multi-az flag to the cdp datalake resize-datalake command:
    cdp datalake resize-datalake \
      --datalake-name <VALUE> \
      --target-size <VALUE> \
      --multi-az
    If the source Data Lake is multi-AZ, the --multi-az flag is ignored.
  4. Monitor the Event History. The resizing operation is finished when the attached Data Hub clusters have been automatically refreshed, which happens after the original Data Lake has been deleted. Confirm in the Event History that the Data Hubs have been refreshed.
  5. RAZ-enabled Data Lakes are currently eligible for automatic restore during a resizing operation only if you are resizing:
    • An AWS Data Lake on Cloudera Runtime version 7.2.15+
    • An Azure Data Lake on Cloudera Runtime version 7.2.16+
    For older Runtime versions, the Data Lake is backed up automatically, but you must manually restore it after the resizing is complete. If RAZ is in use on a Runtime version that is ineligible for automatic restore, make sure before you start the Data Lake backup that the restore_to_raz Ranger policy exists and has access to the backup location in the cloud. See the instructions for manually restoring a RAZ-enabled Data Lake.
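
As a minimal sketch of step 3, the shell function below assembles the resize command from a Data Lake name, a target size, and an optional multi-AZ switch. The function name build_resize_cmd and its argument order are hypothetical and chosen only for illustration; the command and flags are the ones shown in step 3. The function prints the command instead of running it, so the result can be reviewed before anything is executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CDP CLI): validates its inputs and
# prints the resize command. It never invokes the CDP CLI itself.
#
# Arguments:
#   $1  Data Lake name or CRN
#   $2  target size (MEDIUM_DUTY_HA or ENTERPRISE)
#   $3  "true" to append --multi-az (optional, defaults to "false")
build_resize_cmd() {
  local name="$1"
  local size="$2"
  local multi_az="${3:-false}"

  if [ -z "$name" ]; then
    echo "error: Data Lake name or CRN is required" >&2
    return 1
  fi

  # Accept only the two target sizes listed in the option table.
  case "$size" in
    MEDIUM_DUTY_HA|ENTERPRISE) ;;
    *)
      echo "error: target size must be MEDIUM_DUTY_HA or ENTERPRISE" >&2
      return 1
      ;;
  esac

  local cmd="cdp datalake resize-datalake --datalake-name $name --target-size $size"
  if [ "$multi_az" = "true" ]; then
    cmd="$cmd --multi-az"
  fi

  printf '%s\n' "$cmd"
}
```

For example, build_resize_cmd mydatalake ENTERPRISE true prints the command with --multi-az appended, while an unsupported size such as LIGHT_DUTY produces an error message and a non-zero exit status. Printing first and running second keeps the destructive step (the original Data Lake is deleted during resizing) under explicit control.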