Scaling the Data Lake through the CDP CLI
You can scale a Data Lake from light to medium duty through the CDP CLI.
Required role: EnvironmentAdmin or Owner of the environment
- Stop all of the attached Data Hub clusters that can be stopped, to make sure that there are no changes to HMS metadata during the scaling operation. For any cluster that cannot be stopped, stop all of the services on the Data Hub through the Cloudera Manager UI.
- Verify that the DATALAKE_ADMIN_ROLE, RANGER_AUDIT_ROLE, and LOG_ROLE have read/write permissions to the backup location. See the Data Lake backup and restore documentation for more information on these permissions. LOG_ROLE is specific to Data Lake restore.
- To trigger scaling from the CDP CLI, run the
cdp datalake resize-datalake
command. For example:cdp datalake resize-datalake --datalake-name <mydatalake> --target-size MEDIUM_DUTY_HA
Option Description –datalake-name
Name or CRN of the Data Lake that you want to upscale. --target-size
Currently only MEDIUM_DUTY_HA is supported. - Monitor the Event History. The scaling operation is finished when the Data Hub clusters have been automatically refreshed, which happens after the original light duty Data Lake has been deleted. Check the Event History to verify that the Data Hubs have been refreshed.
- When the scaling operation is finished, if you are upscaling a RAZ-enabled Data
Lake, manually restore the Data Lake from the
backup taken by the scaling process. To restore, you will need to add the restore_to_raz
policy in Ranger and then run the
cdp datalake restore-datalake
command in the CDP CLI.