Creating a Data Lake with a custom image

If necessary, you can customize a pre-warmed Data Lake image for compliance or security reasons, or to deploy certain software on a machine image. You can then use the CDP CLI to create the Data Lake with the custom image.

The typical method of creating a Data Lake through the CDP CLI involves specifying a Runtime version to be used with the Data Lake. For example:
cdp datalake create-aws-datalake --datalake-name <NAME> --environment-name <ENVNAME> --cloud-provider-configuration <CONFIG> --runtime 7.2.10
This method of creating a Data Lake picks up the latest pre-warmed image from the cdp-default image catalog for the specified version of Runtime.

Alternatively, you can create the Data Lake from a custom image. In addition to the compliance, security, and software-deployment cases described above (for example, deploying monitoring tools), a custom image is useful when you need a default image with a specific Runtime hotfix applied, rather than the latest image for the Runtime version that you specify.

To create a Data Lake with a custom image, prepare your custom image by modifying an official Cloudera pre-warmed image, which you can find under Environments > Shared Resources > Image Catalogs > cdp-default.

Any pre-warmed images that you customize must be included in an image catalog JSON, which you can register from the Shared Resources tab.

To specify a custom Data Lake image, run the Data Lake creation command for your cloud provider and include the --image parameter, which takes the image catalog name and the custom image ID:

--image catalogName="<image-catalog-name>",id="<image-UUID>"

For example, the command to create an AWS Data Lake with a custom image would be similar to:

cdp datalake create-aws-datalake --datalake-name <NAME> --environment-name <ENVNAME> --cloud-provider-configuration <CONFIG> --image catalogName="<image-catalog-name>",id="<image-UUID>"

If you do not specify a catalog name using the catalogName key of the --image parameter, the catalog name defaults to the cdp-default image catalog.
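
If you script Data Lake creation, it can help to assemble and check the --image value before calling the CLI. The following is a minimal sketch, not part of the CDP CLI: build_image_arg is a hypothetical helper name, and the check only confirms that the image ID has the textual shape of a UUID, not that the image exists in the catalog. When no catalog name is passed, the helper emits only the id key, relying on the default to cdp-default described above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not a CDP CLI feature):
# builds the value passed to --image.
# Usage: build_image_arg <image-UUID> [image-catalog-name]
build_image_arg() {
    image_id="$1"
    catalog_name="${2:-}"

    # Reject anything that does not look like a UUID (8-4-4-4-12 hex digits).
    if ! printf '%s\n' "$image_id" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "error: image ID must be a UUID: $image_id" >&2
        return 1
    fi

    if [ -n "$catalog_name" ]; then
        printf 'catalogName=%s,id=%s\n' "$catalog_name" "$image_id"
    else
        # No catalog name given: emit only the id key.
        printf 'id=%s\n' "$image_id"
    fi
}
```

The result can then be passed to the create command, for example:

IMAGE_ARG=$(build_image_arg "<image-UUID>" "<image-catalog-name>") || exit 1
cdp datalake create-aws-datalake --datalake-name <NAME> --environment-name <ENVNAME> --cloud-provider-configuration <CONFIG> --image "$IMAGE_ARG"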