Creating a Data Lake with a custom image

If necessary, you can customize a pre-warmed Data Lake image for compliance or security reasons, or to deploy certain software on a machine image. You can then use the CDP CLI to create the Data Lake with the custom image.

Required role: EnvironmentCreator can create a shared resource and then assign users to it. SharedResourceUser or Owner of the shared resource can use the resource. The Owner of the shared resource can delete it.

The typical method of creating a Data Lake through the CDP CLI involves specifying a Runtime version to be used with the Data Lake. For example:

cdp datalake create-aws-datalake --datalake-name <NAME> --environment-name <ENVNAME> --cloud-provider-configuration <CONFIG> --runtime 7.2.10

This method of creating a Data Lake picks up the latest pre-warmed image from the cdp-default image catalog for the specified version of Runtime.

Alternatively, you can specify that the Data Lake uses a custom image. You might require a custom Data Lake image for compliance or security reasons, or to deploy monitoring tools or other software. You might also specify an image explicitly if you need a default image with a specific Runtime hotfix applied, rather than the latest pre-warmed image for a given Runtime version.

To create a Data Lake with a custom image, prepare your custom image by modifying an official Cloudera pre-warmed image, which you can find under Environments > Shared Resources > Image Catalogs > cdp-default.

Any pre-warmed images that you customize must be included in an image catalog JSON, which you can register from the Shared Resources tab.

To specify a custom Data Lake image when you create a Data Lake, run the command to create a Data Lake for a given cloud provider and include the --image parameter, where you give the catalog name and the custom image ID:

--image catalogName="<image-catalog-name>",id="<image-UUID>"

For example, the command to create an AWS Data Lake with a custom image would be similar to:

cdp datalake create-aws-datalake --datalake-name <NAME> --environment-name <ENVNAME> --cloud-provider-configuration <CONFIG> --image catalogName="<image-catalog-name>",id="<image-UUID>"
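
The --image parameter is passed to the Data Lake creation command for whichever cloud provider you use. As a rough sketch only, the helper below maps a provider name to a subcommand and prints it as a dry run. The function name is hypothetical, and the create-azure-datalake and create-gcp-datalake subcommand names are assumptions made by analogy with the AWS command shown above, so check them against the CDP CLI reference before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: map a cloud provider name to a `cdp datalake` subcommand.
# Only create-aws-datalake appears in this topic; the azure and gcp names
# are assumptions and should be verified against the CDP CLI reference.
datalake_create_subcommand() {
    case "$1" in
        aws)   echo "create-aws-datalake" ;;
        azure) echo "create-azure-datalake" ;;
        gcp)   echo "create-gcp-datalake" ;;
        *)     echo "unsupported provider: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the start of the command instead of executing it.
PROVIDER="aws"
SUBCOMMAND=$(datalake_create_subcommand "$PROVIDER") || exit 1
echo "cdp datalake $SUBCOMMAND"
```

The remaining options, including --image, would follow the subcommand in the same way as in the AWS example.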

If you do not specify a catalog name using the catalogName key of the --image parameter, the image catalog defaults to cdp-default.
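
To illustrate how the value of the --image parameter is put together, here is a minimal shell sketch. The build_image_arg function, the image ID, and the catalog name are all hypothetical placeholders. The script only prints the option as a dry run and does not call the CDP CLI.

```shell
#!/bin/sh
# Hypothetical helper: build the value passed to --image.
# When no catalog name is given, catalogName is left out so that the
# default catalog (cdp-default) applies, as described in this topic.
build_image_arg() {
    image_id="${1:-}"
    catalog_name="${2:-}"
    if [ -z "$image_id" ]; then
        echo "an image ID is required" >&2
        return 1
    fi
    if [ -n "$catalog_name" ]; then
        printf 'catalogName=%s,id=%s\n' "$catalog_name" "$image_id"
    else
        printf 'id=%s\n' "$image_id"
    fi
}

# Placeholder values for illustration only.
IMAGE_ID="11111111-2222-3333-4444-555555555555"
CATALOG_NAME="my-custom-catalog"

# Dry run: print the option instead of calling the CLI.
echo "--image $(build_image_arg "$IMAGE_ID" "$CATALOG_NAME")"
echo "--image $(build_image_arg "$IMAGE_ID")"
```

The first line printed includes both catalogName and id. The second includes only id, which corresponds to the case where the catalog name is not specified.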