Creating an AWS environment with a medium duty data lake using the CLI

You can use the CDP CLI to create an AWS environment with a medium duty data lake.

Before you use the CDP CLI, run the following command to verify that the CLI is pointing to the correct profile:

cdp --profile {PROFILE}

As a sanity check, run the following command to verify that your environment name is not already taken:

cdp environments describe-environment --environment-name {ENVNAME}
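
The name check can also be scripted. The following is a minimal sketch, not an official recipe: it assumes that `cdp environments describe-environment` exits with a nonzero status when no environment with the given name exists. A nonzero exit can also come from an expired credential or a wrong profile, so treat the result as a hint rather than a guarantee. The function name `env_name_available` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when describe-environment fails, which this
# sketch ASSUMES means that no environment with this name exists.
env_name_available() {
  local envname="$1"
  if cdp environments describe-environment --environment-name "$envname" >/dev/null 2>&1; then
    return 1   # describe succeeded, so an environment with this name exists
  fi
  return 0     # describe failed; assumed to mean the name is free
}

# Usage (commented out so that sourcing this snippet has no side effects):
# env_name_available "my-env" && echo "name is free" || echo "name is taken"
```
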
  1. Create a new environment:
    cdp environments create-aws-environment --cli-input-json file://{ENV_FILE_PATH}
  2. To set the IDBroker mappings, run the following command:
    cdp environments set-id-broker-mappings --environment-name "$ENVNAME" --data-access-role "$DATAACCESSROLE" --baseline-role "$BASELINEROLE" --set-empty-mappings
  3. Run the following command to create the data lake cluster within the environment, where NAME is the name of the data lake, INSTANCEPROFILE is the instance profile for your specific account, and BUCKET is the base storage location for your data:
    cdp datalake create-aws-datalake --datalake-name "$NAME" --environment-name "$ENVNAME" --cloud-provider-configuration instanceProfile="$INSTANCEPROFILE",storageBucketLocation="$BUCKET" --scale MEDIUM_DUTY_HA --runtime 7.2.7
  4. Run the following command to check the status of the data lake:
    cdp datalake list-datalakes --environment-name "$ENVNAME"

    In the returned list of data lakes for the environment, locate yours and check its status.
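
The numbered steps above can be collected into a single script. The sketch below only rearranges the commands already shown; the helper names `run` and `create_env_and_datalake` and the `DRY_RUN` switch are hypothetical additions for illustration. It assumes the variables ENV_FILE_PATH, ENVNAME, DATAACCESSROLE, BASELINEROLE, NAME, INSTANCEPROFILE, and BUCKET are set beforehand, and it does not wait for one step to finish provisioning before starting the next, which a real run may need.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-4 as one function. Variable names are the placeholders
# used in the commands above; set them before calling.
# With DRY_RUN=1 (the default in this sketch) commands are printed, not run.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"      # print the command line only
  else
    "$@"                    # execute the command as given
  fi
}

create_env_and_datalake() {
  # 1. Create the environment from the JSON input file.
  run cdp environments create-aws-environment \
    --cli-input-json "file://${ENV_FILE_PATH}" || return 1

  # 2. Set the IDBroker mappings.
  run cdp environments set-id-broker-mappings \
    --environment-name "$ENVNAME" \
    --data-access-role "$DATAACCESSROLE" \
    --baseline-role "$BASELINEROLE" \
    --set-empty-mappings || return 1

  # 3. Create the medium duty data lake.
  run cdp datalake create-aws-datalake \
    --datalake-name "$NAME" \
    --environment-name "$ENVNAME" \
    --cloud-provider-configuration \
      "instanceProfile=${INSTANCEPROFILE},storageBucketLocation=${BUCKET}" \
    --scale MEDIUM_DUTY_HA \
    --runtime 7.2.7 || return 1

  # 4. List the data lakes in the environment to check the status.
  run cdp datalake list-datalakes --environment-name "$ENVNAME"
}
```

Calling `create_env_and_datalake` with `DRY_RUN=1` prints the four command lines so that they can be reviewed first; setting `DRY_RUN=0` executes them against the active profile.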