Creating an AWS environment with a medium duty data lake using the CLI

You can use the CDP CLI to create an AWS environment with a medium duty data lake.

Required role: EnvironmentCreator

Before you use the CDP CLI, run the following command to verify that your environment is pointing to the correct profile:

cdp --profile {PROFILE}

As a sanity check, run the following command to verify that your environment name is not already taken:

environments describe-environment --environment-name {ENVNAME}
  1. Create a new environment:
    cdp environments create-aws-environment --cli-input-json file://{ENV_FILE_PATH}
  2. To set the IDBroker mappings, run the following command:
    cdp environments set-id-broker-mappings --environment-name "$ENVNAME" --data-access-role "$DATAACCESSROLE" --baseline-role "$BASELINEROLE" --set-empty-mappings
  3. Run the following command to create the data lake cluster within the environment, where INSTANCEPROFILE is the instance profile for your specific account, and BUCKET is the path of a valid S3 location to store the data. This S3 path can be either the root of a bucket or a sub-folder:
    cdp datalake create-aws-datalake --datalake-name "NAME" --environment-name "ENVNAME" --cloud-provider-configuration instanceProfile="INSTANCEPROFILE",storageBucketLocation="s3://MYBUCKET" --scale MEDIUM_DUTY_HA  --runtime 7.2.7
  4. Run the following command to check the status of the Data Lake:
    cdp datalake list-datalakes --environment-name ${ENVNAME}

    You should be able to look at the list of data lakes, locate yours by ENVNAME and check the status.