You can use the CDP CLI to create an AWS environment with a medium duty data
lake.
Required role:
EnvironmentCreator
Before you use the CDP CLI, run the following command to verify that your environment
is pointing to the correct profile:
cdp --profile {PROFILE}
As a sanity check, run the following command to verify that your environment name is not
already taken:
environments describe-environment --environment-name {ENVNAME}
-
Create a new environment:
cdp environments create-aws-environment --cli-input-json file://{ENV_FILE_PATH}
- To set the IDBroker mappings, run the following command:
cdp environments set-id-broker-mappings --environment-name "$ENVNAME" --data-access-role "$DATAACCESSROLE" --baseline-role "$BASELINEROLE" --set-empty-mappings
- Run the following command to create the data lake cluster within the environment,
where INSTANCEPROFILE is the instance profile for your specific account, and BUCKET is the
path of a valid S3 location to store the data. This S3 path can be either the root of a
bucket or a sub-folder:
cdp datalake create-aws-datalake --datalake-name "NAME" --environment-name "ENVNAME" --cloud-provider-configuration instanceProfile="INSTANCEPROFILE",storageBucketLocation="s3://MYBUCKET" --scale MEDIUM_DUTY_HA --runtime 7.2.7
- Run the following command to check the status of the Data Lake:
cdp datalake list-datalakes --environment-name ${ENVNAME}
You should be able to look at the list of data lakes, locate yours by ENVNAME and check
the status.