Registering an AWS environment from the CDP CLI
Use the CDP CLI to register your AWS environment.
Unlike in the Cloudera web interface, environment creation in the CDP CLI is a three-step process: creating the environment, setting IDBroker mappings, and creating data lakes. Follow these steps to create an AWS environment in Cloudera.
-
Register your AWS environment in Cloudera with the create-aws-environment command and the required input parameters.

    cdp environments create-aws-environment \
      --custom-docker-registry '{c-dock}' \
      --cli-input-json '{
        "environmentName": "test-env",
        "description": "Test AWS environment",
        "credentialName": "test-aws-crd",
        "region": "us-west2",
        "publicKey": "ssh-rsa AAAAB3NzaZ1yc2EAAAADAQABAAABAQDwCI/wmQzbNn9YcA8vdU+Ot41IIUWJfOfiDrUuNcULOQL6ke5qcEKuboXzbLxV0YmQcPFvswbM5S4FlHjy2VrJ5spyGhQajFEm9+PgrsybgzHkkssziX0zRq7U4BVD68kSn6CuAHj9L4wx8WBwefMzkw7uO1CkfifIp8UE6ZcKKKwe2fLR6ErDaN9jQxIWhTPEiFjIhItPHrnOcfGKY/p6OlpDDUOuMRiFZh7qMzfgvWI+UdN/qjnTlc/M53JftK6GJqK6osN+j7fCwKEnPwWC/gmy8El7ZMHlIENxDut6X0qj9Okc/JMmG0ebkSZAEbhgNOBNLZYdP0oeQGCXjqdv",
        "enableTunnel": true,
        "usePublicIp": true,
        "existingNetworkParams": {
          "networkName": "eng-private",
          "subnetNames": [ "private-us-west2" ],
          "sharedProjectId": "dev-project"
        },
        "logStorage": {
          "storageLocationBase": "gs://logs",
          "serviceAccountEmail": "logger@dev-project.iam.gserviceaccount.com"
        }
      }'

Parameter and description:

custom-docker-registry
    The CRN of the custom Docker registry to be used for data services.

environmentName
    Provide a name for your environment.

credentialName
    Provide the name of the credential created earlier.

region
    Specify the region where your existing VPC network is located. For example, "us-west2" is a valid region.

publicKey
    Paste your SSH public key.

existingNetworkParams
    Provide a JSON specifying the following:

        {
          "networkName": "string",
          "subnetNames": ["string", ...],
          "sharedProjectId": "string"
        }

    Replace the values with the actual VPC network name, one or more subnet names, and shared project ID.
    The sharedProjectId value needs to be set as follows:
    - For a shared VPC, set it to the AWS host project ID.
    - For a non-shared VPC, set it to the AWS project ID of the project where Cloudera is being deployed.

enableTunnel
    Enable or disable the Cluster Connectivity Manager. The default value is "true" (enabled). Set it to "false" to disable it. If disabled, you must specify two security groups in your JSON definition:

        "securityAccess": {
          "securityGroupIdForKnox": "string",
          "defaultSecurityGroupId": "string"
        }

usePublicIp
    Set this to "true" to create public IPs or "false" to use private IPs.

logStorage
    Provide a JSON definition specifying your configuration for the cluster and audit logs:

        {
          "storageLocationBase": "string",
          "serviceAccountEmail": "string"
        }

    The storageLocationBase must follow this format: gs://my-bucket-name.
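Because the --cli-input-json value is a long inline string, a stray quote or brace is easy to miss. One way to catch that early is to keep the JSON in a file and validate it before calling the CLI. The sketch below assumes python3 is available for validation; the file name env.json and the helper validate_json are hypothetical, and the JSON is abbreviated to a few fields from the example, so it is not a complete environment definition.

```shell
#!/bin/sh
# Hypothetical working location for the environment definition.
WORKDIR=$(mktemp -d)
ENV_JSON="$WORKDIR/env.json"

# Abbreviated definition; add the remaining fields before real use.
cat > "$ENV_JSON" <<'EOF'
{
  "environmentName": "test-env",
  "description": "Test AWS environment",
  "credentialName": "test-aws-crd",
  "enableTunnel": true,
  "usePublicIp": true
}
EOF

# validate_json FILE: succeeds only if FILE parses as JSON.
validate_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

if validate_json "$ENV_JSON"; then
  echo "valid: $ENV_JSON"
  # Pass the validated definition to the CLI (not run here):
  # cdp environments create-aws-environment --cli-input-json "$(cat "$ENV_JSON")"
else
  echo "invalid JSON in $ENV_JSON" >&2
fi
```

This only checks that the file is syntactically valid JSON; it does not check that the field names or values are ones the CLI accepts.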
-
Once your environment is running, set the IDBroker mappings with the set-id-broker-mappings command.

    cdp environments set-id-broker-mappings \
      --environment-name test-env \
      --data-access-role dl-admin@dev-project.iam.gserviceaccount.com \
      --ranger-audit-role ranger-audit@dev-project.iam.gserviceaccount.com \
      --mappings '[
        {
          "accessorCrn": "crn:altus:iam:us-west-1:45ca3068-42a6-4227-8394-13a4493e2ac0:user:430c534d-8a19-4d9e-963d-8af377d16963",
          "role": "data-science@dev-project.iam.gserviceaccount.com"
        },
        {
          "accessorCrn": "crn:altus:iam:us-west-1:45ca3068-42a6-4227-8394-13a4493e2ac0:machineUser:mfox-aws-idbmms-test-mu/2cbca867-647b-44b9-8e41-47a01dea6c19",
          "role": "data-eng@dev-project.iam.gserviceaccount.com"
        }
      ]'

Parameter and description:

environment-name
    Specify the name of the environment created earlier.

data-access-role
    Specify the email address of the data lake admin service account created earlier.

ranger-audit-role
    Specify the email address of the Ranger audit service account created earlier.

mappings
    Map Cloudera users or groups to the GCP service accounts created earlier. Use the following syntax:

        [
          {
            "accessorCrn": "string",
            "role": "string"
          }
          ...
        ]

    You can obtain a user or group CRN from the Management Console > User Management by navigating to the details of a specific user or group.
    The role should be specified as a service account email.
-
Create the data lake clusters within the environment with the create-aws-datalake command.

    cdp datalake create-aws-datalake \
      --datalake-name "NAME" \
      --environment-name "ENVNAME" \
      --cloud-provider-configuration instanceProfile="INSTANCEPROFILE",storageBucketLocation="s3://MYBUCKET" \
      --scale MEDIUM_DUTY_HA \
      --runtime 7.2.7

Parameter and description:

datalakeName
    Provide a name for your Data Lake.

environmentName
    Provide the name of the environment created earlier.

scale
    Provide the Data Lake scale. It must be one of:
    - LIGHT_DUTY
    - MEDIUM_DUTY_HA
    - ENTERPRISE

cloudProviderConfiguration
    Provide the name of the data storage bucket and the email of the IDBroker service account.
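The --mappings argument of set-id-broker-mappings is a JSON array of accessorCrn and role pairs, which is tedious to type by hand when there are several users or groups. The following sketch assembles that array from CRN and role arguments. The helpers mapping_entry and build_mappings are hypothetical, and the sketch assumes the values contain no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/bin/sh
# mapping_entry CRN ROLE: print one {"accessorCrn": ..., "role": ...} object.
mapping_entry() {
  printf '{"accessorCrn": "%s", "role": "%s"}' "$1" "$2"
}

# build_mappings CRN ROLE [CRN ROLE ...]: print a JSON array of entries.
# A trailing argument without a partner is ignored.
build_mappings() {
  out=""
  while [ "$#" -ge 2 ]; do
    entry=$(mapping_entry "$1" "$2")
    if [ -z "$out" ]; then
      out="$entry"
    else
      out="$out,$entry"
    fi
    shift 2
  done
  printf '[%s]' "$out"
}

# CRNs and roles taken from the example in this procedure.
MAPPINGS=$(build_mappings \
  "crn:altus:iam:us-west-1:45ca3068-42a6-4227-8394-13a4493e2ac0:user:430c534d-8a19-4d9e-963d-8af377d16963" \
  "data-science@dev-project.iam.gserviceaccount.com" \
  "crn:altus:iam:us-west-1:45ca3068-42a6-4227-8394-13a4493e2ac0:machineUser:mfox-aws-idbmms-test-mu/2cbca867-647b-44b9-8e41-47a01dea6c19" \
  "data-eng@dev-project.iam.gserviceaccount.com")

echo "$MAPPINGS"
# The result can then be passed as: --mappings "$MAPPINGS"
```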
-
Verify that your new environment is running:

    cdp environments list-environments
-
Verify the status of the data lake:

    cdp datalake list-datalakes --environment-name ${ENVNAME}
-
Sync the IDBroker mappings:

    cdp environments sync-id-broker-mappings --environment-name demo3
-
Verify the sync status:

    cdp environments get-id-broker-mappings-sync-status --environment-name demo3

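Environment creation, data lake creation, and the IDBroker sync all take time, so the verification commands usually need to be repeated until the status changes. A minimal polling sketch follows. The helper wait_for is hypothetical, and the status string to match (COMPLETED in the commented example) is an assumption about the command output that you should confirm against what your CLI version prints.

```shell
#!/bin/sh
# wait_for PATTERN ATTEMPTS DELAY COMMAND...
# Runs COMMAND up to ATTEMPTS times, DELAY seconds apart, and succeeds
# as soon as its output contains PATTERN (matched as a fixed string).
wait_for() {
  pattern=$1
  attempts=$2
  delay=$3
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" 2>/dev/null | grep -qF -- "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt remains.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (not run here); "COMPLETED" is an assumed status string:
# wait_for "COMPLETED" 30 20 \
#   cdp environments get-id-broker-mappings-sync-status --environment-name demo3
```

Matching on a fixed string keeps the sketch independent of the exact JSON layout, at the cost of being less precise than parsing the output.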
Once your environment is running:
- You must assign roles to users and groups to give them access to the environment, and then perform user sync. For steps, refer to Enabling admin and user access to environments.
- You must onboard your users and/or groups for cloud storage. For steps, refer to Onboarding Cloudera users and groups for cloud storage.
- You must create Ranger policies to determine which users have access to which databases and tables. For instructions on how to access your data lake, refer to Accessing Data Lake services.
- You can use the update-custom-docker-registry command to change the custom Docker registry on an existing environment. After the update, newly created experiences will use the updated registry.
- You can also use the update-custom-docker-registry command to update the repository details for existing experiences within the environment, provided that the experience service looks up the registry from the environment service dynamically.
