Step 2) Register a CDP Environment

Before you register an environment, you need to create specific IAM roles and policies so that CDP can operate securely. For background, a description of what we're building and why can be found here. For this quickstart, we'll use CloudFormation to set all of this up for you.

  1. Download the provided CloudFormation template here.
  2. In the AWS console, deploy the CloudFormation template:
    1. In AWS Services, search for CloudFormation.
    2. Click Create Stack.
    3. Select Template is ready and then Upload a template file.

    4. Click Choose file and select the CloudFormation template that you downloaded.
    5. Click Next.
    6. Enter a stack name. The name can be any valid name.
      Under Parameters, complete the following fields:
      • S3BucketName: Choose a bucket name that is not already in use (S3 bucket names are globally unique). CDP will create the bucket for you.
      • AWSAccount: Your 12-digit AWS account ID, which can be found here.
      • Prefix: A short prefix of your choosing, which will be added to the names of the IAM resources we'll be creating.

      Make a note of the S3BucketName and Prefix values that you define. You will need them later.

    7. Click Next.
    8. At the Configure Stack Options page, click Next.
    9. At the bottom of the Review page, under Capabilities, select the checkbox next to "I acknowledge that AWS CloudFormation might create IAM resources with custom names." This acknowledgment is required because creating IAM resources with custom names is exactly what the template does.

    10. Click Create Stack.
  3. Still in the AWS console, create an SSH key in the region of your choice. If there is already an SSH key in your preferred region that you'd like to use, you can skip these steps.
    1. In AWS Services, search for EC2.
    2. In the top right corner, verify that you are in your preferred region.
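    3. In the left-hand navigation pane, choose Key Pairs.
    4. At the top right of the screen, select Create Key Pair.
    5. Provide a name and choose the pem format. The name can be any valid name. Then confirm with Create Key Pair; the private key file downloads automatically, so store it somewhere safe.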
  4. Return to the CDP Management Console and navigate to Environments > Register Environment.
  5. Provide an environment name and description. The name can be any valid name.
  6. Choose Amazon as the cloud provider.
  7. Under Amazon Web Services Credential, choose the credential that you created earlier.
  8. Click Next.
  9. Under Data Lake Settings, give your new data lake a name. The name can be any valid name. Choose the latest data lake version.
  10. Under Data Access:
    • Choose the instance profile prefix-data-access-instance-profile, where "prefix" is the prefix you defined in the Parameters section of the stack details in AWS.
    • For Storage Location Base, enter the S3BucketName. If you specify a sub-directory, such as S3BucketName/my-dl, CDP will create it.
    • For Data Access Role, choose prefix-datalake-admin-role.

  11. For Data Lake Scale, choose Light Duty.
  12. Click Next.
  13. Under Select Region, choose your desired region. This should be the same region in which you created the SSH key earlier, because EC2 key pairs are region-specific.
  14. Under Select Network, choose Create New Network.
  15. Under Security Access Settings, choose Create New Security Groups.

  16. Under SSH Settings, choose the SSH key you created earlier.
  17. Optionally, under Add Tags, provide any tags that you'd like the resources to be tagged with in your AWS account.
  18. Under Enable S3 Guard, enter prefix-dynamodb-table.
  19. Click Next.
  20. Under Logs - Storage and Audit:
    1. Choose the Instance Profile titled prefix-log-access-instance-profile, where "prefix" is the prefix you defined in the Parameters section of the stack details in AWS.
    2. For Logs Location Base, choose S3BucketName/my-dl, where S3BucketName is the bucket name you defined in the Parameters section of the stack details in AWS.
    3. For Ranger Audit Role, choose prefix-ranger-audit-role, where "prefix" is the prefix you defined in the Parameters section of the stack details in AWS.
  21. Click Register Environment.
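If you prefer to script step 2 rather than click through the console, the Python sketch below shows one possible approach. It is a minimal, hypothetical helper (build_create_stack_request is not part of CDP or the AWS SDK) that validates the three stack parameters and assembles keyword arguments in the shape of the CloudFormation CreateStack API. It assumes the template's parameter keys are exactly S3BucketName, AWSAccount and Prefix, and the bucket check is only a simplified version of the S3 naming rules.

```python
import re

# Simplified S3 general-purpose bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots and hyphens, starting and ending
# with a letter or digit. (Not a complete implementation of the rules.)
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# An AWS account ID is exactly 12 digits.
ACCOUNT_ID = re.compile(r"[0-9]{12}")


def build_create_stack_request(stack_name, template_body, bucket, account_id, prefix):
    """Validate the stack parameters and return CreateStack keyword arguments.

    Hypothetical helper. Assumes the template's parameter keys are
    S3BucketName, AWSAccount and Prefix, as listed in step 2.
    """
    if not BUCKET_NAME.fullmatch(bucket):
        raise ValueError(f"invalid S3 bucket name: {bucket!r}")
    if not ACCOUNT_ID.fullmatch(account_id):
        raise ValueError("AWSAccount must be a 12-digit AWS account ID")
    if not prefix:
        raise ValueError("Prefix must not be empty")

    parameters = {"S3BucketName": bucket, "AWSAccount": account_id, "Prefix": prefix}
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        # Same effect as ticking "I acknowledge that AWS CloudFormation might
        # create IAM resources with custom names" on the Review page.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }
```

With boto3 installed and AWS credentials configured, the resulting dictionary could be passed as `boto3.client("cloudformation").create_stack(**request)`. Following that with `client.get_waiter("stack_create_complete").wait(StackName=...)` ensures the IAM roles and instance profiles exist before you select them during environment registration.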
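Steps 10, 18 and 20 all expect values derived from the Prefix and S3BucketName stack parameters. The sketch below is a hypothetical helper (cdp_registration_values is illustrative only, and its dictionary keys are descriptive labels, not official CDP field names) that builds those values in one place so you can copy them into the registration form. It assumes the my-dl logs sub-directory used in step 20.

```python
def cdp_registration_values(prefix: str, bucket: str, logs_subdir: str = "my-dl") -> dict:
    """Return the values to enter in the CDP registration form.

    Hypothetical helper; keys are descriptive labels, not official CDP field names.
    """
    return {
        # Step 10: Data Access
        "data access instance profile": f"{prefix}-data-access-instance-profile",
        "storage location base": bucket,
        "data access role": f"{prefix}-datalake-admin-role",
        # Step 18: Enable S3 Guard
        "s3guard dynamodb table": f"{prefix}-dynamodb-table",
        # Step 20: Logs - Storage and Audit
        "log instance profile": f"{prefix}-log-access-instance-profile",
        "logs location base": f"{bucket}/{logs_subdir}",
        "ranger audit role": f"{prefix}-ranger-audit-role",
    }


if __name__ == "__main__":
    # Hypothetical example values; substitute the ones you entered in step 2.
    for field, value in cdp_registration_values("mycdp", "my-cdp-bucket").items():
        print(f"{field:32} {value}")
```

Keeping the derivation in one function means a typo in the prefix shows up consistently across every field instead of in just one of them.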