Onboarding CDP users and groups for cloud storage

The minimal setup defined earlier spins up a CDP environment and Data Lake with no end-user access to cloud storage. To give users and groups access to cloud storage, you must map them to IAM roles that grant that access.

In general, to onboard a new user or group, you should have the following IAM resources pre-created in AWS:

  • One IAM role for the user/group
  • One IAM policy for the user/group role to access the required S3 bucket(s) and path(s)

In the example below, we add a data engineering group and a data science group to the CDP environment. Building on the minimal setup, the goal is to end up with the following IAM roles:

DATAENG_ROLE
  • Permissions policies: aws-cdp-dataeng-policy-s3access, aws-cdp-bucket-access-policy, aws-cdp-dynamodb-policy
  • Trust policy: aws-cdp-idbroker-role-trust-policy
  • Description: This role uses the three permissions policies to provide data engineers with access to a specific S3 location (s3://my-bucket/my-dl/dataeng). The trust policy allows the role to be assumed by IDBroker.

DATASCI_ROLE
  • Permissions policies: aws-cdp-datasci-policy-s3access, aws-cdp-bucket-access-policy, aws-cdp-dynamodb-policy
  • Trust policy: aws-cdp-idbroker-role-trust-policy
  • Description: This role uses the three permissions policies to provide data scientists with access to a specific S3 location (s3://my-bucket/my-dl/datasci). The trust policy allows the role to be assumed by IDBroker.
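The per-group S3 access policies differ only in the S3 path they cover. As a rough illustration, the sketch below turns an s3:// location into IAM resource ARNs and builds a simplified path-scoped policy. It is an assumption-laden stand-in, not Cloudera's published definition of aws-cdp-dataeng-policy-s3access or aws-cdp-datasci-policy-s3access: it covers only object-level actions and leaves bucket-level permissions to the shared bucket access policy listed above.

```python
import json


def s3_uri_to_arns(uri):
    """Split an s3://bucket/prefix URI into a bucket ARN and an object ARN pattern."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    prefix = prefix.strip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # Objects under the prefix, or the whole bucket if no prefix is given.
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"
    return bucket_arn, object_arn


def path_access_policy(uri):
    """Build a simplified policy allowing object reads/writes under one S3 path."""
    _, object_arn = s3_uri_to_arns(uri)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PathObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arn,
            }
        ],
    }


if __name__ == "__main__":
    for name, uri in [
        ("aws-cdp-dataeng-policy-s3access", "s3://my-bucket/my-dl/dataeng"),
        ("aws-cdp-datasci-policy-s3access", "s3://my-bucket/my-dl/datasci"),
    ]:
        print(name)
        print(json.dumps(path_access_policy(uri), indent=2))
```

Scoping the object ARN to the group's prefix is what keeps data engineers and data scientists out of each other's paths even though both roles share the same bucket.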

Creating IAM resources

You can create IAM roles and policies from the IAM console on AWS or from the AWS CLI. For the policy definitions used in this example, refer to IAM policy definitions.
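If you use the AWS CLI, each role is created with `aws iam create-role` (which takes the trust policy) and then given its permissions policies with `aws iam attach-role-policy`. The sketch below only generates those commands for DATAENG_ROLE; it assumes the three permissions policies already exist as customer-managed policies (for example, created with `aws iam create-policy`). The account ID 111122223333 and the IDBroker role name ID_BROKER_ROLE are hypothetical placeholders, and the trust policy is a simplified stand-in for aws-cdp-idbroker-role-trust-policy.

```python
import json
import shlex

ACCOUNT_ID = "111122223333"  # hypothetical AWS account ID
# Hypothetical ARN of the role that IDBroker runs as.
IDBROKER_ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/ID_BROKER_ROLE"


def idbroker_trust_policy(idbroker_role_arn):
    """Simplified trust policy letting the IDBroker role assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": idbroker_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def role_setup_commands(role_name, policy_names, account_id, idbroker_role_arn):
    """Return AWS CLI argv lists that create the role and attach its policies."""
    commands = [[
        "aws", "iam", "create-role",
        "--role-name", role_name,
        "--assume-role-policy-document",
        json.dumps(idbroker_trust_policy(idbroker_role_arn)),
    ]]
    for policy_name in policy_names:
        commands.append([
            "aws", "iam", "attach-role-policy",
            "--role-name", role_name,
            "--policy-arn", f"arn:aws:iam::{account_id}:policy/{policy_name}",
        ])
    return commands


if __name__ == "__main__":
    policies = [
        "aws-cdp-dataeng-policy-s3access",
        "aws-cdp-bucket-access-policy",
        "aws-cdp-dynamodb-policy",
    ]
    for cmd in role_setup_commands("DATAENG_ROLE", policies, ACCOUNT_ID, IDBROKER_ROLE_ARN):
        # shlex.join quotes the inline JSON so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Running the same function with "DATASCI_ROLE" and aws-cdp-datasci-policy-s3access produces the commands for the data science role.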

Adding CDP user/group to IAM role mappings

After creating the two additional IAM roles, one for data engineers (DATAENG_ROLE) and one for data scientists (DATASCI_ROLE), map them to the corresponding users or groups in CDP.

Steps - CDP web interface

  1. To add or modify these mappings, go to the Management Console and navigate to Environments > click on an environment > Actions > Manage Access > IDBroker Mappings.
  2. Under Current Mappings, click Edit.
  3. Click + to display a new field for adding a mapping.
  4. Provide the following:
    1. The User or Group dropdown is pre-populated with CDP users and groups. Select the user or group that you would like to map.
    2. Under Role, specify the role ARN (copied from the IAM role page on AWS). For this mapping, specify the ARN of your DATAENG_ROLE.
  5. Repeat the previous two steps to add a mapping for the DATASCI_ROLE.
  6. In the example setup, we created the following roles:
    • DATAENG_ROLE - We created this role for data engineers, and we assume that a DataEngineers group exists in CDP.
    • DATASCI_ROLE - We created this role for data scientists, and we assume that a DataScientists group exists in CDP.

    Based on the roles and groups created in this example, the following mappings need to be created:
    • DataEngineers group mapped to the ARN of DATAENG_ROLE
    • DataScientists group mapped to the ARN of DATASCI_ROLE

  7. Click Save and Sync.
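Because the Role field takes a pasted ARN, a common mistake is pasting a policy ARN or a bare role name instead of a role ARN. The sketch below is a format-only check you might run over a planned mapping before entering it; the account ID is a hypothetical placeholder, and the check does not confirm that the role exists or that its trust policy is correct.

```python
import re

# arn:<partition>:iam::<12-digit account>:role/<optional path/>name
ROLE_ARN_RE = re.compile(
    r"arn:aws(?:-cn|-us-gov)?:iam::\d{12}:role/[A-Za-z0-9+=,.@_/-]+"
)


def is_iam_role_arn(value):
    """Return True if value looks like an IAM role ARN (format check only)."""
    return ROLE_ARN_RE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    # Hypothetical account ID; use the ARNs copied from your IAM role pages.
    planned = {
        "DataEngineers": "arn:aws:iam::111122223333:role/DATAENG_ROLE",
        "DataScientists": "arn:aws:iam::111122223333:role/DATASCI_ROLE",
    }
    for group, arn in planned.items():
        print(group, arn, "ok" if is_iam_role_arn(arn) else "NOT A ROLE ARN")
```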

Steps - CDP CLI

To create the mappings with the CDP CLI:

  1. Use the cdp environments get-id-broker-mappings command to obtain your current mappings. For example:
    cdp environments get-id-broker-mappings --environment-name demo3
  2. Use the cdp environments set-id-broker-mappings command to add the new mappings. The command replaces the existing set of mappings, so in a single call you must:
    • Pass all of the current mappings
    • Add the new mappings
  3. Next, sync IDBroker mappings. For example:
    cdp environments sync-id-broker-mappings --environment-name demo3
  4. Finally, check the sync status. For example:
    cdp environments get-id-broker-mappings-sync-status --environment-name demo3
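Since set-id-broker-mappings must receive the current mappings together with the new ones, it can help to merge them programmatically. The sketch below assumes that the JSON output of get-id-broker-mappings contains dataAccessRole, rangerAuditRole, and a mappings list of accessorCrn/role entries, and that set-id-broker-mappings accepts --data-access-role, --ranger-audit-role, and --mappings as JSON; confirm these against the CLI help for your version. The CRNs and ARNs in the usage block are hypothetical placeholders.

```python
import json
import shlex


def merge_mappings(current, additions):
    """Combine existing and new mappings; a new entry replaces any existing
    mapping for the same accessorCrn, so each user or group maps to one role."""
    merged = {m["accessorCrn"]: m["role"] for m in current}
    for m in additions:
        merged[m["accessorCrn"]] = m["role"]
    return [{"accessorCrn": crn, "role": role} for crn, role in merged.items()]


def build_set_mappings_argv(environment_name, current_output, additions):
    """Build argv for set-id-broker-mappings from parsed get-id-broker-mappings output."""
    mappings = merge_mappings(current_output.get("mappings", []), additions)
    return [
        "cdp", "environments", "set-id-broker-mappings",
        "--environment-name", environment_name,
        # Carried over unchanged from the current configuration.
        "--data-access-role", current_output["dataAccessRole"],
        "--ranger-audit-role", current_output["rangerAuditRole"],
        "--mappings", json.dumps(mappings),
    ]


if __name__ == "__main__":
    # Hypothetical output of: cdp environments get-id-broker-mappings --environment-name demo3
    current = {
        "dataAccessRole": "arn:aws:iam::111122223333:role/DATALAKE_ADMIN_ROLE",
        "rangerAuditRole": "arn:aws:iam::111122223333:role/RANGER_AUDIT_ROLE",
        "mappings": [],
    }
    # Hypothetical group CRNs; copy the real ones from the CDP Management Console.
    new = [
        {"accessorCrn": "crn:altus:iam:us-west-1:EXAMPLE:group:DataEngineers/EXAMPLE",
         "role": "arn:aws:iam::111122223333:role/DATAENG_ROLE"},
        {"accessorCrn": "crn:altus:iam:us-west-1:EXAMPLE:group:DataScientists/EXAMPLE",
         "role": "arn:aws:iam::111122223333:role/DATASCI_ROLE"},
    ]
    print(shlex.join(build_set_mappings_argv("demo3", current, new)))
```

Keying the merge on accessorCrn means re-running the script with an updated role for an existing group overwrites that group's mapping instead of adding a duplicate. After the set command succeeds, run the sync and sync-status commands from steps 3 and 4.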