Onboarding CDP users and groups for GCP cloud storage

The minimal setup defined earlier spins up a CDP environment and Data Lake with no end user access to cloud storage. Adding users and groups to a CDP environment involves ensuring they are properly mapped to service accounts to access cloud storage.

In general, to have new users or groups onboarded, you need to do the following:

  1. Create a new service account and assign appropriate IAMroles or granular permissions on the scope of the Storage Location Base or its specific sub-directory. You might have already performed this step earlier during setting up the Minimum setup for cloud storage.
  2. In order to use these storage accounts in CDP, create a user/group to service account mapping in CDP.

This needs to be done for each user type. For example, you can create two service accounts in GCP, one for Data Scientists and another for Data Engineers, and then you map each of them to a group of users in CDP.

The onboarding of users can either happen as part of environment registration, or you can do it once an environment is running. The steps below show you how to onboard users once an environment is running.

Creating CDP user/group to service account mappings

After creating the additional service accounts, map each of them to a specific user or group.

Before you begin

The steps below show how to add the mappings to an existing environment. Alternatively, you can add them during environment registration, as mentioned in the Minimum setup for cloud storage.

Required role: DataSteward, EnvironmentAdmin, or Owner

Steps

  1. The option to add/modify the service account to user/group mappings is available from the Management Console under Environments > click on an environment > Actions > Manage Access > IDBroker Mappings.
  2. Under Current Mappings, click Edit.
  3. Click + to display a new field for adding a mapping.
  4. Provide the following:
    1. The User or Group dropdown is pre-populated with CDP users and groups. Select the user or group that you would like to map.
    2. Under Role, specify the resource ID of a service account (copied from Google Cloud IAM). For example “datascientists@gcp-cdpdev.iam.gserviceaccount.com”.
  5. Repeat the previous two steps if you would like to add additional mappings.
  6. Click Save and Sync.

For example, in the example setup we created the following roles:

  • DATAENG_ROLE - We created this role while onboarding users and we assume that there is a DataEngineers group that was created in CDP.
  • DATASCI_ROLE - We created this role while onboarding users and we assume that there is a DataScientists group that was created in CDP.

If you would like to create the mappings via CDP CLI, you can:

  1. Use the cdp environments get-id-broker-mappings command to obtain your current mappings.
  2. Use the cdp environments set-id-broker-mappings command to set additional mappings. The only way to use this command is to:
    1. Pass all the current mappings
    2. Add the new mappings.
  3. Next, sync IDBroker mappings to the environment:
    cdp environments sync-id-broker-mappings --environment-name demo3
  4. Finally, check the sync status:
    cdp environments get-id-broker-mappings-sync-status --environment-name demo3