Step 3: Register a GCP environment in CDP

The third (and last) step is to register your GCP environment in CDP. You will:

  • Use the credential created in Step 1.
  • Point CDP to the resources created in Step 2.

You have two options for performing the environment registration step:

  • Option 1: CDP web interface
  • Option 2: CDP CLI

Prerequisites

You need an RSA key pair. During registration you will be asked to provide the public key, and you will later use the matching private key for admin access to CDP instances.
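If you do not have a key pair yet, one way to create it is with ssh-keygen from OpenSSH. The following is a minimal sketch rather than a required procedure: generate_cdp_key is a hypothetical helper name, the 4096-bit key size and the cdp-admin comment are assumptions, and the empty passphrase is only there to keep the example non-interactive.

```shell
# generate_cdp_key KEY_PATH
# Creates an RSA key pair: KEY_PATH (private key) and KEY_PATH.pub (public key).
# Hypothetical helper for illustration; not part of the CDP tooling.
generate_cdp_key() (
  key_path="$1"
  # Never overwrite a key that is already in use.
  if [ -e "$key_path" ]; then
    echo "refusing to overwrite existing key: $key_path" >&2
    exit 1
  fi
  mkdir -p "$(dirname "$key_path")"
  # -N "" sets an empty passphrase (assumption, for a non-interactive sketch).
  # Remove -N "" to be prompted for a passphrase instead.
  ssh-keygen -q -t rsa -b 4096 -N "" -C "cdp-admin" -f "$key_path"
)
```

For example, run generate_cdp_key "$HOME/.ssh/cdp_gcp_rsa" and then print the public key with cat "$HOME/.ssh/cdp_gcp_rsa.pub". The printed line, starting with ssh-rsa, is the value to paste during registration.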

Register a GCP environment using the CDP web interface

  1. Switch back to the browser window with the CDP console.

  2. Navigate to the Management Console > Environments.

  3. Click the Register Environment button.

  4. In the General Information section provide the following:
    • Environment name - Provide a name for the environment.
    • Select Cloud Provider - Select Google Cloud.
  5. Under Select Credential, select the credential that you created in Step 1.
  6. Click Next.
  7. In the Data Lake Settings section, provide the following:
    • Data Lake Name - Enter a name for the Data Lake that CDP creates for your environment.
    • Data Lake version - Select 7.2.8.
  8. In the Data Access and Audit section, provide the following:
    • Assumer Service Account - Select <prefix>-idb-sa. The prefix is what you provided in Step 2.
    • Storage Location Base - Enter <prefix>-cdp-data
    • Data Access Role - Select <prefix>-dladm-sa
    • Ranger Audit Service Account - Enter the following service account name, replacing gcp-dev with the ID of your GCP project from Step 2: <prefix>-rgraud-sa@gcp-dev.iam.gserviceaccount.com
    • Under IDBroker Mappings:
      1. Click Add
      2. Under User or Group, select your user
      3. Under Service Account, enter the following service account name, replacing gcp-dev with the ID of your GCP project from Step 2: <prefix>-dladm-sa@gcp-dev.iam.gserviceaccount.com
  9. Click Next.
  10. Under Region, Location, select the same region that you provided in Step 2.
  11. In the Network section, provide the following:
    • Select Network - Select the network called <prefix>-cdp-network
    • Select Subnets - Select <prefix>-cdp-network-subnet-1
  12. Under Security Access Settings, under Select Security Access Type, select Do not create firewall rule.
  13. Under SSH Settings, paste your RSA public key.
  14. Under Add Tags, add tags if necessary.
  15. Click Next.
  16. Under Logs, provide the following:
    • Logger Service Account - Select <prefix>-log-sa
    • Logs Location Base - Enter <prefix>-cdp-logs
    • Backups Location Base - Enter <prefix>-cdp-backup
  17. Click Register Environment.
  18. Once your environment is created, its status will change to Available and the Data Lake status will change to Running.

Once your environment is running, you can start creating Data Hub clusters.
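Most values entered in the registration form follow one naming pattern based on the prefix from Step 2. As a convenience, the following sketch prints them so that you can copy and paste them into the form. It is an illustration only: cdp_form_values is a hypothetical helper, and it assumes the service account emails live in the GCP project whose ID you pass as the second argument.

```shell
# cdp_form_values PREFIX PROJECT_ID
# Prints the form values derived from the Step 2 prefix.
# Hypothetical helper; PROJECT_ID is assumed to be the GCP project
# that owns the service accounts.
cdp_form_values() (
  prefix="$1"
  sa_domain="$2.iam.gserviceaccount.com"
  # printf reuses the format string for each label/value pair.
  printf '%-30s %s\n' \
    "Assumer Service Account" "${prefix}-idb-sa" \
    "Storage Location Base" "${prefix}-cdp-data" \
    "Data Access Role" "${prefix}-dladm-sa" \
    "Ranger Audit Service Account" "${prefix}-rgraud-sa@${sa_domain}" \
    "IDBroker mapping account" "${prefix}-dladm-sa@${sa_domain}" \
    "Network" "${prefix}-cdp-network" \
    "Subnet" "${prefix}-cdp-network-subnet-1" \
    "Logger Service Account" "${prefix}-log-sa" \
    "Logs Location Base" "${prefix}-cdp-logs" \
    "Backups Location Base" "${prefix}-cdp-backup"
)
```

Call it with your own values, for example cdp_form_values myprefix my-gcp-project, where both arguments are placeholders.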

Register a GCP environment using the CDP CLI

  1. Install and configure the CDP CLI. If you have not done so yet, refer to CLI client setup.
  2. Open the terminal app on your computer.

  3. Create your environment using the following command. Replace the following with actual values:
    • <NAME_OF_YOUR_CDP_CREDENTIAL> - The name of the credential that you provided in the CDP web UI in Step 1.
    • <REGION> - The ID of the region selected in Step 2.
    • <RSA_PUBLIC_KEY> - Your RSA public key. You will use the matching private key for admin access to CDP instances.
    • <PREFIX> - The prefix specified in Step 2.
    • <PROJECT_ID> - The ID of the GCP project specified in Step 2.
    cdp environments create-gcp-environment --environment-name <PREFIX>-cdp-env \
        --credential-name <NAME_OF_YOUR_CDP_CREDENTIAL> \
        --region "<REGION>" \
        --public-key "<RSA_PUBLIC_KEY>" \
        --log-storage storageLocationBase="gs://<PREFIX>-cdp-logs/",serviceAccountEmail="<PREFIX>-log-sa@<PROJECT_ID>.iam.gserviceaccount.com" \
        --existing-network-params networkName="<PREFIX>-cdp-network",subnetNames="<PREFIX>-cdp-network-subnet-1",sharedProjectId="<PROJECT_ID>" \
        --enable-tunnel \
        --use-public-ip
  4. Find your user CRN using the following command, which requires jq and stores the CRN in the user_crn shell variable:
    user_crn=$(cdp iam get-user | jq -r .user.crn)
  5. Set the IDBroker mappings between users and service accounts using the following command. Replace the following with actual values:
    • <PREFIX> - Same as used earlier
    • <PROJECT_ID> - Same as used earlier
    • <USER_CRN> - Replace with your user CRN, which the previous step stored in the user_crn variable.
    cdp environments set-id-broker-mappings \
        --environment-name "<PREFIX>-cdp-env" \
        --baseline-role "<PREFIX>-rgraud-sa@<PROJECT_ID>.iam.gserviceaccount.com" \
        --data-access-role "<PREFIX>-dladm-sa@<PROJECT_ID>.iam.gserviceaccount.com" \
        --mappings accessorCrn="<USER_CRN>",role="<PREFIX>-dladm-sa@<PROJECT_ID>.iam.gserviceaccount.com"
  6. Create the Data Lake using the following command. Replace the following with actual values:
    • <PREFIX> - Same as used earlier
    • <PROJECT_ID> - Same as used earlier
    cdp datalake create-gcp-datalake --datalake-name <PREFIX>-cdp-dl \
        --environment-name <PREFIX>-cdp-env \
        --cloud-provider-configuration "serviceAccountEmail=<PREFIX>-idb-sa@<PROJECT_ID>.iam.gserviceaccount.com,storageLocation=gs://<PREFIX>-cdp-data,backupStorageLocationBase=gs://<PREFIX>-cdp-backup"
  7. Once your environment is created, its status will change to Available and the Data Lake status will change to Running.

Once your environment is running, you can start creating Data Hub clusters.
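Provisioning takes a while, so instead of checking by hand you could poll the status from the terminal. The following is a minimal sketch, not an official procedure. wait_for_status, env_status, and dl_status are hypothetical helper names. The sketch assumes that cdp environments describe-environment and cdp datalake describe-datalake report the status under .environment.status and .datalake.status, and that the status strings are AVAILABLE and RUNNING; check the output of your CLI version before relying on it.

```shell
# wait_for_status EXPECTED CMD [ARGS...]
# Runs CMD until it prints EXPECTED, or gives up after MAX_ATTEMPTS tries
# (default 60), sleeping POLL_SECONDS (default 30) between tries.
# The body runs in a subshell, so its variables do not leak into your session.
wait_for_status() (
  expected="$1"
  shift
  attempt=1
  while [ "$attempt" -le "${MAX_ATTEMPTS:-60}" ]; do
    status="$("$@")" || status="UNKNOWN"
    echo "attempt ${attempt}: ${status}" >&2
    [ "$status" = "$expected" ] && exit 0
    sleep "${POLL_SECONDS:-30}"
    attempt=$((attempt + 1))
  done
  exit 1
)

# Status readers; the JSON paths are assumptions about the CLI output.
env_status() {
  cdp environments describe-environment --environment-name "$1" |
    jq -r '.environment.status'
}
dl_status() {
  cdp datalake describe-datalake --datalake-name "$1" |
    jq -r '.datalake.status'
}
```

With PREFIX set to your prefix, a possible invocation is wait_for_status AVAILABLE env_status "${PREFIX}-cdp-env" && wait_for_status RUNNING dl_status "${PREFIX}-cdp-dl". The command exits with a non-zero status if either resource does not reach the expected state within the allowed attempts.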