Before creating your cluster

Before you create your Streaming Analytics Data Hub cluster, ensure that the environment is set up properly and that you have all the access required to use CDP Public Cloud. As an EnvironmentAdmin, you need to give users access to the CDP Public Cloud environment and to the Streaming Analytics cluster by assigning access roles to them and creating IDBroker mappings. A workload password also needs to be set for further authentication.

  • You have CDP login credentials.
  • You have an available CDP environment.
  • You have a running Data Lake.
  • You have a CDP username and the predefined resource role of this user is EnvironmentAdmin.
  • Your CDP user is synchronized to the CDP Public Cloud environment.

Assigning resource roles

As an administrator, you need to grant users or groups the permissions they need to access and perform tasks in your Data Hub environment.

  1. Navigate to Management Console > Environments and select your environment.
  2. Click Actions > Manage Access.
  3. Search for a user or group that needs access to the environment.
  4. Select the EnvironmentUser role from the list of Resource Roles.
  5. Click Update Roles.
    The Resource Role for the selected user or group is updated.
  6. Navigate to Management Console > Environments, and select the environment where you want to create a cluster.
  7. Click Actions > Synchronize Users.
    You are redirected to the Synchronize Users to FreeIPA page.
  8. Click Synchronize Users.
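
The synchronization in steps 6 to 8 can also be scripted. The following is a minimal sketch, not an official procedure: it assumes the CDP CLI is installed and configured, and that it provides the `cdp environments sync-all-users` subcommand with an `--environment-names` option. Verify both against the CLI reference for your version. The function names and the environment name `my-env` are hypothetical.

```python
import subprocess
from typing import List


def sync_users_command(environment_name: str) -> List[str]:
    """Build the argv for synchronizing users to one environment.

    Assumption: the CDP CLI accepts
    `cdp environments sync-all-users --environment-names <name>`.
    """
    if not environment_name.strip():
        raise ValueError("environment name must not be empty")
    return [
        "cdp", "environments", "sync-all-users",
        "--environment-names", environment_name,
    ]


def sync_users(environment_name: str) -> str:
    """Run the command and return its standard output.

    Requires a configured CDP CLI on the PATH; raises
    subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        sync_users_command(environment_name),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(sync_users_command("my-env")))
```

Building the argument list separately from running it makes it possible to review the command before it touches the environment.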

Creating IDBroker mapping

As an administrator, you must create an IDBroker mapping for a user or group to have access to the S3 cloud storage. As a part of Knox, IDBroker allows a user to exchange cluster authentication for temporary cloud credentials.

The following roles are created when registering the CDP environment:
  • idbroker-role: granting permissions to IDBroker instances associated with the CDP environment
  • datalake-admin-role: granting access to CDP cloud resources
  • logs-role: granting access to the logs storage location
To use Streaming Analytics in CDP Public Cloud, make sure that the users who run Flink jobs are associated with the ARN of the datalake-admin-role, as it grants access to the cloud resources required to run the Flink service.
  1. Navigate to Management Console > Environments and select your environment.
  2. Click Actions > Manage Access.
  3. Click the IDBroker Mappings tab.
  4. Click Edit to add a new user or group and assign roles that grant write access to the cloud storage.
  5. Search for the user or group you need to map.
  6. Go to the IAM Summary page, where you can find information about your cloud storage account.
  7. Copy the Role ARN.
  8. Go back to the IDBroker Mappings tab on the Cloudera Management Console page.
  9. Paste the Role ARN for your selected user or group.
  10. Click Save and Sync.
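
A mistyped or truncated Role ARN is an easy error to make when copying between consoles. The following sketch shows one way to check a copied value before pasting it into the mapping. It assumes an AWS environment, where an IAM role ARN has the form `arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>`. The function name `parse_role_arn` and the account ID in the example are hypothetical, and the check only validates the format, not whether the role exists.

```python
import re

# arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
_ROLE_ARN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/"
    r"((?:[\w+=,.@-]+/)*)([\w+=,.@-]+)$",
    re.ASCII,
)


def parse_role_arn(arn: str) -> dict:
    """Split an IAM role ARN into its parts, or raise ValueError if malformed."""
    match = _ROLE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, path, role_name = match.groups()
    return {
        "partition": partition,
        "account_id": account_id,
        "path": "/" + path,  # "/" when the role has no path
        "role_name": role_name,
    }


if __name__ == "__main__":
    # Hypothetical ARN for illustration only.
    copied = " arn:aws:iam::123456789012:role/datalake-admin-role\n"
    print(parse_role_arn(copied)["role_name"])
```

Stripping surrounding whitespace first handles the stray spaces and line breaks that copying from a web page often introduces.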

Setting workload password

As a user, you need to set a workload password for your EnvironmentUser account to be able to access the Streaming Analytics nodes through an SSH connection.

  1. Navigate to Management Console > Environments and select your environment.
  2. Click Actions > Manage Access.
  3. Click Workload Password.
  4. Enter a workload password for your user.
  5. Confirm the password by typing it again.
  6. Click Set Workload Password.