Using auto discovery of services

Using the auto discovery of services in SQL Stream Builder (SSB), you can easily connect to catalogs and data sources that are running on a different Data Hub cluster in the same CDP Public Cloud environment.

You can use service discovery to automatically register Kafka, Schema Registry and Kudu with SSB within the same environment. This means that when you have a Streams Messaging or Real-Time Data Mart cluster alongside your Streaming Analytics cluster in your environment, you can use auto discovery to import the tables from the services on those clusters. Auto discovery has the same result as manually adding the catalogs and providers: you can read from and write to Kafka, Schema Registry and Kudu. However, when you have numerous Kafka, Schema Registry and Kudu services in your environment, you do not have to add them one by one. Instead, you can use service discovery to import them all at once.

For more information about manually adding data sources and catalogs, see the Registering Data providers in SSB section.

The auto discovery feature does not track the changes you make in your environment. When you add services to the environment, you can register them with SSB by running the discovery again using the same process. However, when you delete a cluster from the environment, the already registered catalogs and data sources are not removed from SSB. You need to delete them manually from the Streaming SQL Console.
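
Because registered entries are never removed automatically, it can help to compare what is registered in SSB with what currently runs in the environment. The following Python sketch illustrates that comparison with plain sets. The function name and the service names are hypothetical, and both lists are assumed to be collected by hand; the code does not call any SSB or CDP API.

```python
def find_stale_and_new(registered, running):
    """Compare names registered in SSB with services currently running.

    Both arguments are iterables of service names gathered manually;
    nothing here talks to SSB or CDP.
    """
    registered, running = set(registered), set(running)
    stale = sorted(registered - running)  # candidates for manual deletion in SSB
    new = sorted(running - registered)    # candidates for another discovery run
    return stale, new


# Hypothetical names, for illustration only.
stale, new = find_stale_and_new(
    registered=["kafka-a", "kudu-a", "schema-registry-a"],
    running=["kafka-a", "schema-registry-a", "kafka-b"],
)
print("Remove manually:", stale)
print("Register via discovery:", new)
```

Entries in the first list belong to clusters that no longer exist, so they would have to be deleted from the Streaming SQL Console by hand.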

Setting up a machine user for service discovery

As an administrator, you need to set up a machine user that is used to register the services from the clusters to SSB. The machine user information must be provided when configuring service discovery in SSB.

For auto discovery, you need to create a machine user and assign the correct role so that the machine user has access to the environment. The machine user must also be synchronized to the environment.

  1. Navigate to the Management Console > User Management page.
  2. Click Actions > Create Machine User.
  3. Provide a name for the Machine User.
  4. Click Create.
    You are redirected to the User Profile page.
  5. Click Set Workload Password.
  6. Provide a Workload Password for the Machine User.
  7. Click Set Workload Password.
  8. Select Generate Access Key under the Access Keys tab.
  9. Click Generate Access Key.
    The Generate Access Key window pops up to indicate that the access key generation was successful.
  10. Take note of the Access Key ID and Private Key.
  11. Click Download Credentials File.
  12. Click Close.
  13. Navigate to Management Console > Environments and select your environment.
  14. Click Actions > Manage Access.
  15. Search for the Machine User you have created.
  16. Select the EnvironmentAdmin role from the list of Resource Roles.
  17. Click Update Roles.
    The Resource Role for the selected user or group is updated.
  18. Navigate to Management Console > Environments, and select the environment where you want to create a cluster.
  19. Click Actions > Synchronize Users to FreeIPA.
  20. Click Synchronize Users.
Take note of the machine user information. After setting up the machine user for the service discovery feature, you need to access the Streaming SQL Console and configure the environment settings so that you can import the data sources or catalogs from the cluster in the environment.

Using service discovery on Streaming SQL Console

Using the Streaming SQL Console, you can import the Kafka, Schema Registry and Kudu services that already exist in your CDP Public Cloud environment. To enable service discovery, you need to provide the environment and machine user information that was provided by your administrator.

  • Ensure that a machine user is set up for your environment.

    For more information, see the Setting up a machine user for service discovery documentation.

  • Ensure that you have all the information needed to configure service discovery:
    • Workload username and password of the machine user
    • Name of the CDP Public Cloud environment
    • Base URL of the CDP Public Cloud environment
    • Access key ID and Private Key of the machine user
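
The checklist above can be sketched as a small Python structure that flags any value still missing before you open the console. The class and attribute names are hypothetical and only mirror the items in the list; the sample values are the example values used in this documentation, and the code does not contact CDP or SSB.

```python
from dataclasses import dataclass, fields


@dataclass
class DiscoverySettings:
    """Values needed for service discovery (attribute names are hypothetical)."""
    username: str
    password: str
    environment_name: str
    base_url: str
    access_key_id: str
    private_key: str

    def missing(self):
        """Return the names of attributes that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


settings = DiscoverySettings(
    username="csa-test",
    password="Password123!",
    environment_name="csa-test-env",
    base_url="https://console.us-west-1.cdp.cloudera.com",
    access_key_id="00a0a0aa-00aa-00aa-00a0-0000a0a00a0",
    private_key="",  # left empty on purpose to show the check
)
missing = settings.missing()
print("Missing values:", missing)
```

An empty result means every item in the list has a value. It says nothing about whether those values are correct.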
  1. Navigate to the Streaming SQL Console.
    1. Navigate to Management Console > Environments, and select the environment where you have created your cluster.
    2. Select the Streaming Analytics cluster from the list of Data Hub clusters.
    3. Select Streaming SQL Console from the list of services.
      The Streaming SQL Console opens in a new window.
  2. Open a project from the Projects page of Streaming SQL Console in one of the following ways:
    • Select an already existing project from the list by clicking the Open button or Switch button.
    • Create a new project by clicking the New Project button.
    • Import a project by clicking the Import button.
    You are redirected to the Explorer view of the project.
  3. Click Data Hub Service Discovery from the Project Manager.
  4. Enter the required information in the corresponding fields.
    Field            | Description                                                 | Example
    Username         | Workload username of the Machine User.                      | csa-test
    Password         | Workload password of the Machine User.                      | Password123!
    Environment Name | The name of the CDP Public Cloud environment you are using. | csa-test-env
    Base URL         | The URL snippet from the Management Console access URL.     | https://console.us-west-1.cdp.cloudera.com
    Access Key ID    | Access Key ID of the Machine User.                          | 00a0a0aa-00aa-00aa-00a0-0000a0a00a0
    Access Key       | Private Key of the Machine User.                            | Z66pxEXAMPLEKEY+Y3oIU070GEHS3kqi1EXAMPLEKEY=
  5. Click Start Discovery.
The existing Kafka, Kudu and Schema Registry services in your environment should appear under Kafka Sources and Catalogs in Streaming SQL Console.
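
The Base URL example in step 4 consists of only the scheme and host of the Management Console address. As a minimal sketch under that assumption, the following Python snippet derives such a value from a full URL copied from the browser. The function name and the path in the sample URL are hypothetical.

```python
from urllib.parse import urlsplit


def base_url_from(console_url):
    """Keep only the scheme and host of a Management Console URL."""
    parts = urlsplit(console_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {console_url!r}")
    return f"{parts.scheme}://{parts.netloc}"


# Hypothetical URL as it might be copied from the browser address bar.
copied = "https://console.us-west-1.cdp.cloudera.com/environments/index.html#/"
base_url = base_url_from(copied)
print(base_url)
```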