Prerequisites

Learn how to collect the information you need to deploy the MySQL CDC to Kudu [Technical Preview] ReadyFlow, and meet other prerequisites.

For your data ingest source

  • You have obtained the MySQL database server hostname and port.

  • You have obtained the MySQL schema name and table name. Take note of the table structure, specifically field case sensitivity.

  • You have obtained a username and password to access the MySQL table.

  • You have performed the MySQL setup tasks required to run Debezium.
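
The Debezium setup tasks in the last item generally include binary logging settings on the MySQL server. As a rough sketch, the helper below checks a dictionary of server variables (for example, the rows returned by SHOW VARIABLES) against three settings that the Debezium MySQL connector relies on: log_bin enabled, binlog_format set to ROW, and binlog_row_image set to FULL. The function name and the input dictionary are hypothetical, and the check is not exhaustive. See the Debezium documentation for the full list of setup tasks, including the privileges the database user needs.

```python
# Binary logging settings the Debezium MySQL connector relies on.
# Hypothetical helper: verify the list against the Debezium documentation
# for the version you run.
REQUIRED_VARIABLES = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def find_debezium_setting_problems(server_variables):
    """Return a list of human-readable problems; empty if all checks pass.

    server_variables maps variable names to values, for example the rows
    returned by SHOW VARIABLES collected into a dict.
    """
    problems = []
    for name, expected in REQUIRED_VARIABLES.items():
        actual = server_variables.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif str(actual).upper() != expected:
            problems.append(f"{name} is {actual} (expected {expected})")
    return problems
```

An empty list means the three settings match. Any other result names the variable to correct before you deploy the ReadyFlow.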

For DataFlow

  • You have enabled DataFlow for an environment.

    For information on how to enable DataFlow for an environment, see Enabling DataFlow for an Environment.

  • You have created a Machine User to use as the CDP Workload User.

  • You have given the CDP Workload User the EnvironmentUser role.
    1. From the Management Console, go to the environment for which DataFlow is enabled.
    2. From the Actions drop-down menu, click Manage Access.
    3. Identify the user you want to use as a Workload User.
    4. Give that user the EnvironmentUser role.
  • You have synchronized your user to the CDP Public Cloud environment that you enabled for DataFlow.

    For information on how to synchronize your user to FreeIPA, see Performing User Sync.

  • You have granted your CDP user the DFCatalogAdmin and DFFlowAdmin roles to enable your user to add the ReadyFlow to the Catalog and deploy the flow definition.
    1. Give a user permission to add the ReadyFlow to the Catalog.
      1. From the Management Console, click User Management.
      2. Enter the name of the user or group you wish to authorize in the Search field.
      3. Select the user or group from the list that displays.
      4. Click Roles > Update Roles.
      5. From Update Roles, select DFCatalogAdmin and click Update.
    2. Give your user or group permission to deploy flow definitions.
      1. From the Management Console, click Environments to display the Environment List page.
      2. Select the environment to which you want your user or group to deploy flow definitions.
      3. Click Actions > Manage Access to display the Environment Access page.
      4. Enter the name of the user or group you wish to authorize in the Search field.
      5. Select your user or group and click Update Roles.
      6. Select DFFlowAdmin from the list of roles.
      7. Click Update Roles.
    3. Give your user or group access to the Project where the ReadyFlow will be deployed.
      1. Go to DataFlow > Projects.
      2. Select the project where you want to manage access rights and click More > Manage Access.
      3. Start typing the name of the user or group you want to add and select them from the list.
      4. Select the Resource Roles you want to grant.
      5. Click Update Roles.
      6. Click Synchronize Users.

For your data ingest target

  • You have a Real-Time Data Mart cluster running Kudu, Impala, and Hue in the same environment for which DataFlow has been enabled.

  • You have the Kudu Master hostnames.

    1. From the Management Console, click Data Hub Clusters.
    2. Select the Real-Time Data Mart cluster into which you want to ingest data.
    3. Click the Hardware tab.
    4. Copy the FQDN for each Kudu Master.
  • You have the Kudu database name.
    1. Navigate to your Real-Time Data Mart cluster and click Hue from the Services pane.
    2. Click the Tables icon on the left pane.
    3. Select the default database.
  • You have created the Kudu table that you want to ingest data into. Ensure that the field case sensitivity matches that of the source table.

    1. Navigate to your Real-Time Data Mart cluster and click Hue from the Services pane.
    2. Click the Tables icon on the left pane.
    3. Select the default database, and click + New to create a new table.
    4. In the Type field, select Manually and click Next.
    5. Provide the table Name, Format, Primary keys, and any partitions.
    6. Click Submit. The newly created table displays in the default database Tables pane.
    7. Check the Kudu UI Tables tab for the name of the table you created. You will need this table name when you use the DataFlow Deployment wizard to deploy the ReadyFlow.
  • You have assigned permissions via IDBroker or in Ranger to enable the CDP Workload User to access the Kudu table that you want to ingest data into.

    1. From the base cluster on CDP Public Cloud, select Ranger.
    2. Select your Real-Time Data Mart cluster from the Kudu folder.
    3. Click Add New Policy.
    4. On the Create Policy page, enter the Kudu table name in the table field.
    5. Add the CDP Workload User in the Select User field.
    6. Add the Insert and Select permissions in the Permissions field.
    7. Click Save.
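
When you later enter these target details in the DataFlow Deployment wizard, two mistakes are easy to make: a malformed list of Kudu Master hostnames, and a column whose case differs between the MySQL source table and the Kudu table. The following sketch is illustrative only. It joins the copied FQDNs into a comma-separated list and reports columns that match only when case is ignored. The function names, hostnames, and column names are hypothetical. The port argument is optional because whether the deployment parameter expects a port is an assumption to confirm in the wizard; 7051 is the Kudu default for the Master RPC port.

```python
def format_kudu_masters(fqdns, port=None):
    """Join Kudu Master FQDNs into one comma-separated string.

    Blank entries and surrounding whitespace are dropped. If port is given
    (the Kudu default Master RPC port is 7051), it is appended to each host.
    """
    hosts = [fqdn.strip() for fqdn in fqdns if fqdn.strip()]
    if port is not None:
        hosts = [f"{host}:{port}" for host in hosts]
    return ",".join(hosts)


def find_case_mismatches(source_columns, target_columns):
    """Return (source, target) pairs that are equal only when case is ignored."""
    target_by_lower = {name.lower(): name for name in target_columns}
    mismatches = []
    for source_name in source_columns:
        target_name = target_by_lower.get(source_name.lower())
        if target_name is not None and target_name != source_name:
            mismatches.append((source_name, target_name))
    return mismatches
```

For example, comparing the hypothetical source columns id and firstName with the target columns id and firstname reports the pair (firstName, firstname), which indicates that the Kudu table should be recreated with matching case.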