Deploying Kafka filter to Kafka ReadyFlow

Learn how to use the Deployment wizard to deploy the Kafka filter to Kafka ReadyFlow using the information you collected with the help of the prerequisites checklist.

The CDF Catalog is where you manage the flow definition lifecycle, from initial import through versioning to deployment.

  1. In DataFlow, from the left navigation pane, click Catalog.
    Flow definitions available for you to deploy are displayed, one definition per row.
  2. Launch the Deployment wizard.
    1. Click the row to display the flow definition details and versions.
    2. Click a row representing a flow definition version to display flow definition version details and the Deploy button.
    3. Click Deploy to launch the Deployment wizard.
  3. Select the environment to which you want to deploy this version of your flow definition, and click Continue.
  4. In the Overview, give your flow deployment a unique name.

    You can use this name to distinguish between different versions of a flow definition, flow definitions deployed to different environments, and so on.

  5. In NiFi Configuration:
    1. Select a NiFi Runtime Version for your flow deployment. Cloudera recommends that you always use the latest available version, if possible.
    2. Autostart Behavior is on by default, allowing your flow to start automatically after successful deployment. Clear the selection if you do not want the flow to start automatically.
  6. In Parameters, specify parameter values such as connection strings and usernames, and upload files such as truststores.

    For parameters specific to this ReadyFlow, see the Kafka filter to Kafka ReadyFlow configuration parameters table below.

  7. Specify your Sizing & Scaling configurations.
    NiFi node sizing
    • You can adjust the size of your cluster from Extra Small to Large.
    Number of NiFi nodes
    • You can choose whether to scale your cluster automatically depending on resource demands. When you enable autoscaling, the minimum number of NiFi nodes is used as the initial size, and the deployment scales up or down depending on resource demands.
    • You can set the number of nodes from 1 to 32.
  8. In Key Performance Indicators, you can set up your metrics system with specific KPIs to track the performance of a deployed flow. You can also define when and how to receive alerts about the KPI metrics tracking.

    See Working with KPIs for more information about the KPIs available and how you can monitor them.

  9. Review the summary of the information you provided in the Deployment wizard and make any necessary edits by clicking Previous. When you are finished, complete your flow deployment by clicking Deploy.

Once you click Deploy, you are redirected to the Alerts tab in the detail view for the deployment, where you can track its progress.
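As a rough illustration of the limits described in the Sizing & Scaling step, the following Python sketch checks a set of node-count values before you enter them in the wizard. The `ScalingConfig` class and its fields are hypothetical and are not part of any DataFlow API; in particular, the `max_nodes` field assumes that enabling autoscaling also asks for an upper node limit.

```python
from dataclasses import dataclass
from typing import List, Optional

# Node-count range stated for the Sizing & Scaling step.
MIN_NODES = 1
MAX_NODES = 32


@dataclass
class ScalingConfig:
    """Hypothetical model of the wizard's node settings (not a DataFlow API)."""
    autoscaling: bool
    # Fixed node count when autoscaling is off; initial size when it is on.
    min_nodes: int
    # Assumed upper limit, only meaningful when autoscaling is enabled.
    max_nodes: Optional[int] = None


def validate(cfg: ScalingConfig) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    errors: List[str] = []
    if not MIN_NODES <= cfg.min_nodes <= MAX_NODES:
        errors.append(f"min_nodes must be between {MIN_NODES} and {MAX_NODES}")
    if cfg.autoscaling:
        if cfg.max_nodes is None:
            errors.append("max_nodes is required when autoscaling is enabled")
        elif not MIN_NODES <= cfg.max_nodes <= MAX_NODES:
            errors.append(f"max_nodes must be between {MIN_NODES} and {MAX_NODES}")
        elif cfg.max_nodes < cfg.min_nodes:
            errors.append("max_nodes must be greater than or equal to min_nodes")
    return errors


if __name__ == "__main__":
    # Autoscaling from 1 to 3 nodes passes; a fixed size of 33 nodes does not.
    print(validate(ScalingConfig(autoscaling=True, min_nodes=1, max_nodes=3)))
    print(validate(ScalingConfig(autoscaling=False, min_nodes=33)))
```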

The following parameters are required for the Kafka filter to Kafka data flow. You collected this information in the Meeting the prerequisites step.

Table 1. Kafka filter to Kafka ReadyFlow configuration parameters
Parameter Name Description Example
CDP Workload User

Specify the CDP machine user or workload user name that you want to use to authenticate to Kafka. Ensure that this user has the appropriate access rights in Ranger for the source and target Kafka topics.

CDP Workload User Password

Specify the password of the CDP machine user or workload user that you are using to authenticate to Kafka.
Data Input Format

Specify the format of your input data. If your input data is CSV, define a CSV delimiter for the data in the CSV Delimiter text box. If you use AVRO or JSON format, the delimiter is ignored.
  • CSV
  • JSON
  • AVRO
Data Output Format

Specify the format of your output data. If your data input is CSV, define a CSV delimiter for the data in the CSV Delimiter text box. If you use AVRO or JSON format, the delimiter is ignored.
  • CSV
  • JSON
  • AVRO
CSV Delimiter

If your source data is CSV, specify the delimiter here.
Filter Rule

If you want to filter your data for the destination topic, enter a filter rule expressed in SQL.

Records matching the filter are written to the destination topic in Kafka. If you do not provide a specific filter rule, the default rule forwards all records.

Default rule: SELECT * FROM FLOWFILE

Kafka Broker Endpoint

Specify the Kafka bootstrap servers string as a comma-separated list in the format <host>:<port>.
Kafka Consumer Group ID

Add the name of the consumer group used for the source topic you are consuming from.

Make sure to use the Consumer Group ID that the selected CDP Workload User is allowed to use.

Kafka Source Topic

Specify the name of the topic that you want to read from.

Kafka Destination Topic

Specify the name of the topic that you want to write to.

Kafka Producer ID

Specify an ID that identifies your data flow in Streams Messaging Manager (SMM).
Schema Name

Specify the schema that you want to use in your data flow.

DataFlow looks up this schema in the Schema Registry you define with the Schema Registry Hostname. See the Appendix for an example schema.

Schema Registry Hostname

Specify the hostname of the Schema Registry running on the master node in the Streams Messaging cluster that you want to connect to.

This must be the direct hostname of the Schema Registry itself, not the Knox Endpoint.
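To preview what a Filter Rule does before deploying, the following Python sketch runs a rule against a few sample records loaded into an in-memory SQLite table named FLOWFILE. SQLite is only a stand-in for the SQL engine the ReadyFlow actually uses, so dialect details such as available functions and type handling may differ; the `apply_filter_rule` function and the sample field names are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List

# Default rule from the parameter table: forward every record.
DEFAULT_RULE = "SELECT * FROM FLOWFILE"


def apply_filter_rule(records: List[Dict[str, Any]],
                      rule: str = DEFAULT_RULE) -> List[Dict[str, Any]]:
    """Approximate a Filter Rule by querying the records as a table named FLOWFILE.

    SQLite stands in for the ReadyFlow's own SQL engine here.
    """
    if not records:
        return []
    columns = list(records[0].keys())
    conn = sqlite3.connect(":memory:")
    try:
        col_list = ", ".join(f'"{c}"' for c in columns)
        # Untyped columns keep the Python int/str values as inserted.
        conn.execute(f"CREATE TABLE FLOWFILE ({col_list})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO FLOWFILE ({col_list}) VALUES ({placeholders})",
            [tuple(r[c] for c in columns) for r in records],
        )
        cursor = conn.execute(rule)
        names = [d[0] for d in cursor.description]
        # Rows returned by the query are the ones that would reach the destination topic.
        return [dict(zip(names, row)) for row in cursor.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        {"sensor_id": "a", "temperature": 25},
        {"sensor_id": "b", "temperature": 35},
    ]
    print(apply_filter_rule(sample))
    print(apply_filter_rule(sample, "SELECT * FROM FLOWFILE WHERE temperature > 30"))
```

With the default rule both sample records are kept; with the WHERE clause only the record whose temperature exceeds 30 would be forwarded.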
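The Kafka Broker Endpoint value must be a comma-separated list of <host>:<port> entries. A minimal Python sketch of a pre-deployment format check is shown below, assuming you want to catch malformed entries before starting the wizard. The `parse_bootstrap_servers` function, the hostnames, and the port are placeholders, and tolerating spaces around commas is a choice made for this example, not documented DataFlow behavior.

```python
from typing import List, Tuple


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Split a host1:port1,host2:port2 string into (host, port) pairs.

    Raises ValueError for any entry that does not match <host>:<port>.
    """
    servers: List[Tuple[str, int]] = []
    for raw_entry in value.split(","):
        entry = raw_entry.strip()
        # Split on the last colon so the host part may itself contain dots.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {entry!r}")
        port_num = int(port)
        if not 1 <= port_num <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        servers.append((host, port_num))
    return servers


if __name__ == "__main__":
    print(parse_bootstrap_servers("broker1.example.com:9093,broker2.example.com:9093"))
```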