Understanding the use case

You can use the Kafka filter to Kafka ReadyFlow to move your data between two Kafka topics, while applying a schema to the data in Cloudera DataFlow (CDF).

Using a ReadyFlow to build your data flow allows you to get started with CDF quickly and easily. A ReadyFlow is a flow definition template optimized to work with a specific CDP source and destination. Instead of spending time building the data flow in NiFi, you can focus on deploying your flow and defining the right KPIs for easy monitoring.

This use case walks you through the steps of deploying a Kafka filter to Kafka data flow. You can use this flow when you want to filter specific events from a Kafka topic and write the filtered stream to another Kafka topic. For example, you could use this flow to filter out erroneous records that contain "null" for an important field. To filter the data with a SQL query, you need to provide a schema for the events that you are processing in the topic.
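For example, a filter rule that drops records with a null value in an important field might look like the following sketch. It assumes the filter is expressed as a NiFi QueryRecord-style SQL statement, where FLOWFILE refers to the stream of incoming records; the sensor_0 field name is purely illustrative:

    -- Keep only records where the important field is populated
    SELECT *
    FROM FLOWFILE
    WHERE sensor_0 IS NOT NULL

Only records in which sensor_0 is populated pass the filter and continue to the destination topic.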

Your data flow can consume JSON, CSV, or Avro data from the source Kafka topic and write to the destination Kafka topic in any of these formats. The data flow retrieves the schema by looking up the schema name in the CDP Schema Registry and uses it to parse the events. You can filter the events by specifying a SQL query. The filtered events are converted to the specified output data format and written to the destination Kafka topic.
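As an illustration, the schema you register in the Schema Registry could be an Avro record schema like the one below. The schema name and fields are hypothetical; the name under which you register it must match the schema name that your flow deployment looks up:

    {
      "type": "record",
      "name": "SensorReading",
      "fields": [
        { "name": "sensor_id", "type": "string" },
        { "name": "sensor_ts", "type": "long" },
        { "name": "sensor_0", "type": ["null", "double"], "default": null }
      ]
    }

Declaring sensor_0 as nullable lets the flow parse records with missing values, so that the SQL filter, rather than a parsing failure, decides which records to drop.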