ReadyFlow: Kafka to Kudu

This use case shows how to move data from a Kafka topic into Apache Kudu in your Cloudera Public Cloud Real-time Data Mart cluster. You create the data flow with the Kafka to Kudu ReadyFlow.

This ReadyFlow consumes JSON, CSV, or Avro data from a source Kafka topic, parses the data using the schema it looks up by name in the Cloudera Schema Registry, and ingests the data into a Kudu table. You can pick the Kudu operation (INSERT, INSERT_IGNORE, UPSERT, UPDATE, DELETE, UPDATE_IGNORE, DELETE_IGNORE) that best fits your use case. Failed Kudu write operations are retried automatically to handle transient issues. Define a KPI on the failure_WriteToKudu connection to monitor failed write operations.
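The difference between the Kudu operations listed above is easiest to see on a small example. The following Python sketch is an illustration only and is not how the ReadyFlow is implemented: it uses an in-memory dictionary as a stand-in for a Kudu table, assumes JSON messages with a primary key column named `id`, and simplifies UPSERT to a column merge. The names `apply_operation`, `ingest`, and `RowError` are hypothetical. Rows that hit a key conflict are collected in a list, loosely analogous to records that end up on a failure connection. Schema lookup and retries are not modeled.

```python
import json

# The seven operations named in the ReadyFlow description.
OPERATIONS = {
    "INSERT", "INSERT_IGNORE", "UPSERT",
    "UPDATE", "UPDATE_IGNORE", "DELETE", "DELETE_IGNORE",
}


class RowError(Exception):
    """Hypothetical error: the operation hit a duplicate or missing key."""


def apply_operation(table, operation, row, key="id"):
    """Apply one Kudu-style operation to `table`, a dict keyed by primary key."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    pk = row[key]
    exists = pk in table
    if operation in ("INSERT", "INSERT_IGNORE"):
        if exists:
            if operation == "INSERT":
                raise RowError(f"key {pk!r} already present")
            return  # INSERT_IGNORE skips duplicates without an error
        table[pk] = dict(row)
    elif operation == "UPSERT":
        # Insert the row, or overwrite the supplied columns if it exists.
        table[pk] = {**table.get(pk, {}), **row}
    elif operation in ("UPDATE", "UPDATE_IGNORE"):
        if not exists:
            if operation == "UPDATE":
                raise RowError(f"key {pk!r} not found")
            return  # UPDATE_IGNORE skips missing rows without an error
        table[pk].update(row)
    else:  # DELETE or DELETE_IGNORE
        if not exists:
            if operation == "DELETE":
                raise RowError(f"key {pk!r} not found")
            return  # DELETE_IGNORE skips missing rows without an error
        del table[pk]


def ingest(messages, operation, table=None):
    """Parse JSON messages and apply `operation` to each; return (table, failed_rows)."""
    table = {} if table is None else table
    failed = []
    for raw in messages:
        row = json.loads(raw)
        try:
            apply_operation(table, operation, row)
        except RowError:
            failed.append(row)
    return table, failed


if __name__ == "__main__":
    events = ['{"id": 1, "temp": 20}', '{"id": 1, "temp": 25}']
    print(ingest(events, "UPSERT"))  # second event overwrites the first
    print(ingest(events, "INSERT"))  # second event is reported as failed
```

With two events that share a key, UPSERT keeps the latest values, while INSERT keeps the first row and reports the second as failed. This is why UPSERT is a common choice when the same record can arrive more than once.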

Kafka to Kudu ReadyFlow details

Source: Kafka topic
Source Format: JSON, CSV, Avro
Destination: Kudu
Destination Format: Kudu Table

Time series use cases analyze data collected at specified intervals and enable you to improve performance based on the available data. Examples include:

  • Optimizing yield or yield quality in a manufacturing plant

  • Dynamically optimizing network capacity during peak load for better telecommunications uptime and services

These use cases require that you store events at a high frequency, while providing ad-hoc query and record update abilities.