ReadyFlow: Kafka to Kudu

Learn about the Kafka to Kudu ReadyFlow.

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic, parses the schema by looking up the schema name in the CDP Schema Registry and ingests it into a Kudu table. You can pick the Kudu operation (INSERT, INSERT_IGNORE, UPSERT, UPDATE, DELETE, UPDATE_IGNORE, DELETE_IGNORE) that fits best for your use case. Failed Kudu write operations are retried automatically to handle transient issues. Define a KPI on the failure_WriteToKudu connection to monitor failed write operations.

Kafka to Kudu ReadyFlow details
Source Kafka topic
Source Format JSON, CSV, Avro
Destination Kudu
Destination Format Kudu Table
Table 1. Kafka to Kudu ReadyFlow configuration parameters
Parameter Name Description Example
CDP Workload User Specify the CDP machine user or workload user name that you want to use to authenticate to Kafka and Kudu. Ensure this user has the appropriate access rights to the Kafka topics and Kudu table.
CDP Workload User Password Specify the password of the CDP machine user or workload user you are using to authenticate against Kafka and Kudu.
CSV Delimiter If your source data is CSV, specify the delimiter here.
Data Input Format Specify the format of your input data. You can use "CSV", "JSON" or "AVRO" with this ReadyFlow.
Kafka Broker Endpoint Specify the Kafka bootstrap servers string as a comma separated list.
Kafka Consumer Group ID Specify the ID for the consumer group used for the source topic you are consuming from.
Kafka Source Topic Specify a topic name that you want to read from.
Kudu Master Hosts Specify the Kudu Master hostnames in a comma separated list.
Kudu Operation Type

Specify the operation that you want to use when writing data to Kudu.

Valid values are:

  • INSERT

  • INSERT_IGNORE

  • UPSERT

  • UPDATE

  • DELETE

Kudu Table Name Specify the Kudu table name you want to write to.
Schema Name Specify the schema name to be looked up in the Schema Registry.
Schema Registry Hostname Specify the hostname of the Schema Registry you want to connect to. This must be the direct hostname of the Schema Registry itself, not the Knox Endpoint.