Available ReadyFlows

Using a ReadyFlow to build your data flow allows you to get started with Cloudera Data Flow quickly and easily. A ReadyFlow is a flow definition template optimized to work with a specific Cloudera source and destination. So instead of spending your time on building the data flow in NiFi, you can focus on deploying your flow and defining the right KPIs for easy monitoring.

The ReadyFlow Gallery is where you can find out-of-box flow definitions. To use a ReadyFlow, add it to the Catalog and then use it to create a Flow Deployment. You can also use ReadyFlows as templates for designing new flows. To do so, Select a ReadyFLow from the ReadyFLow Gallery and use it to create a new draft.

ADLS to ADLS Avro

This ReadyFlow consumes JSON, CSV or Avro data from a source ADLS container and transforms the data into Avro files before writing it to another ADLS container.

ADLS to Chroma DB [Technical Preview]

You can use the ADLS to Chroma DB ReadyFlow to consume PDF documents from ADLS, vectorize them using an OpenAI model and write the results to Chroma DB.

ADLS to Databricks

You can use the ADLS to Databricks ReadyFlow to retrieve CSV files from a source ADLS location and write them as Parquet files to a destination ADLS location and Databricks table.

ADLS to Milvus [Technical Preview]

You can use the ADLS to Milvus ReadyFlow to consume PDF documents from ADLS, vectorize them using a HuggingFace model and write the results to Milvus.

ADLS to OpenSearch [Technical Preview]

You can use the ADLS to OpenSearch ReadyFlow to consume PDF documents from ADLS, vectorize them using an OpenAI model and write the results to OpenSearch.

ADLS to Pinecone

You can use the ADLS to Pinecone ReadyFlow to consume PDF documents from ADLS, vectorize them using an OpenAI model and write the results to Pinecone.

ADLS to Qdrant

You can use the ADLS to Qdrant ReadyFlow to consume PDF documents from ADLS, vectorize them using an OpenAI model and write the results to Qdrant.

Airtable to S3/ADLS

You can use the Airtable to S3/ADLS ReadyFlow to consume objects from an Airtable table, filter them, and write them as JSON, CSV or Avro files to a destination in Amazon S3 or Azure Data Lake Service (ADLS).

Azure Event Hub to ADLS

You can use the Azure Event Hub to ADLS ReadyFlow to ingest JSON, CSV or Avro files from an Azure Event Hub namespace, optionally parsing the schema using Cloudera Schema Registry or direct schema input. The flow then filters records based on a user-provided SQL query and writes them to a target Azure Data Lake Storage (ADLS) location in the specified output data format.

Box to S3/ADLS

You can use the Box to S3/ADLS ReadyFlow to move data from a source Box location to a destination Amazon S3 bucket or Azure Data Lake Storage (ADLS) location.

Confluent Cloud to S3/ADLS

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic in Confluent Cloud and parses the schema by looking up the schema name in the Confluent Schema Registry. The filtered events are then written to the destination S3 or ADLS location.

Confluent Cloud to Snowflake

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic in Confluent Cloud and filters records based on a user-provided SQL query before writing it to a Snowflake table.

Db2 CDC to Iceberg [Technical Preview]

You can use the Db2 CDC to Iceberg [Technical Preview] ReadyFlow to retrieve CDC events from a Db2 source table and stream them into Iceberg.

Db2 CDC to Kudu

You can use the Db2 CDC to Kudu ReadyFlow to retrieve CDC events from a Db2 source table and stream them to a Kudu destination table.

Dropbox to S3/ADLS

You can use the Dropbox to S3/ADLS ReadyFlow to ingest data from Dropbox and write it to a destination in Amazon S3 or Azure Data Lake Service (ADLS).

Google Drive to S3/ADLS

You can use the Google Drive to S3/ADLS ReadyFlow to ingest data from a Google Drive location to a destination in Amazon S3 or Azure Data Lake Service (ADLS).

Hello World

This ReadyFlow consumes change data from the Wikipedia API and converts JSON events to Avro, filtering and merging them before a writing a file to local disk.

HubSpot to S3/ADLS

This ReadyFlow retrieves objects from a Private HubSpot App, converting them to the specified output data format and writing them to the target S3 or ADLS destination.

HuggingFace to S3/ADLS

You can use the HuggingFace to S3/ADLS ReadyFlow to retrieve a HuggingFace dataset and write the Parquet data to a target S3 or ADLS destination.

JDBC to JDBC

This ReadyFlow moves data between database tables, filtering records by means of an SQL query.

JDBC to S3/ADLS

This ReadyFlow consumes data from a source database table and filters events based on a user-provided SQL query before writing it to a destination Amazon S3 or Azure Data Lake Storage (ADLS) location in the specified output data format.

Kafka Filter to Kafka

This ReadyFlow consumes JSON, CSV, or Avro data from a source Kafka topic and parses the schema by looking up the schema name in the Cloudera Schema Registry. You can filter events by specifying a SQL query in the Filter Rule parameter.

Kafka to ADLS Avro

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic and merges the events into Avro files before writing the data to ADLS. The flow writes out a file every time its size has either reached 100 MB or five minutes have passed.

Kafka to Cloudera Operational Database

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic, parses the schema by looking up the schema name in the Cloudera Schema Registry and ingests it into an HBase table in Cloudera Operational Database.

Kafka to Iceberg

his ReadyFlow consumes JSON, CSV, or Avro data from a source Kafka topic, parses the schema by looking up the schema name in the Cloudera Schema Registry, and ingests data into an Iceberg table in Hive.

Kafka to Kafka

This ReadyFlow consumes JSON, CSV, or Avro data from a source Kafka topic and parses the schema by looking up the schema name in the Cloudera Schema Registry.

Kafka to Kudu

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic, parses the schema by looking up the schema name in the Cloudera Schema Registry and ingests it into a Kudu table.

Kafka to S3 Avro

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic and merges the events into Avro files before writing the data to S3. The flow writes out a file every time its size has either reached 100 MB or five minutes have passed.

Kafka to Snowflake

This ReadyFlow consumes JSON, CSV or Avro data from a source Kafka topic and parses the schema by looking up the schema name in the Cloudera Schema Registry. You can filter events by specifying a SQL query in the Filter Rule parameter. The filtered events are then merged into CSV files, compressed into gzip format and written to the destination Snowflake DB table.

ListenHTTP filter to Kafka

This ReadyFlow listens to a JSON, CSV or Avro data stream on a specified port and parses the schema by looking up the schema name in the Cloudera Schema Registry. You can filter events by specifying a SQL query. The filtered events are then converted to the specified output data format and written to the destination Cloudera Kafka topic.

ListenSyslog filter to S3/ADLS

This ReadyFlow listens to a Syslog data stream on a specified port. You can filter events by specifying a SQL query. The filtered events are then converted to the specified output data format and written to the target S3 or ADLS destination.

ListenTCP filter to S3/ADLS

This ReadyFlow listens to a JSON, CSV or Avro data stream and parses the data based on a specified Avro-formatted schema. You can filter events by specifying a SQL query. The filtered events are then converted to the specified output data format and written to the target S3 or ADLS destination.

MySQL CDC to Iceberg [Technical Preview]

You can use the MySQL CDC to Iceberg [Technical Preview] ReadyFlow to retrieve CDC events from a MySQL source table and stream them to an Iceberg destination table.

MySQL CDC to Kudu

You can use the MySQL CDC to Kudu ReadyFlow to retrieve CDC events from a MySQL source table and stream them to a Kudu destination table.

MQTT Filter to Kafka

This ReadyFlow consumes JSON, CSV or Avro data from a source MQTT topic. You can filter events by specifying a SQL query. The filtered events are then converted to the specified output data format and written to the destination Kafka topic.

Non-Cloudera ADLS to Cloudera ADLS

This ReadyFlow moves data between non Cloudera-managed source and Cloudera-managed destination ADLS locations.

Non-Cloudera ADLS to S3/ADLS

This ReadyFLow consumes files from a source non-Cloudera ADLS location and writes them to a destination S3 or ADLS location.

Non-Cloudera S3 to Cloudera S3

This ReadyFlow moves data between non Cloudera-managed source and Cloudera-managed destination S3 locations.

Non-Cloudera S3 to S3/ADLS

This ReadyFlow consumes files from an external source non-Cloudera S3 location and writes them to a Cloudera managed destination S3 or ADLS location.

Oracle CDC to Iceberg [Technical Preview]

You can use the Oracle CDC to Iceberg ReadyFlow to retrieve events from an Oracle table and stream them into Iceberg.

Oracle CDC to Kudu

You can use the Oracle CDC to Kudu ReadyFlow to retrieve CDC events from an Oracle source table and stream them to a Kudu destination table.

PostgreSQL CDC to Iceberg [Technical Preview]

This ReadyFlow uses Debezium to retrieve events from a PostgreSQL table and stream them into Iceberg.

PostgreSQL CDC to Kudu

You can use the PostgreSQL CDC to Kudu ReadyFlow to retrieve CDC events from a PostgreSQL source table and stream them to a Kudu destination table.

RAG Query Milvus [Technical Preview]

You can use this ReadyFlow to query Milvus VectorDB with an embedded prompt.

RAG Query Pinecone [Technical Preview]

You can use this ReadyFlow to query Pinecone VectorDB with an embedded prompt.

S3 to Chroma DB [Technical Preview]

You can use the S3 to Chroma DB ReadyFlow to consume PDF documents from S3, vectorize them using an OpenAI model and write the results to Chroma DB.

S3 to Cloudera Data Warehouse

This ReadyFlow consumes CSV files from a source S3 location, parses the schema by looking up the schema name in the Cloudera Schema Registry, converts the files into Parquet and writes them to a destination S3 location and Cloudera Data Warehouse Impala table. You can specify the source S3 location, the target S3 location and the destination Impala table name.

S3 to Databricks

You can use the S3 to Databricks ReadyFlow to retrieve CSV files from a source S3 location and write them as Parquet files to a destination S3 location and Databricks table.

S3 to IBM watsonx

You can use the S3 to IBM watsonx ReadyFlow to retrieve text files from a source S3 location and use IBM watsonx to summarize the text, then write the LLM response to a destination S3 location.

S3 to Milvus [Technical Preview]

You can use the S3 to Milvus ReadyFlow to consume PDF documents from S3, vectorize them using a HuggingFace model and write the results to Milvus.

S3 to OpenSearch [Technical Preview]

You can use the S3 to OpenSearch ReadyFlow to consume PDF documents from S3, vectorize them using an OpenAI model and write the results to OpenSearch.

S3 to Pinecone

This ReadyFlow consumes PDF documents from a source S3 location, vectorizes the data using an OpenAI embedding model, and stores the results in Pinecone vector DB.

S3 to Qdrant

This ReadyFlow consumes PDF documents from a source S3 location, vectorizes them using an OpenAI model and writes the results to Qdrant.

S3 to S3 Avro

This ReadyFlow consumes JSON, CSV or Avro data from a source S3 bucket and transforms the data into Avro files before writing it to another S3 bucket.

S3 to S3 Avro with S3 Notifications

This ReadyFlow consumes JSON, CSV or Avro data from a source S3 bucket and transforms the data into Avro files before writing it to another S3 bucket. The ReadyFLow is configured with notifications about new files that arrive in the sourcce AWS bucket.

Salesforce filter to S3/ADLS

You can use the Salesforce filter to S3/ADLS ReadyFlow to consume objects from a Salesforce database table, filter them, and write the data as JSON, CSV or Avro files to a destination in Amazon S3 or Azure Data Lake Service (ADLS).

Shopify to S3/ADLS

This ReadyFlow consumes objects from a Custom Shopify App, converts them to the specified output data format, and writes them to a CDP managed destination S3 or ADLS location.

Slack to Chroma DB [Technical Preview]

You can use the Slack to Chroma DB ReadyFlow to consume PDF documents from a Slack channel, vectorize them using an OpenAI model and write the results to Chroma DB.

Slack to Milvus

This ReadyFLow consumes messages from a Slack channel, vectorizes them using a HuggingFace model and writes the results to Milvus.

Slack to OpenSearch

This ReadyFLow consumes messages from a Slack channel, vectorizes them using an OpenAI model and writes the results to OpenSearch.

Slack to Pinecone

This ReadyFlow consumes messages from a Slack channel, vectorizes them using an OpenAI model and writes the results to Pinecone.

Slack to Qdrant

This ReadyFlow consumes messages from a Slack channel, vectorizes them using an OpenAI model and writes the results to Qdrant.

Slack to S3/ADLS

This ReadyFlow consumes events from a Slack App, converts them to the specified output data format, and writes them to a Cloudera-managed destination S3 or ADLS location. For the source, subscribe to the events to be notified of in Slack. For the destination, specify the S3 or ADLS storage location and path.

SQL Server CDC to Iceberg [Technical Preview]

This ReadyFlow uses Debezium to retrieve events from a SQL Server table and stream them into Iceberg.

SQL Server CDC to Kudu

This ReadyFlow ReadyFlow retrieves CDC events from a SQL Server source table and streams them to a Kudu destination table.