ReadyFlow Overview: Azure Event Hub to ADLS

You can use the Azure Event Hub to ADLS ReadyFlow to move JSON, CSV or Avro files from an Azure Event Hub namespace, optionally parsing the schema using Cloudera Schema Registry or direct schema input. The flow then filters records based on a user-provided SQL query and writes them to a target Azure Data Lake Storage (ADLS) location in the specified output data format.

This ReadyFlow consumes JSON, CSV or Avro data from a source Azure Event Hub Namespace and merges the events into JSON, CSV or Avro files before writing the data to ADLS. The flow writes out a file every time its size has either reached 100MB or five minutes have passed. Files can reach a maximum size of 1GB. You can specify the Event Hub Namespace you want to read from as well as the target ADLS data container and path. Failed ADLS write operations are retried automatically to handle transient issues. Define a KPI on the failure_WriteToADLS connection to monitor failed write operations.

Azure Event Hub to ADLS ReadyFlow details
Source Azure Event Hub Namespace
Source Format JSON, CSV, Avro
Destination ADLS
Destination Format JSON, CSV, Avro

Moving data to object stores

Cloud environments offer numerous deployment options and services. There are many ways to store data in the cloud, but the easiest option is to use object stores. Object stores are extremely robust and cost-effective storage solutions with multiple levels of durability and availability. You can include them in your data pipeline, both as an intermediate step and as an end state. Object stores are accessible to many tools and connecting systems, and you have a variety of options to control access.