ReadyFlow: S3 to Databricks

You can use the S3 to Databricks ReadyFlow to retrieve CSV files from a source S3 location and write them as Parquet files to a destination S3 location and Databricks table.

This ReadyFlow consumes CSV data from a source S3 location, parses it using the schema(s) provided by the Cloudera Schema Registry, converts it to Parquet format, and writes the result to a destination S3 bucket and Databricks table. The flow supports either a non-partitioned table or a table partitioned by a single column.
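
The sketch below is not part of the ReadyFlow itself; it is a minimal Python illustration of the transformation the flow performs: parse CSV against an explicit schema (which the ReadyFlow would obtain from the Cloudera Schema Registry), then write Parquet, optionally partitioned by a single column. The file paths and column names are hypothetical examples.

```python
import pyarrow as pa
import pyarrow.csv as pv
import pyarrow.parquet as pq

# Schema the ReadyFlow would normally retrieve from the Cloudera Schema Registry.
schema = pa.schema([
    ("order_id", pa.int64()),
    ("order_date", pa.string()),
    ("amount", pa.float64()),
])

# Parse the source CSV, enforcing the schema instead of inferring column types.
table = pv.read_csv(
    "orders.csv",
    convert_options=pv.ConvertOptions(
        column_types={field.name: field.type for field in schema}
    ),
)

# Non-partitioned output: a single Parquet dataset directory.
pq.write_to_dataset(table, root_path="out/orders")

# Partitioned output: the flow supports partitioning by one column only.
pq.write_to_dataset(
    table, root_path="out/orders_by_date", partition_cols=["order_date"]
)
```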

S3 to Databricks ReadyFlow details

Source:             Cloudera managed Amazon S3
Source Format:      CSV
Destination:        Cloudera managed Amazon S3 and Databricks
Destination Format: Parquet
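
As a hedged sketch of the Databricks side of the destination, the destination table can be declared as an external table over the Parquet files the flow writes to S3. The table name, bucket path, and the use of a partitioned table below are hypothetical; this assumes a Databricks notebook where a SparkSession named `spark` is already available.

```python
# Declare an external Databricks table over the Parquet output location in S3.
# Spark infers the table schema from the Parquet files at the given location.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders
    USING PARQUET
    LOCATION 's3://example-destination-bucket/orders/'
""")

# If the destination table is partitioned (single partition column), register
# the partition directories that already exist under the table location.
spark.sql("MSCK REPAIR TABLE sales.orders")
```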