ReadyFlow: S3 to Databricks
You can use the S3 to Databricks ReadyFlow to retrieve CSV files from a source S3 location and write them as Parquet files to a destination S3 location and a Databricks table.
This ReadyFlow consumes CSV data from a source S3 location, parses the data using one or more schemas provided by the Cloudera Schema Registry, converts it to Parquet format, and writes the data to a destination S3 bucket and a Databricks table. The destination table can be either non-partitioned or partitioned on a single column.
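The parse and partition steps described above can be sketched in a few lines of Python. This is an illustration only, not the ReadyFlow's actual implementation: the `SCHEMA` dictionary is a hypothetical Avro-style schema standing in for one retrieved from Schema Registry, the helper names `parse_csv` and `group_by_partition` are invented for this example, and the `column=value` partition key is an assumed Hive-style naming convention. The Parquet conversion and the writes to S3 and Databricks are left out to keep the sketch dependency-free.

```python
import csv
import io
from collections import defaultdict

# Hypothetical Avro-style schema, standing in for one fetched from Schema Registry.
SCHEMA = {
    "type": "record",
    "name": "Sale",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "region", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

# Map the primitive schema types used here to Python conversion functions.
CASTS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def parse_csv(text, schema):
    """Parse CSV text (with a header row) into typed records per the schema."""
    fields = schema["fields"]
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Cast each column named in the schema to its declared type.
        records.append({f["name"]: CASTS[f["type"]](row[f["name"]]) for f in fields})
    return records


def group_by_partition(records, partition_column=None):
    """Group records by one partition column; use a single group if none is given."""
    groups = defaultdict(list)
    for rec in records:
        if partition_column is None:
            key = ""  # non-partitioned table: everything lands in one group
        else:
            key = f"{partition_column}={rec[partition_column]}"  # assumed naming
        groups[key].append(rec)
    return dict(groups)
```

Calling `group_by_partition(parse_csv(csv_text, SCHEMA), "region")` returns one group of records per distinct `region` value, which mirrors the single-column partitioned case. Omitting the column name yields a single group, which mirrors the non-partitioned case.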
S3 to Databricks ReadyFlow details | |
---|---|
Source | Cloudera managed Amazon S3 |
Source Format | CSV |
Destination | Cloudera managed Amazon S3 and Databricks |
Destination Format | Parquet |