ReadyFlow: ADLS to Databricks

You can use the ADLS to Databricks ReadyFlow to retrieve CSV files from a source ADLS location and write them as Parquet files to a destination ADLS location and Databricks table.

This ReadyFlow consumes CSV data from a source ADLS location, parses the data using one or more schemas provided by the CDP Schema Registry, converts it to Parquet format, and writes the data to a destination ADLS location and Databricks table. The destination table can be either non-partitioned or partitioned by a single column.
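
The parse, partition, and write steps described above can be illustrated with a minimal Python sketch. It is not the ReadyFlow's implementation. The schema, the sample data, the function names, and the Hive-style `column=value` directory layout are all assumptions made for illustration, and the Parquet encoding itself is omitted because it needs a third-party library.

```python
import csv
import io
from collections import defaultdict

# Hypothetical schema: field name -> Python type used to coerce each CSV string.
# In the ReadyFlow, the schema comes from the CDP Schema Registry instead.
SCHEMA = {"id": int, "region": str, "amount": float}


def parse_csv(text, schema):
    """Parse CSV text that has a header row and coerce fields per the schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: cast(row[name]) for name, cast in schema.items()}
        for row in reader
    ]


def group_by_partition(records, partition_column=None):
    """Group records by one partition column, or into one group if None."""
    groups = defaultdict(list)
    for record in records:
        key = record[partition_column] if partition_column else None
        groups[key].append(record)
    return dict(groups)


def destination_path(base, partition_column, value, file_name):
    """Build the target path; the column=value layout is an assumption."""
    if partition_column is None:
        return f"{base}/{file_name}"
    return f"{base}/{partition_column}={value}/{file_name}"


if __name__ == "__main__":
    sample = "id,region,amount\n1,emea,10.5\n2,apac,3.0\n3,emea,7.25\n"
    records = parse_csv(sample, SCHEMA)
    for value, rows in group_by_partition(records, "region").items():
        path = destination_path("sales", "region", value, "part-0.parquet")
        print(path, len(rows))
```

Passing `None` as the partition column models the non-partitioned case, where all records land directly under the base location.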

ADLS to Databricks ReadyFlow details
Source: CDP managed ADLS
Source Format: CSV
Destination: CDP managed ADLS and Databricks
Destination Format: Parquet

Moving data to object stores

Cloud environments offer numerous deployment options and services. Of the many ways to store data in the cloud, object stores are the easiest option. They are robust, cost-effective storage solutions that offer multiple levels of durability and availability. You can include them in your data pipeline both as an intermediate step and as an end state. Object stores are accessible to many tools and connecting systems, and they give you a variety of options for controlling access.