ReadyFlow: ADLS to Databricks
You can use the ADLS to Databricks ReadyFlow to retrieve CSV files from a source ADLS location and write them as Parquet files to a destination ADLS location and Databricks table.
This ReadyFlow consumes CSV data from a source ADLS location and parses it using one or more schemas provided by the Cloudera Schema Registry. It then converts the data to Parquet format and writes it to a destination ADLS location and Databricks table. The destination table can be either non-partitioned or partitioned on a single column.
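The parse and partition steps described above can be illustrated in plain Python. This is a minimal sketch and does not reproduce the ReadyFlow itself. It assumes a hypothetical `orders` schema, written as a simplified Avro-style dictionary in place of a schema fetched from Schema Registry, and a Hive-style `column=value` directory layout for the single partition column. Writing actual Parquet files and creating the Databricks table are out of scope.

```python
import csv
import io
from collections import defaultdict

# Hypothetical schema: a simplified stand-in for an Avro-style record schema
# that a schema registry might return. Field names and types are assumptions.
SCHEMA = {
    "name": "orders",
    "fields": [
        {"name": "order_id", "type": "int"},
        {"name": "region", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}

# Map schema type names to Python converters (illustrative subset only).
CONVERTERS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def parse_csv(text, schema):
    """Parse CSV text with a header row into typed records using the schema."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        record = {}
        for field in schema["fields"]:
            convert = CONVERTERS[field["type"]]
            record[field["name"]] = convert(row[field["name"]])
        records.append(record)
    return records


def partition_by(records, column):
    """Group records by a single partition column, keyed by a Hive-style directory name."""
    partitions = defaultdict(list)
    for record in records:
        # Hive-style layout: the partition value is encoded in the directory
        # name (<column>=<value>), so it is dropped from each stored record.
        key = f"{column}={record[column]}"
        partitions[key].append({k: v for k, v in record.items() if k != column})
    return dict(partitions)


if __name__ == "__main__":
    sample = "order_id,region,amount\n1,emea,9.5\n2,apac,12.0\n3,emea,3.25\n"
    rows = parse_csv(sample, SCHEMA)
    for directory, part in partition_by(rows, "region").items():
        print(directory, part)
```

Restricting partitioning to one column keeps the layout to a single directory level per partition value. A non-partitioned table corresponds to writing all records to one location without the grouping step.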
ADLS to Databricks ReadyFlow details | |
---|---|
Source | Cloudera managed ADLS |
Source Format | CSV |
Destination | Cloudera managed ADLS and Databricks |
Destination Format | Parquet |
Moving data to object stores
Cloud environments offer numerous deployment options and services. Of the many ways to store data in the cloud, object stores are often the simplest. They are robust, cost-effective storage services that offer multiple levels of durability and availability. You can include them in your data pipeline both as an intermediate step and as an end state. Object stores are accessible to many tools and connecting systems, and you have a variety of options to control access.