ReadyFlow: S3 to Cloudera Data Warehouse
You can use the S3 to Cloudera Data Warehouse ReadyFlow to consume CSV files from a source S3 location and write them as Parquet files to a destination S3 location and a Cloudera Data Warehouse Impala table.
This ReadyFlow consumes CSV files from a source S3 location, looks up the schema by name in the Cloudera Schema Registry to parse them, converts the files to Parquet, and writes them to a destination S3 location and a Cloudera Data Warehouse Impala table. You can specify the source S3 location, the destination S3 location, and the destination Impala table name. The ReadyFlow polls the source bucket for new files by periodically listing its contents. Define KPIs on the failure_WriteS3 and failure_CreateCDWImpalaTable connections to monitor failed write operations.
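The polling behavior described above can be pictured with a small sketch. It is only an illustration of listing-based change detection in general, not the ReadyFlow's actual implementation: the `ListingTracker` class, its `new_objects` method, and the use of a last-modified timestamp to detect changes are all assumptions made for this example, and the listing itself is passed in as plain data instead of being fetched from S3.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class ListingTracker:
    """Hypothetical helper that remembers what earlier listings reported."""

    # Maps object key -> last-modified timestamp recorded for that key.
    seen: Dict[str, float] = field(default_factory=dict)

    def new_objects(self, listing: Iterable[Tuple[str, float]]) -> List[str]:
        """Return keys that are new, or whose timestamp changed, since the last call."""
        fresh = []
        for key, last_modified in listing:
            if self.seen.get(key) != last_modified:
                fresh.append(key)
                self.seen[key] = last_modified
        return sorted(fresh)


# Each call stands in for one periodic listing of the source location.
tracker = ListingTracker()
first_poll = tracker.new_objects([("in/a.csv", 100.0), ("in/b.csv", 101.0)])
second_poll = tracker.new_objects(
    [("in/a.csv", 100.0), ("in/b.csv", 101.0), ("in/c.csv", 105.0)]
)
```

In this sketch the first poll reports both files, and the second poll reports only the file that appeared in between, which is the property a listing-based source needs in order to avoid processing the same file twice.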
S3 to Cloudera Data Warehouse ReadyFlow details | |
---|---|
Source | Cloudera Public Cloud Managed Amazon S3 |
Source Format | CSV |
Destination | Cloudera Public Cloud Managed Amazon S3 and Cloudera Data Warehouse Impala |
Destination Format | Parquet |
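
To make the role of the schema lookup concrete, the following minimal sketch parses CSV text into typed records using a schema selected by name. Everything in it is hypothetical: the `SCHEMAS` dictionary is an in-memory stand-in for a schema registry, the `orders` schema and its fields are invented for illustration, and the final Parquet write is left out because it would need a third-party library.

```python
import csv
import io
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a schema registry: schema name -> ordered field converters.
SCHEMAS: Dict[str, Dict[str, Callable[[str], Any]]] = {
    "orders": {"order_id": int, "customer": str, "amount": float},
}


def parse_csv(text: str, schema_name: str) -> List[Dict[str, Any]]:
    """Parse CSV text into typed records using the schema registered under schema_name."""
    schema = SCHEMAS[schema_name]  # raises KeyError for an unknown schema name
    reader = csv.DictReader(io.StringIO(text))
    # Reject input whose header row does not match the schema's field order.
    if reader.fieldnames is None or list(reader.fieldnames) != list(schema):
        raise ValueError(f"CSV header does not match schema {schema_name!r}")
    # Convert each string cell with the converter declared for its field.
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in reader]


records = parse_csv("order_id,customer,amount\n1,acme,9.5\n2,globex,12\n", "orders")
```

The typed records are the form a columnar writer needs, since Parquet stores a declared type for every column whereas CSV carries only text.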