ReadyFlow overview: HuggingFace to S3/ADLS
You can use the HuggingFace to S3/ADLS ReadyFlow to retrieve a HuggingFace dataset and write the Parquet data to a target S3 or ADLS destination.
By default, the ReadyFlow retrieves the "wikitext" dataset, which is the default value of the Dataset Name parameter. Failed S3 or ADLS write operations are retried automatically to handle transient issues. To monitor failed write operations, define a KPI on the failure_WriteToS3/ADLS connection.
This flow is not designed to run continuously. Run it once for each dataset that you want to retrieve.
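For readers who want a feel for the retrieve-then-write-with-retry pattern described above, the following Python sketch illustrates it outside of the ReadyFlow. It is not the flow's actual implementation. It assumes the public Hugging Face dataset viewer endpoint for listing a dataset's Parquet files; the function names, the retry count, and the delay are hypothetical choices made for illustration, and the `write` callable stands in for whatever S3 or ADLS client you use.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the public Hugging Face dataset viewer endpoint that lists
# the Parquet files of a dataset. The ReadyFlow may call a different endpoint.
PARQUET_API = "https://datasets-server.huggingface.co/parquet"


def parquet_listing_url(dataset_name="wikitext"):
    """Build the listing URL; "wikitext" mirrors the default Dataset Name."""
    return PARQUET_API + "?" + urllib.parse.urlencode({"dataset": dataset_name})


def list_parquet_urls(dataset_name="wikitext"):
    """Return the download URLs of the dataset's Parquet files (needs network)."""
    with urllib.request.urlopen(parquet_listing_url(dataset_name)) as resp:
        payload = json.load(resp)
    return [entry["url"] for entry in payload["parquet_files"]]


def write_with_retry(write, data, attempts=3, delay_seconds=1.0):
    """Call write(data), retrying on any exception up to `attempts` times.

    `write` is a hypothetical callable wrapping an S3 or ADLS upload.
    The last failure is re-raised so that it can be monitored.
    """
    for attempt in range(1, attempts + 1):
        try:
            return write(data)
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

In this sketch, a one-off run would call `list_parquet_urls()` once, download each file, and pass its bytes to `write_with_retry` together with an upload function of your choice. This matches the run-once-per-dataset usage described above.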
HuggingFace to S3/ADLS ReadyFlow details | |
---|---|
Source | HuggingFace Dataset |
Source Format | Parquet |
Destination | Cloudera Managed Amazon S3 or ADLS |
Destination Format | Parquet |