List of required configuration parameters for the HuggingFace to S3/ADLS ReadyFlow

When deploying the HuggingFace to S3/ADLS ReadyFlow, you have to provide the following parameters. Use the information you collected in Prerequisites.

Table 1. HuggingFace to S3/ADLS ReadyFlow configuration parameters
Parameter name Description
CDP Workload User Specify the CDP machine user or workload username that you want to use to authenticate to the object stores. Ensure this user has the appropriate access rights to the object store locations in Ranger or IDBroker.
CDP Workload User Password Specify the password of the CDP machine user or workload user you are using to authenticate against the object stores.
CDPEnvironment The CDP Environment configuration resources.
Dataset Name Specify the Dataset name. The default is "wikitext".
Destination S3 or ADLS Path Specify the name of the destination S3 or ADLS path you want to write to. Make sure that the path starts with "/" and that it does not end with "/".
Destination S3 or ADLS Storage Location Specify the name of the destination S3 bucket or ADLS Container you want to write to.

For S3, enter a value in the form: s3a://[Destination S3 Bucket]

For ADLS, enter a value in the form: abfs://[Destination ADLS File System]@[Destination ADLS Storage Account].dfs.core.windows.net