ReadyFlow: S3 to S3 Avro

Learn about the S3 to S3 Avro ReadyFlow.

This ReadyFlow consumes JSON, CSV, or Avro files from a source S3 location, converts them to Avro, and writes them to the destination S3 location. You can specify the source format, the source and target locations, and the schema to use for reading the source data.
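
To make the reading step concrete, here is a minimal Python sketch that parses CSV or JSON text into records that a writer could then serialize as Avro. It is not how the ReadyFlow is implemented (the flow runs inside DataFlow). The function name `read_records`, the sample data, the CSV header row, and the assumption that JSON input is an array of objects are all illustrative. Avro input and the Avro writing step are left out because they would need a third-party Avro library.

```python
import csv
import io
import json


def read_records(text, input_format, csv_delimiter=","):
    """Parse source text into a list of dict records.

    Hypothetical helper for illustration only; it is not part of DataFlow.
    """
    fmt = input_format.upper()
    if fmt == "CSV":
        # Assumption: the first line is a header row naming the fields.
        reader = csv.DictReader(io.StringIO(text), delimiter=csv_delimiter)
        return [dict(row) for row in reader]
    if fmt == "JSON":
        # Assumption: the JSON document is an array of objects.
        return json.loads(text)
    raise ValueError(f"Format not covered by this sketch: {input_format}")


# Made-up sample inputs; the CSV sample uses ';' as its delimiter.
csv_text = "id;name\n1;alice\n2;bob\n"
json_text = '[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]'

csv_records = read_records(csv_text, "CSV", csv_delimiter=";")
json_records = read_records(json_text, "JSON")
```

In this sketch every CSV value comes back as a string. In the ReadyFlow, the schema you specify for reading the source data is what gives the fields their types.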

S3 to S3 Avro ReadyFlow details
Source: Amazon S3
Source Format: JSON, CSV, Avro
Destination: Amazon S3
Destination Format: Avro
Table 1. S3 to S3 Avro ReadyFlow configuration parameters
Parameter Name Description Example
CDP Workload User Specify the CDP machine user or workload user name that you want to use to authenticate to the target (managed) object store and to Schema Registry. Ensure that this user has the appropriate access rights to the object store location and to Schema Registry.
CDP Workload User Password Specify the password of the CDP machine user or workload user you are using to authenticate against the target (managed) object store and Schema Registry.
CDPEnvironment

DataFlow uses this parameter to auto-populate the Flow Deployment with Hadoop configuration files required to interact with S3.

DataFlow automatically adds all required configuration files to interact with Data Lake services. Unnecessary files that are added do not impact the deployment process.

CSV Delimiter If your source data is CSV, specify the delimiter here.
Data Input Format Specify the format of your input data. You can use "CSV", "JSON", or "AVRO" with this ReadyFlow.
Destination S3 Bucket

Specify the name of the destination (CDP managed) S3 bucket you want to write to.

The full path is constructed as follows:

s3a://#{Destination S3 Bucket}/#{Destination S3 Path}/${filename}

Destination S3 Bucket Region Specify the AWS region in which your bucket was created.

Supported values are:

  • us-gov-west-1

  • us-gov-east-1

  • us-east-1

  • us-east-2

  • us-west-1

  • us-west-2

  • eu-west-1

  • eu-west-2

  • eu-west-3

  • eu-central-1

  • eu-north-1

  • eu-south-1

  • ap-east-1

  • ap-south-1

  • ap-southeast-1

  • ap-southeast-2

  • ap-northeast-1

  • ap-northeast-2

  • ap-northeast-3

  • sa-east-1

  • cn-north-1

  • cn-northwest-1

  • ca-central-1

  • me-south-1

  • af-south-1

  • us-iso-east-1

  • us-isob-east-1

  • us-iso-west-1

Destination S3 Path

Specify the path within the destination (CDP managed) bucket that you want to write to. Enter the path without any leading characters.

The full path is constructed as follows:

s3a://#{Destination S3 Bucket}/#{Destination S3 Path}/${filename}

Schema Name Specify the name of the schema that is looked up in Schema Registry and used to parse the source files.
Schema Registry Hostname Specify the hostname of the Schema Registry you want to connect to. This must be the direct hostname of the Schema Registry itself, not the Knox endpoint.
Source S3 Bucket Specify the name of the source (external) S3 bucket you want to read from.
Source S3 Bucket Region

Specify the AWS region in which your bucket was created.

Supported values are:

  • us-gov-west-1

  • us-gov-east-1

  • us-east-1

  • us-east-2

  • us-west-1

  • us-west-2

  • eu-west-1

  • eu-west-2

  • eu-west-3

  • eu-central-1

  • eu-north-1

  • eu-south-1

  • ap-east-1

  • ap-south-1

  • ap-southeast-1

  • ap-southeast-2

  • ap-northeast-1

  • ap-northeast-2

  • ap-northeast-3

  • sa-east-1

  • cn-north-1

  • cn-northwest-1

  • ca-central-1

  • me-south-1

  • af-south-1

  • us-iso-east-1

  • us-isob-east-1

  • us-iso-west-1

Source S3 Path Specify the path within the source (external) bucket where you want to read files from.
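
The bucket, path, and region parameters can be sanity-checked before you start a deployment. The Python sketch below is a hypothetical pre-flight helper, not part of DataFlow. It checks a region against the supported values listed above and assembles the destination URI following the `s3a://#{Destination S3 Bucket}/#{Destination S3 Path}/${filename}` pattern. The function names and sample values are assumptions made for illustration. Stripping slashes from the path is a convenience of this sketch; the parameter itself should be entered without leading characters.

```python
# Region values copied from the "Supported values" lists in this document.
SUPPORTED_REGIONS = frozenset({
    "us-gov-west-1", "us-gov-east-1", "us-east-1", "us-east-2",
    "us-west-1", "us-west-2", "eu-west-1", "eu-west-2", "eu-west-3",
    "eu-central-1", "eu-north-1", "eu-south-1", "ap-east-1",
    "ap-south-1", "ap-southeast-1", "ap-southeast-2", "ap-northeast-1",
    "ap-northeast-2", "ap-northeast-3", "sa-east-1", "cn-north-1",
    "cn-northwest-1", "ca-central-1", "me-south-1", "af-south-1",
    "us-iso-east-1", "us-isob-east-1", "us-iso-west-1",
})


def check_region(region):
    """Return the region unchanged, or raise if it is not a supported value."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"Unsupported bucket region: {region!r}")
    return region


def destination_uri(bucket, path, filename):
    """Build the destination URI from bucket, path, and file name.

    Hypothetical helper: it mirrors the documented pattern and tolerates
    stray leading or trailing slashes in the path by removing them.
    """
    clean_path = path.strip("/")
    return f"s3a://{bucket}/{clean_path}/{filename}"


# Made-up example values.
region = check_region("eu-west-1")
uri = destination_uri("my-bucket", "/landing/avro/", "part-0001.avro")
print(uri)  # s3a://my-bucket/landing/avro/part-0001.avro
```

In the actual flow, `${filename}` is resolved for each file at run time, so only the bucket and path portions come from the parameters you enter.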