List of required configuration parameters for the S3 to Databricks ReadyFlow

When deploying the S3 to Databricks ReadyFlow, you must provide the following parameters. Use the information you collected in Prerequisites.

Table 1. S3 to Databricks ReadyFlow configuration parameters
Parameter Name | Description
CDP Workload User | Specify the CDP machine user or workload user name that you want to use to authenticate to the object stores and to the Schema Registry. Ensure this user has the appropriate access rights to the object store locations and to the Schema Registry in Ranger or IDBroker.
CDP Workload User Password | Specify the password of the CDP machine user or workload user you are using to authenticate to the object stores and the Schema Registry.
CDPEnvironment | The CDP Environment configuration resources.
Destination S3 Bucket | Specify the name of the destination S3 bucket you want to write to. The full path is constructed as s3a://#{Destination S3 Bucket}/#{Destination S3 Path}.
Destination S3 Path | Specify the path within the destination bucket where you want to write. Make sure that the path starts with "/". The path must end with the destination Databricks Table Id. The full path is constructed as s3a://#{Destination S3 Bucket}/#{Destination S3 Path}.
Partition Column | Specify the name of the column used to partition your destination Databricks table. This ReadyFlow only supports a single partition column.
Partition Column Exists | Specify whether the destination Databricks table is partitioned. The default value is YES.
Schema Name | Specify the name of the schema to look up in the Schema Registry. This schema is used to parse the source files.
Schema Name 2 | If your Databricks table is partitioned, specify the name of the modified schema to be looked up in the Schema Registry. This schema should not include the partition column field.
Schema Registry Hostname | Specify the hostname of the Schema Registry you want to connect to. This must be the direct hostname of the Schema Registry itself, not the Knox Endpoint.
Source S3 Bucket | Specify the name of the source S3 bucket you want to read from.
Source S3 Path | Specify the path within the source bucket where you want to read files from.
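
Before deploying, it can help to check the collected values against the rules in the table. The following Python sketch is illustrative only and is not part of the ReadyFlow or of any Cloudera tooling. The function names, the example values, and the table_id argument are hypothetical. It also assumes that "NO" is the alternative to the default YES for Partition Column Exists, that Partition Column and Schema Name 2 are needed only for partitioned tables, and that the leading "/" of the destination path acts as the separator after the bucket name, so the sketch does not double it.

```python
# Illustrative pre-deployment check for the parameters in Table 1.
# All helper names and example values here are hypothetical.

# Parameters that are always expected as plain text values.
# CDPEnvironment is left out: per the table it holds configuration
# resources rather than a plain string.
ALWAYS_REQUIRED = [
    "CDP Workload User",
    "CDP Workload User Password",
    "Destination S3 Bucket",
    "Destination S3 Path",
    "Schema Name",
    "Schema Registry Hostname",
    "Source S3 Bucket",
    "Source S3 Path",
]

# Assumption: these are only needed when the table is partitioned.
PARTITION_ONLY = ["Partition Column", "Schema Name 2"]


def destination_uri(bucket: str, path: str) -> str:
    """Build the s3a:// URI from bucket and path.

    Assumption: the path's leading "/" is the separator after the
    bucket, so it is not repeated in the result.
    """
    if not path.startswith("/"):
        raise ValueError('Destination S3 Path must start with "/"')
    return f"s3a://{bucket}{path}"


def validate(params: dict, table_id: str) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for name in ALWAYS_REQUIRED:
        if not params.get(name):
            problems.append(f"missing value: {name}")

    path = params.get("Destination S3 Path", "")
    if path and not path.startswith("/"):
        problems.append('Destination S3 Path must start with "/"')
    if path and not path.rstrip("/").endswith(table_id):
        problems.append("Destination S3 Path must end with the Table Id")

    # The table states that the default value is YES.
    partitioned = params.get("Partition Column Exists", "YES").upper() == "YES"
    if partitioned:
        for name in PARTITION_ONLY:
            if not params.get(name):
                problems.append(f"missing value for partitioned table: {name}")
    return problems


example = {
    "CDP Workload User": "srv_flow_user",
    "CDP Workload User Password": "change-me",
    "Destination S3 Bucket": "my-dest-bucket",
    "Destination S3 Path": "/warehouse/sales_table",
    "Partition Column Exists": "YES",
    "Partition Column": "region",
    "Schema Name": "sales",
    "Schema Name 2": "sales_no_partition",
    "Schema Registry Hostname": "sr.example.com",
    "Source S3 Bucket": "my-source-bucket",
    "Source S3 Path": "incoming/sales",
}

if __name__ == "__main__":
    for problem in validate(example, table_id="sales_table"):
        print(problem)
    print(destination_uri(example["Destination S3 Bucket"],
                          example["Destination S3 Path"]))
```

The check returns a list of problems rather than stopping at the first one, so that all missing or malformed values can be corrected in a single pass before the deployment wizard is filled in.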