Prerequisites

Learn how to collect the information you need to deploy the Dropbox to S3/ADLS ReadyFlow, and meet other prerequisites.

For your data ingest source

  • You have a Dropbox account. If you do not have one, you can sign up for a free account here.
  • You have data stored in Dropbox that you want to move to an object store.
  • You have created a Dropbox App.
  • You have generated the following Dropbox credentials: App Key, App Secret, Access Token, and Refresh Token
    Generating App Key and App Secret
    1. Log in with your Dropbox account.
    2. If you already have an app, go to Dropbox Developers.
    3. Click App Console and select your app.

      On the app's info page you can find the App key and App secret credentials.

    4. If you do not have any apps, go to Dropbox Developers and click Create apps.
    5. On the next page select Scoped access and Full Dropbox as access type.
    6. Provide a name for your app.
    7. Accept the Dropbox API Terms and Conditions and click Create app.

    Your app is created and you are taken to the app's info page. You can find the App key and App secret credentials on the Settings tab.

    Setting required permissions for your app

    1. Go to the Permissions tab of your app’s info page.
    2. Set the required permission for your app by enabling files.content.read.

      With this permission, your app will be able to read the files in Dropbox.

    3. Click Submit.

      For more information, see Generating Access Token and Refresh Token.

    Generating Access Token and Refresh Token
    1. Go to the following web page: https://www.dropbox.com/oauth2/authorize?token_access_type=offline&response_type=code&client_id=your_app_key
    2. Click Next and then click Allow on the next page.

      An access code is generated for you, and it is displayed on the next page.

    3. Execute the following command from the terminal to fetch the access and refresh tokens.curl https://api.dropbox.com/oauth2/token -d code=your_generated_access_code -d grant_type=authorization_code -u your_app_key:your_app_secret
      The curl command results in a json file that contains the access_token and the refresh_token.
       {
       "access_token": "sl.xxxxxxxxxxx"
       "expires_in": 14400,
       "refresh_token": "xxxxxx",
       "scope": "files.content.read files.metadata.read",
       "uid": "xxxxxx",
       "account_id": "dbid:xxxx"
       }
      

      For more information, see the Dropbox Getting Started.

For DataFlow

  • You have enabled DataFlow for an environment.

    For information on how to enable DataFlow for an environment, see Enabling DataFlow for an Environment.

  • You have created a Machine User to use as the CDP Workload User.

  • You have given the CDP Workload User the EnvironmentUser role.
    1. From the Management Console, go to the environment for which DataFlow is enabled.
    2. From the Actions drop down, click Manage Access.
    3. Identify the user you want to use as a Workload User.
    4. Give that user EnvironmentUser role.
  • You have synchronized your user to the CDP Public Cloud environment that you enabled for DataFlow.

    For information on how to synchronize your user to FreeIPA, see Performing User Sync.

  • You have granted your CDP user the DFCatalogAdmin and DFFlowAdmin roles to enable your user to add the ReadyFlow to the Catalog and deploy the flow definition.
    1. Give a user permission to add the ReadyFlow to the Catalog.
      1. From the Management Console, click User Management.
      2. Enter the name of the user or group you wish to authorize in the Search field.
      3. Select the user or group from the list that displays.
      4. Click Roles > Update Roles.
      5. From Update Roles, select DFCatalogAdmin and click Update.
    2. Give your user or group permission to deploy flow definitions.
      1. From the Management Console, click Environments to display the Environment List page.
      2. Select the environment to which you want your user or group to deploy flow definitions.
      3. Click Actions > Manage Access to display the Environment Access page.
      4. Enter the name of your user or group you wish to authorize in the Search field.
      5. Select your user or group and click Update Roles.
      6. Select DFFlowAdmin from the list of roles.
      7. Click Update Roles.
    3. Give your user or group access to the Project where the ReadyFlow will be deployed.
      1. Go to DataFlow > Projects.
      2. Select the project where you want to manage access rights and click More > Manage Access.
    4. Start typing the name of the user or group you want to add and select them from the list.
    5. Select the Resource Roles you want to grant.
    6. Click Update Roles.
    7. Click Synchronize Users.

For your ADLS data ingest target

  • You have your ADLS container and path into which you want to ingest data.
  • You have performed one of the following to configure access to your ADLS folder:
    • You have configured access to the ADLS folders with a RAZ enabled environment.

      It is a best practice to enable RAZ to control access to your object store folders. This allows you to use your CDP credentials to access ADLS folders, increases auditability, and makes object store data ingest workflows portable across cloud providers.
      1. Ensure that Fine-grained access control is enabled for your DataFlow environment.
      2. From the Ranger UI, navigate to the ADLS repository.
      3. Create a policy to govern access to the ADLS container and path used in your ingest workflow. For example: adls-to-adls-avro-ingest
      4. Add the machine user that you have created for your ingest workflow to ingest the policy you just created.
      For more information, see Ranger policies for RAZ-enabled Azure environment.
    • You have configured access to ADLS folders using ID Broker mapping.

      If your environment is not RAZ-enabled, you can configure access to ADLS folders using ID Broker mapping.
      1. Access IDBroker mappings.
        1. To access IDBroker mappings in your environment, click Actions > Manage Access.
        2. Choose the IDBroker Mappings tab where you can provide mappings for users or groups and click Edit.
      2. Add your CDP Workload User and the corresponding Azure role that provides write access to your folder in ADLS to the Current Mappings section by clicking the blue + sign.
      3. Click Save and Sync.

For your S3 data ingest target

  • You have your source S3 path and bucket.

  • Perform one of the following to configure access to S3 buckets:
    • You have configured access to S3 buckets with a RAZ enabled environment.

      It is a best practice to enable RAZ to control access to your object store buckets. This allows you to use your CDP credentials to access S3 buckets, increases auditability, and makes object store data ingest workflows portable across cloud providers.
      1. Ensure that Fine-grained access control is enabled for your DataFlow environment.
      2. From the Ranger UI, navigate to the S3 repository.
      3. Create a policy to govern access to the S3 bucket and path used in your ingest workflow.
      4. Add the machine user that you have created for your ingest workflow to the policy you just created.

      For more information, see Creating Ranger policy to use in RAZ-enabled AWS environment.

    • You have configured access to S3 buckets using ID Broker mapping.

      If your environment is not RAZ-enabled, you can configure access to S3 buckets using ID Broker mapping.
      1. Access IDBroker mappings.
        1. To access IDBroker mappings in your environment, click Actions > Manage Access.
        2. Choose the IDBroker Mappings tab where you can provide mappings for users or groups and click Edit.
      2. Add your CDP Workload User and the corresponding AWS role that provides write access to your folder in your S3 bucket to the Current Mappings section by clicking the blue + sign.
      3. Click Save and Sync.