Configuring Cloud Data Access

Configure S3 storage locations

After configuring access to S3 through an instance profile, you can optionally use an S3 bucket as the base storage location. This location is mainly used for the Hive Warehouse Directory, which stores the table data for managed tables.

Prerequisites

  • You must have an existing bucket. For instructions on how to create a bucket on S3, refer to the AWS documentation.
  • The instance profile that you configured must allow access to the bucket.
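Both prerequisites can be checked before you start the cluster wizard. The following is a minimal sketch, not part of Cloudbreak: the helper name check_bucket is hypothetical, and it assumes you pass in a boto3 S3 client (boto3 is a separate third-party package). The result only reflects the instance profile if the code runs with that profile's credentials, for example on an EC2 instance that has the profile attached.

```python
def check_bucket(s3_client, bucket_name):
    """Classify a bucket as 'ok', 'missing', or 'forbidden'.

    s3_client is assumed to be a boto3 S3 client (or any object with a
    compatible head_bucket method). This helper is illustrative only.
    """
    try:
        # head_bucket succeeds only if the bucket exists and the caller
        # is allowed to access it.
        s3_client.head_bucket(Bucket=bucket_name)
        return "ok"
    except Exception as err:
        # botocore's ClientError carries the S3 error in err.response.
        response = getattr(err, "response", None)
        if not isinstance(response, dict):
            raise  # not an S3 service error; let it propagate
        code = str(response.get("Error", {}).get("Code", ""))
        if code in ("404", "NoSuchBucket", "NotFound"):
            return "missing"
        if code in ("403", "AccessDenied", "Forbidden"):
            return "forbidden"
        raise  # some other service error

# Example call (requires boto3 and AWS credentials; "my-bucket" is a
# placeholder name):
#   import boto3
#   print(check_bucket(boto3.client("s3"), "my-bucket"))
```

A result of "missing" points to the first prerequisite and "forbidden" to the second.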

Steps

  1. When creating a cluster, on the Cloud Storage page in the advanced cluster wizard view, select Use existing instance profile and select the instance profile to use, as described in the documentation for configuring access to S3.
  2. Under Storage Locations, enable Configure Storage Locations by clicking the button.
  3. Provide your existing bucket name under Base Storage Location.
    Note

    Make sure that the bucket already exists within the account.

  4. Under Path for Hive Warehouse Directory (the hive.metastore.warehouse.dir property), Cloudbreak automatically suggests a location within the bucket. You can optionally update this path or select Do not configure.
    Note

    This directory structure will be created in your specified bucket upon the first activity in Hive.
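To illustrate how a base storage location and a warehouse path fit together, here is a minimal sketch that builds a value for hive.metastore.warehouse.dir from a bucket name. It rests on several assumptions: the function name warehouse_location is hypothetical, the default sub-path apps/hive/warehouse is a placeholder rather than the location Cloudbreak actually suggests, and the s3a:// scheme is the one used by the Hadoop S3A connector. The name check covers only the basic S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit).

```python
import re

# Simplified S3 bucket-name check; it does not cover every AWS rule
# (for example, adjacent dots or names formatted like IP addresses).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def warehouse_location(bucket, subpath="apps/hive/warehouse"):
    """Return an s3a:// URI for a Hive warehouse directory.

    The default subpath is a hypothetical placeholder, not necessarily
    the path that Cloudbreak suggests in the wizard.
    """
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"not a valid S3 bucket name: {bucket!r}")
    clean = subpath.strip("/")  # tolerate leading and trailing slashes
    if not clean:
        return f"s3a://{bucket}"
    return f"s3a://{bucket}/{clean}"

print(warehouse_location("my-bucket"))
print(warehouse_location("my-bucket", "/custom/hive/"))
```

Validating the name up front gives an early error for obvious typos, but it does not confirm that the bucket exists in your account.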