Configuring access to Google Cloud Storage (GCS)

Google Cloud Storage (GCS) is not supported as a default file system, but access to data in GCS is possible via the gs connector. Use these steps to configure access from your cluster to GCS.

These steps assume that you are using an HDP version that supports the gs cloud storage connector (HDP 2.6.5 introduced this feature as a TP).

Prerequisites

Access to Google Cloud Storage is via a service account. The service account that you provide to Cloudbreak for GCS data access must have the following permissions:

Configure access to GCS

Access to Google Cloud Storage is via a service account.

Steps

  1. On the Cloud Storage page in the advanced cluster wizard view, select Use existing GSC storage.

  2. Under Service Account Email Address provide the email for your GCS storage account.

Once your cluster is in the running state, you should be able to access buckets that the configured storage account has access to.

Testing access to GCS

Test access to the Google Cloud Storage bucket by running a few commands from any cluster node. For example, you can use the command listed below (replace “mytestbucket” with the name of your bucket):

hadoop fs -ls gs://my-test-bucket/

Configure GCS storage locations

After configuring access to GCS via service account, you can optionally use a GCS bucket as a base storage location; this storage location is mainly for the Hive Warehouse Directory (used for storing the table data for managed tables).

Prerequisites

Steps

  1. When creating a cluster, on the Cloud Storage page in the advanced cluster wizard view, select Use existing GCS storage and select the existing service account, as described in Configure access to GCS.
  2. Under Storage Locations, enable Configure Storage Locations by clicking the On button.
  3. Provide your existing bucket name under Base Storage Location.
  4. Under Path for Hive Warehouse Directory property (hive.metastore.warehouse.dir), Cloudbreak automatically suggests a location within the bucket. For example, if the bucket that you specified is my-test-bucket then the suggested location will be my-test-bucket/apps/hive/warehouse. You may optionally update this path.

    Cloudbreak automatically creates this directory structure in your bucket.