Third-party object storage support for Cloudera Data Warehouse Private Cloud

Cloudera Data Warehouse (CDW) can access object storage such as AWS S3 if the CDP Private Cloud base cluster is configured to connect to the object store. You can query Hive and Impala tables stored on object stores using Hue.

By default, when you activate an environment in CDW, all fs.s3a.* properties are copied from the core-site.xml file on the base cluster to the hadoop-core-site.xml file of the Hive and Impala metastore pods, which enables CDW to connect to S3. The following four properties must be present in the base cluster's core-site.xml file:
  • fs.s3a.access.key
  • fs.s3a.secret.key
  • fs.s3a.endpoint
  • fs.s3a.connection.ssl.enabled
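As a rough illustration, the following Python sketch checks a core-site.xml file for the four properties listed above before you activate an environment. It is not part of CDW. The function names, the sample XML, and every value in it (endpoint, keys, defaultFS) are hypothetical placeholders. The sketch assumes the standard Hadoop configuration layout of property elements with name and value children.

```python
import xml.etree.ElementTree as ET

# The four properties the section lists as required on the base cluster.
REQUIRED_KEYS = (
    "fs.s3a.access.key",
    "fs.s3a.secret.key",
    "fs.s3a.endpoint",
    "fs.s3a.connection.ssl.enabled",
)

# Hypothetical core-site.xml content; every value is a placeholder.
SAMPLE_CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://ns1</value></property>
  <property><name>fs.s3a.access.key</name><value>EXAMPLE_ACCESS_KEY</value></property>
  <property><name>fs.s3a.secret.key</name><value>EXAMPLE_SECRET_KEY</value></property>
  <property><name>fs.s3a.endpoint</name><value>s3.example.internal:9000</value></property>
  <property><name>fs.s3a.connection.ssl.enabled</name><value>true</value></property>
</configuration>
"""


def s3a_properties(core_site_xml: str) -> dict:
    """Return only the fs.s3a.* properties found in a core-site.xml string."""
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith("fs.s3a."):
            props[name] = (prop.findtext("value") or "").strip()
    return props


def missing_required_keys(props: dict) -> list:
    """List the required keys that are absent, in the order given above."""
    return [key for key in REQUIRED_KEYS if key not in props]


if __name__ == "__main__":
    found = s3a_properties(SAMPLE_CORE_SITE)
    # Print property names only, so secrets never reach the console.
    print("fs.s3a.* properties found:", sorted(found))
    print("missing required keys:", missing_required_keys(found))
```

To check a real file, read it into a string (for example with pathlib.Path(...).read_text()) and pass it to s3a_properties. The path to core-site.xml depends on your base cluster and is not assumed here.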

The fs.s3a.* configurations are read-only. To view them, open the CONFIGURATION tab on the Database Catalog or Virtual Warehouse details page and select the hadoop-core-site.xml option from the Configuration files drop-down menu.

The Third-party S3 providers in private cloud option is enabled by default. You can disable CDW’s access to S3 by deselecting this option on the Advanced Configuration > Advanced Settings page.