Supported object storage services for Cloudera Data Warehouse Private Cloud

HDFS is the default storage system for Cloudera Data Warehouse (CDW). However, you can enable CDW to access object stores such as AWS S3 and Azure Data Lake Storage (ADLS Gen1 and Gen2) if the CDP Private Cloud Base cluster is configured to access them. You can then use Hue to query Hive and Impala tables stored on these object stores.

When you activate an environment in CDW, all Hadoop configuration variables matching fs.s3a.* and fs.azure.* are copied from the core-site.xml file on the base cluster to the hadoop-core-site.xml file of the Hive and Impala metastore pods. This enables CDW to establish a connection to S3 or ADLS.

The following key configurations must be present in the base cluster's core-site.xml file to connect to S3 or S3-compatible storage providers:
  • fs.s3a.access.key
  • fs.s3a.secret.key
  • fs.s3a.endpoint
  • fs.s3a.connection.ssl.enabled
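
As a rough illustration, the four S3 properties listed above might appear in core-site.xml as follows. This is only a sketch: the property names come from the list above, but every value (the keys and the endpoint host s3.example.com) is a hypothetical placeholder that you would replace with the details of your own storage provider.

```xml
<!-- Illustrative core-site.xml fragment. All values are placeholders, not real settings. -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>EXAMPLE_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>EXAMPLE_SECRET_KEY</value>
  </property>
  <property>
    <!-- Hypothetical endpoint of an S3-compatible provider -->
    <name>fs.s3a.endpoint</name>
    <value>s3.example.com</value>
  </property>
  <property>
    <!-- Assumes the endpoint is served over TLS -->
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
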

The following key configurations must be present in the base cluster's core-site.xml file to connect to the ADLS storage provider:
  • fs.azure.account.oauth.provider.type
  • fs.azure.account.oauth2.client.id
  • fs.azure.account.oauth2.client.secret
  • fs.azure.account.oauth2.client.endpoint
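
The ADLS properties listed above could look like the following in core-site.xml. This sketch assumes OAuth 2.0 client-credentials authentication through the Hadoop ABFS connector, which is why the provider type is set to ClientCredsTokenProvider; your cluster may use a different token provider. The client ID, client secret, and TENANT_ID in the endpoint are hypothetical placeholders.

```xml
<!-- Illustrative core-site.xml fragment. All values are placeholders, not real settings. -->
<configuration>
  <property>
    <!-- Assumption: client-credentials flow; other token providers exist -->
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value>EXAMPLE_CLIENT_ID</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.secret</name>
    <value>EXAMPLE_CLIENT_SECRET</value>
  </property>
  <property>
    <!-- Replace TENANT_ID with the directory (tenant) ID of your Azure account -->
    <name>fs.azure.account.oauth2.client.endpoint</name>
    <value>https://login.microsoftonline.com/TENANT_ID/oauth2/token</value>
  </property>
</configuration>
```
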

The fs.s3a.* and fs.azure.* configurations are read-only. You can view them on the CONFIGURATION tab of the Database Catalog and Virtual Warehouse details pages by selecting the hadoop-core-site.xml option from the Configuration files drop-down menu.