Supported object storage services for Cloudera Data Warehouse on premises

HDFS is the default storage system for Cloudera Data Warehouse. However, you can enable Cloudera Data Warehouse to access object storage such as AWS S3 and Azure Data Lake Storage (ADLS Gen1 and Gen2) if the Cloudera Base on premises cluster is configured to access it. You can query Hive and Impala tables stored on object stores using Hue.

When you activate an environment in Cloudera Data Warehouse, all Hadoop configuration properties matching fs.s3a.* and fs.azure.* are copied from the core-site.xml file on the base cluster to the hadoop-core-site.xml file of the Hive and Impala metastore pods. This enables Cloudera Data Warehouse to establish a connection to S3 or ADLS.

The following key properties must be present in the core-site.xml file of the base cluster to connect to S3 or S3-compatible storage providers:
  • fs.s3a.access.key
  • fs.s3a.secret.key
  • fs.s3a.endpoint
  • fs.s3a.connection.ssl.enabled
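
As an illustration, the S3 properties listed above might appear in the base cluster core-site.xml file roughly as follows. This is a minimal sketch, not a definitive configuration: every value is a placeholder chosen for the example (the key strings and the endpoint host s3.example.com are assumptions), and how you set these properties depends on how your base cluster is managed.

```xml
<!-- Sketch of S3 / S3-compatible settings in the base cluster core-site.xml. -->
<!-- All values below are illustrative placeholders, not real credentials.    -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>EXAMPLE_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>EXAMPLE_SECRET_KEY</value>
  </property>
  <property>
    <!-- Hypothetical endpoint of the S3 or S3-compatible service. -->
    <name>fs.s3a.endpoint</name>
    <value>s3.example.com</value>
  </property>
  <property>
    <!-- Whether connections to the endpoint use SSL/TLS. -->
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
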

The following key properties must be present in the core-site.xml file of the base cluster to connect to ADLS:
  • fs.azure.account.oauth.provider.type
  • fs.azure.account.oauth2.client.id
  • fs.azure.account.oauth2.client.secret
  • fs.azure.account.oauth2.client.endpoint

The fs.s3a.* and fs.azure.* properties are read-only in Cloudera Data Warehouse. You can view them from the CONFIGURATION tab on the Database Catalog and Virtual Warehouse details pages by selecting the hadoop-core-site.xml option from the Configuration files drop-down menu.