Configuration workflow

Configure secure S3A access for Apache Ozone in Cloudera Data Warehouse.

  • A platform administrator must first register the credentials by following the steps in Managing S3-Compatible Credentials.
  • The S3 Bucket Name specified during credential creation must exactly match the bucket name used in your SQL queries, such as s3a://<bucket>/. Cloudera Data Warehouse uses this exact string to construct and inject the required Hadoop configuration property keys.
  1. Automated credential delivery and configuration: Whenever you create or update a Database Catalog or Virtual Warehouse, Cloudera Data Warehouse automatically delivers and applies the credentials. Manual configuration is not required. During this process, the system performs the following actions:
    1. Cloudera Data Warehouse contacts the environment service to get the list of your configured Ozone S3 accounts.
    2. The system securely reads the required access keys and secret keys directly from HashiCorp Vault.
    3. Cloudera Manager automatically builds a secure JCEKS keystore and applies the following credential properties to your Virtual Warehouse pods:
      • fs.s3a.bucket.<bucket>.access.key
      • fs.s3a.bucket.<bucket>.secret.key
    4. Cloudera Manager updates the core-site.xml file for all relevant query engines (Hive Metastore, HiveServer2, Impala, and Trino) with these essential routing properties:
      • Endpoint URL: fs.s3a.bucket.<bucket>.endpoint (Specifies the S3 gateway URL)
      • Path-style access: fs.s3a.bucket.<bucket>.path.style.access = true (Required for Apache Ozone routing)
      • Region: fs.s3a.bucket.<bucket>.endpoint.region (Applied only if a region is specified)
  2. Virtual Warehouse data access execution: As soon as the Virtual Warehouse pods are deployed, the analytic query engines have transparent access to the Apache Ozone storage layer, and you can query your data immediately.
    1. Run queries directly against your data locations using standard S3A URI formats, such as s3a://<bucket>/path/to/data.
    2. Run your queries through Hive, Impala, or Trino without any additional setup. Because the credentials are preconfigured in the pods, you do not need session-level parameters or manual configuration changes.
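
The following Python sketch illustrates why the registered bucket name must match the bucket in your s3a:// URIs: the per-bucket property keys listed above embed the bucket name verbatim. It is not the Cloudera Data Warehouse implementation. The function names, the bucket name sales-data, and the endpoint URL are hypothetical and serve only as an illustration.

```python
from urllib.parse import urlparse


def bucket_from_uri(uri):
    """Return the bucket portion of an s3a:// URI (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError(f"not an s3a URI: {uri!r}")
    return parsed.netloc


def credential_property_names(bucket):
    """Names of the per-bucket credential properties (names only, no values)."""
    prefix = f"fs.s3a.bucket.{bucket}"
    return [f"{prefix}.access.key", f"{prefix}.secret.key"]


def routing_properties(bucket, endpoint, region=None):
    """Build the per-bucket routing properties described for core-site.xml."""
    prefix = f"fs.s3a.bucket.{bucket}"
    props = {
        f"{prefix}.endpoint": endpoint,
        # Path-style access is required for Apache Ozone routing.
        f"{prefix}.path.style.access": "true",
    }
    # The region property is applied only if a region is specified.
    if region:
        props[f"{prefix}.endpoint.region"] = region
    return props


# Hypothetical values, for illustration only.
bucket = bucket_from_uri("s3a://sales-data/path/to/data")
for key, value in routing_properties(
    bucket, "https://ozone-s3g.example.com:9879"
).items():
    print(f"{key} = {value}")
```

If the bucket in a query URI differs from the registered S3 Bucket Name, even by one character, the derived keys no longer match the injected properties, so the per-bucket settings would not apply to that URI.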