Managed storage access

Understanding how Cloudera Data Warehouse (CDW) stores data for multiple tenants and a high-level overview of the configuration tasks prepares you as DWAdmin to set up a RAZ-controlled warehouse.

The multitenant storage technique in CDW requires a separate Database Catalog plus at least one Virtual Warehouse per tenant.

You configure a dedicated tenant storage role to give only the tenant-specific default database catalog instance and its associated virtual warehouse data access to both of the following buckets.
  • The tenant storage location
  • Potentially shared Storage Location Base
  • The SDX Data Lake bucket

If the access token for a role is leaked, the data of others is not compromised; only one tenant’s own, and shared, data is vulnerable. A managed identity IDBroker approach to accessing the Virtual Warehouse is shown in the following graphic:

Using a Ranger Remote Authorization (RAZ)-enabled environment, you can control access to data based on user roles and classifications. As a CDP Admin, you can apply Ranger fine-grained access control policies to cloud storage.

The following diagram shows the high-level steps for configuring RAZ-enabled storage:

After you obtain the entitlements required for this feature CDW_ALLOW_MULTI_DEFAULT_DBC and CDW_STORAGE_ROLES, you can configure storage as shown in this diagram:

  • In CDP, register an environment that enables RAZ and uses the SDX Data Lake.
  • Activate the environment in CDW.
  • Manually create a tenant storage location(s), or use existing ones, for access by the tenant storage role.
  • In Azure, create an Azure managed identity for the tenant.
  • Assign the tenant specific Azure managed identity as a Storage Blob Data Owner role to the tenant-specific container; assign the tenant specific Azure managed identity as a Storage Blob Data Owner role to Storage Location Base.
  • In CDP Management Console, create an UMS group for this tenant.
  • Add the UMS machine user (created when you activated the CDW environment) to the UMS group.
  • Add the id broker mapping to the environment.
  • Sync user group changes for the RAZ-enabled environment with FreeIPA.
  • Create a separate Database Catalog with a unique managed identity created in the step above.
  • Create a tenant-specific Virtual Warehouse based on the Database Catalog, SDX Data Lake, and RAZ-enabled environment.
  • Create a tenant-specific Hive or Impala database that points to tenant storage locations.

    As the metadata is shared across all tenants, Ranger grants access to tenant data via a group at the database level.

  • For each tenant, repeat the actions above, starting from the step after activating the environment for CDW.

The managed identity accesses a tenant specific location and SDX Data Lake shared Storage Location Base. The Storage Location Base contains shared data, stores the Directed Acyclic Graph (DAG) data used by the Database Catalog, and provides integration with Cloudera Workload XM. WXM writes Impala query data to the shared Storage Location Base.

You need the UMS group to add the IDBroker mapping, as a single UMS machine user cannot have multiple ID broker mappings to different managed identities.

The following topics describe step-by-step how to set up your Database Catalog, and Virtual Warehouse for storing RAZ-enabled data.