Key features of Cloudera Data Warehouse Public Cloud

The Cloudera Data Warehouse (CDW) service provides data warehouses that can be automatically configured and isolated. CDW optimizes existing workloads when you move to the cloud. You scale resources up and down to meet your business demands, and save costs by suspending and resuming resources automatically. Data warehouses comply with your Data Lake security requirements.

Automatically configured and isolated

Each data warehouse can be automatically configured for you by the CDW service, but you can adjust certain settings to suit your needs. Individual warehouses are completely isolated, ensuring that the users have access to only their data and eliminating the problem of "noisy neighbors." Noisy neighbors are workloads that monopolize system resources and interfere with the queries from other tenants. With Cloudera Data Warehouse, you can easily offload noisy neighbor workloads to their own Virtual Warehouse instance so other tenants have access to enough compute resources for their workloads to complete and meet their SLAs.

This capability to isolate individual warehouses is equally useful for "VIP workloads." VIP workloads are crucial workloads that must have resources to complete immediately and as quickly as possible without waiting in a queue. You can run these VIP workloads in their own warehouse to ensure they get the resources they need to complete as soon as possible.

Optimized for your workloads

Data warehouses are automatically optimized for your workloads. This includes pre-configuring the software and creating the different caching layers, which means that you need not engage in complex capacity planning or tuning. You create a Virtual Warehouse that specifies a SQL engine:

  • Hive for data warehouses that support complex reports and enterprise dashboards.
  • Impala for Data Marts that support interactive, ad-hoc analysis.

You choose the Virtual Warehouse instance size and adjust thresholds for auto-scaling.

Auto-scaling

Auto-scaling enables both scaling up and scaling down of Virtual Warehouse instances so they can meet your varying workload demands and save costs on cloud resources when they are not needed.

Auto-scaling provides the following benefits:

  • Service availability: Clusters are ready to accept queries 24 x 7.
  • Auto-scaling based on query wait-time: Queries start executing within the number of seconds that you specify and cluster resources are added or shut down to meet demand.
  • Auto-scaling based on number of concurrent queries running on the system: "Infinite scaling" means that the number of concurrent queries can go from 10 to 100 in minutes.
  • Cost guarantee: You can configure auto-scaling upper limits, which determine how large a compute cluster can grow. Since compute costs increase as cluster size increases, having a way to configure upper limits gives administrators a method to stay within a budget.

Auto-suspend and resume

You have the capability to set an AutoSuspend Timeout when creating a Virtual Warehouse. This sets the maximum time a Virtual Warehouse idles before shutting down. For example, if you set this to 60 seconds, then if the Virtual Warehouse is idle for 60 seconds, it suspends itself so you do not have to pay for unused compute resources. The first time a new query is run against an auto-suspend Virtual Warehouse, it restarts. This feature helps you maintain a tight control on your cloud spend while ensuring availability to run your workloads.

Security compliance

Your Database Catalogs and Virtual Warehouses automatically inherit the same security restrictions that are applicable to your CDP environment. There is no need to specify the security setup again for each Database Catalog or Virtual Warehouse. For more information, see CDP identity management. It discusses integration with Apache Knox and your LDAP provider which uses FreeIPA Identity Management.

The following security controls are inherited from your CDP environment:

  • Authentication: Ensures that all users have proven their identity before accessing the Cloudera Data Warehouse service or any created Database Catalogs or Virtual Warehouses.
  • Authorization: Ensures that only users who have been granted adequate permissions are able to access the Cloudera Data Warehouse service and the data stored in the tables.
  • Dynamic column masking: If rules are set up to mask certain columns when queries run, based on the user executing the query, then these rules also apply to queries executed in the Virtual Warehouses.
  • Row-level filtering: If rules are set up to filter certain rows from being returned in the query results, based on the user executing the query, then these same rules also apply to queries executed in the Virtual Warehouses.

Tenant isolation

The multitenant storage technique in CDW offers increased security over the storage method used in earlier releases. The earlier releases based all storage access in CDW on a single EC2 instance role. Tenant isolation offers users independence on several levels:
  • Isolation reduces contention
  • Workloads do not interfere with each other
  • Each tenant can choose a version of an independent deployment
  • Independent upgrades limit the scope of changes