A Virtual Warehouse is an instance of compute resources in the cloud that is equivalent to an on-prem cluster. You learn how to
create a new Virtual Warehouse in Cloudera Data Warehouse on cloud.
A Virtual Warehouse provides access to the data in tables and views in the data lake your
Database Catalog uses. A Virtual Warehouse can access only the Database Catalog you select
during creation of the Virtual Warehouse.
In this task and subtasks, you configure Virtual Warehouse features, including
performance-related features for production workloads, such as the Virtual Warehouse size
and auto-scaling. These features are designed to manage huge workloads in production, so if
you are evaluating Cloudera Data Warehouse, or just learning, simply accept the
default values. This task covers the bare minimum configurations.
You obtained permissions to access a running environment for creating a Virtual
Warehouse.
You obtained the DWAdmin role to perform Cloudera Data Warehouse tasks.
You logged into the Cloudera web interface.
You activated the environment from Cloudera Data Warehouse.
Navigate to the Cloudera Data Warehouse Overview
page, click the Virtual Warehouses tab, and then click
New Virtual Warehouse.
In the Create Virtual Warehouse drawer, specify a
Name for your Virtual Warehouse.
Select the Type (Hive, Impala, or Trino) of Virtual Warehouse.
Virtual Warehouses can use Hive, Impala, or Trino as the underlying SQL execution engine.
Typically, Hive is used to support complex reports and enterprise dashboards, Impala is
used to support interactive, ad-hoc analysis, and Trino is used to query large data sets
distributed over one or more heterogeneous data sources.
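As a rough illustration of the guidance above, the following Python sketch maps a workload label to the engine typically used for it. The function name suggest_engine and the workload labels are hypothetical and are not part of any Cloudera API; the mapping only restates the typical uses listed above.

```python
# Hypothetical lookup table: restates the typical engine choices described above.
# The workload labels are invented for this sketch.
TYPICAL_ENGINE = {
    "complex_reports": "Hive",
    "enterprise_dashboards": "Hive",
    "interactive_adhoc_analysis": "Impala",
    "heterogeneous_data_sources": "Trino",
}


def suggest_engine(workload: str) -> str:
    """Return the engine typically used for the given (assumed) workload label."""
    try:
        return TYPICAL_ENGINE[workload]
    except KeyError:
        raise ValueError(f"unknown workload label: {workload!r}") from None
```

For example, suggest_engine("interactive_adhoc_analysis") returns "Impala". Treat the result as a starting point only; the right engine depends on your workload.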
Select the Environment and Database
Catalog for this Virtual Warehouse.
In AWS environments only, accept the default availability zone, or select an
availability zone, such as "us-east-1c".
The default behavior is to randomly select an availability zone from the list of
configured availability zones for the associated environment. Generally, it is fine to
accept the default. All compute resources will run in the selected zone.
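The default behavior described above can be pictured with a short Python sketch. This is an illustration of the selection rule only, not Cloudera's implementation; the function name pick_availability_zone and its parameters are assumptions made for this example.

```python
import random
from typing import Optional, Sequence


def pick_availability_zone(
    configured_zones: Sequence[str],
    requested: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Return the requested zone, or a random one from the configured zones.

    Hypothetical helper: mirrors the documented behavior that, when no zone
    is selected, one is chosen at random from the environment's configured zones.
    """
    if not configured_zones:
        raise ValueError("the environment has no configured availability zones")
    if requested is not None:
        # An explicit choice must be one of the zones configured for the environment.
        if requested not in configured_zones:
            raise ValueError(f"{requested!r} is not a configured availability zone")
        return requested
    chooser = rng if rng is not None else random
    return chooser.choice(list(configured_zones))
```

Calling pick_availability_zone(["us-east-1a", "us-east-1c"], requested="us-east-1c") returns "us-east-1c"; omitting requested returns one of the two zones at random.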
Select the Compute Instance Types for the Virtual Warehouse
based on your workload.
For more information, see Supported Compute Instance Types.
Select the Image Version and the Data Explorer Image
version you want to use, or accept the default version (latest) at the top
of the drop-down menus.
Select the Size of your Virtual Warehouse.
For Hive and Impala Virtual Warehouses, in the Authentication
pane, select Enable SSO to enable single sign-on to your Virtual Warehouse. If you
do not have a user group set up for SSO, do not enable SSO.
Enabling SSO requires a user group, which you can set up in Management Console > User Management. The user group identifies the users who are authorized to
access this Virtual Warehouse.
In the Virtual Warehouse specific settings, configure auto-suspend, auto-scaling, and
query isolation, based on the type of Virtual Warehouse selected.
In the User Groups and Tagging pane, select the user groups that
should have access to the Virtual Warehouse.
Optional: Enter keys and values for Tagging the Virtual Warehouse.
Accept default values for other settings, or change the values to suit your use case,
and click Create Virtual Warehouse to create the new Virtual
Warehouse.
Click the tooltip
for information about settings.
When you create a Virtual Warehouse, a cluster is created in your cloud provider
account. This cluster has two buckets. One bucket is used for managed data and the other
is used for external data.
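The choices made in the steps above can be summarized as a small record, which is handy as a checklist before you click Create Virtual Warehouse. The following Python sketch is a minimal model under stated assumptions: the class VirtualWarehouseSpec, its field names, and the problems method are hypothetical and do not correspond to a Cloudera API. It checks only rules stated in this task: the three engine types and the advice not to enable SSO without a user group.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Engine types named in this task.
ENGINE_TYPES = ("Hive", "Impala", "Trino")


@dataclass
class VirtualWarehouseSpec:
    """Hypothetical record of the settings chosen in the Create Virtual Warehouse drawer."""

    name: str
    engine_type: str
    environment: str
    database_catalog: str
    size: str
    availability_zone: Optional[str] = None  # AWS only; None means accept the default
    enable_sso: bool = False
    user_groups: List[str] = field(default_factory=list)
    tags: Dict[str, str] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return a list of issues found; an empty list means no issue was detected."""
        issues = []
        if not self.name:
            issues.append("a name is required")
        if self.engine_type not in ENGINE_TYPES:
            issues.append(f"type must be one of {', '.join(ENGINE_TYPES)}")
        if self.enable_sso and not self.user_groups:
            issues.append("do not enable SSO without a user group")
        return issues


# Example values below are placeholders, not defaults taken from the product.
spec = VirtualWarehouseSpec(
    name="reports-vw",
    engine_type="Hive",
    environment="my-environment",
    database_catalog="my-catalog",
    size="example-size",
    enable_sso=True,
)
print(spec.problems())
```

With the placeholder values shown, the check reports the SSO issue because no user group was supplied; adding a group to user_groups clears it.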