June 26, 2020
This release of the Cloudera Data Warehouse service on CDP Public Cloud introduces the following new features and improvements:
Query isolation for scan-heavy, data-intensive queries in Hive LLAP Virtual Warehouses
Hive Virtual Warehouses base auto-scaling on the total scan size of each query. HiveServer, which receives all incoming queries, has a query planner component. When the planner receives a query, it examines the query's total scan size, that is, the number of bytes that must be read from the file system to run the query. If the Query Isolation feature is enabled for a Virtual Warehouse and a query scans more data than the threshold set in the hive.query.isolation.scan.size.threshold parameter, the planner runs the query in isolation: a standalone executor group is spawned to run the data-intensive query. For more details, see Hive query isolation for scan-heavy, data-intensive queries.
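The planner's decision can be sketched roughly as follows. This is an illustration of the rule described above, not HiveServer code: the `Query` class, the `choose_executor_group` function, the `isolation_enabled` flag, and the 400 GiB threshold value are all assumptions made for the example. Only the parameter name hive.query.isolation.scan.size.threshold comes from the product.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one gibibyte


@dataclass
class Query:
    sql: str
    scan_size_bytes: int  # estimated bytes read from the file system


def choose_executor_group(query: Query, isolation_enabled: bool,
                          threshold_bytes: int) -> str:
    """Hypothetical helper: decide where a query runs.

    Returns "isolated" only when Query Isolation is enabled AND the
    query scans more data than the threshold; otherwise "shared".
    """
    if isolation_enabled and query.scan_size_bytes > threshold_bytes:
        return "isolated"
    return "shared"


# Assumed stand-in for hive.query.isolation.scan.size.threshold.
threshold = 400 * GIB

small = Query("SELECT * FROM dim_region", scan_size_bytes=2 * GIB)
large = Query("SELECT * FROM fact_sales", scan_size_bytes=900 * GIB)

print(choose_executor_group(small, True, threshold))
print(choose_executor_group(large, True, threshold))
print(choose_executor_group(large, False, threshold))
```

With these assumed values, only the large query running with isolation enabled is routed to its own executor group. The same query stays in the shared group when the feature is turned off.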
Overlay network support for AWS environments
An overlay network is a software-defined layer of network abstraction that is used to run multiple separate, discrete virtualized network layers over the AWS VPC network. In the case of the CDW service, a custom CNI (Container Network Interface) plugin is used to enable the overlay network. It creates two network spaces:
- A node network space, which derives per-node IP addresses from the VPC.
- A Kubernetes pod network space, which derives per-pod IP addresses from the CNI plugin's own network space.
The overlay network is bridged into the node network. As a result, one IP address is required per node rather than one per pod. This leaves more IP addresses available, so you can use the CDW service efficiently, auto-scaling Virtual Warehouses as needed to meet the demands of your workloads. For more information, see Use overlay networks for AWS environments in Cloudera Data Warehouse service and Enable overlay networks in AWS environments.
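A small calculation shows the effect on address consumption. It follows the simplified accounting in the paragraph above (one address per pod without the overlay, one per node with it). The function name `vpc_ips_required` and the cluster size are hypothetical, and actual consumption in a given environment may differ.

```python
def vpc_ips_required(nodes: int, pods_per_node: int, overlay: bool) -> int:
    """Hypothetical estimate of IP addresses drawn from the VPC.

    Simplified model taken from the text: with the overlay network,
    only nodes take VPC addresses; without it, every pod takes one.
    """
    if overlay:
        return nodes
    return nodes * pods_per_node


# Assumed cluster size, chosen only for illustration.
nodes, pods_per_node = 10, 30

without_overlay = vpc_ips_required(nodes, pods_per_node, overlay=False)
with_overlay = vpc_ips_required(nodes, pods_per_node, overlay=True)

print(f"without overlay: {without_overlay} VPC addresses")
print(f"with overlay:    {with_overlay} VPC addresses")
```

Under this model the address count with the overlay grows with the number of nodes only, so scaling up pods on existing nodes does not draw further addresses from the VPC.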