June 26, 2020

This release of the Cloudera Data Warehouse service on CDP Public Cloud introduces the following new features and improvements:

Query isolation for scan-heavy, data-intensive queries in Hive LLAP Virtual Warehouses

Hive Virtual Warehouses base auto-scaling on the total scan size of the query. HiveServer, which receives all incoming queries, has a query planner component. When the HiveServer query planner receives queries, it examines the total scan size of each query. That is, it looks at the number of bytes read from the file system required to run the query. If the Query Isolation feature has been enabled for a Virtual Warehouse and a query scans more data than the threshold set in the hive.query.isolation.scan.size.threshold parameter, the planner runs the query in isolation. This means that an isolated standalone executor group is spawned to run the data-intensive query. For more details, see Hive query isolation for scan-heavy, data-intensive queries.

Overlay network support for AWS environments

An overlay network is a software-defined layer of network abstraction that is used to run multiple separate, discrete virtualized network layers over the AWS VPC network. In the case of the CDW service, a custom CNI (Container Network Interface) plugin is used to enable the overlay network. It creates two network spaces:

  • A node network space, which derives per-node IP addresses from the VPC.
  • A Kubernetes pod network space, which derives per-pod IP addresses from the CNI plugin's own network space.

The overlay network is bridged into the node network. As a result, one IP address is required per node instead of one IP address needed per pod. Consequently, there are more available IP addresses and you can use the CDW service efficiently, auto-scaling Virtual Warehouses as needed to meet the demands of your workloads. For more information, see Use overlay networks for AWS environments in Cloudera Data Warehouse service and Enable overlay networks in AWS environments.