June 21, 2022

This release of Cloudera Data Warehouse (CDW) service on CDP Private Cloud Data Services introduces the new features and improvements that are described in this section.

Query isolation for scan-heavy, data-intensive queries in Hive LLAP Virtual Warehouses

Hive Virtual Warehouses base auto-scaling on the total scan size of the query. HiveServer, which receives all incoming queries, has a query planner component. When the HiveServer query planner receives queries, it examines the total scan size of each query. That is, it looks at the number of bytes read from the file system required to execute the query. If the Query Isolation feature has been enabled for a Virtual Warehouse and a query scans more data than the threshold set in the hive.query.isolation.scan.size.threshold parameter, the planner runs the query in isolation. This means that an isolated standalone executor group is spawned to run the data-intensive query. For more information, see Hive query isolation for ETL jobs.
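
The routing decision described above can be sketched roughly as follows. This is only an illustration of the threshold comparison, not HiveServer's actual planner code: the Query class, the choose_executor_group function, and the 400 GiB threshold value are all hypothetical names and numbers chosen for the example. Only the hive.query.isolation.scan.size.threshold parameter itself comes from the product.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical stand-in for a query as seen by the planner."""
    query_id: str
    total_scan_bytes: int  # bytes the query must read from the file system


def choose_executor_group(query, isolation_enabled, scan_size_threshold_bytes):
    """Return "isolated" when the query should get its own executor group.

    scan_size_threshold_bytes plays the role of
    hive.query.isolation.scan.size.threshold (assumed here to be in bytes).
    """
    if isolation_enabled and query.total_scan_bytes > scan_size_threshold_bytes:
        return "isolated"
    return "shared"


GIB = 1024 ** 3
threshold = 400 * GIB  # illustrative value only, not a documented default

small = Query("q1", 5 * GIB)
large = Query("q2", 900 * GIB)

print(choose_executor_group(small, True, threshold))   # shared
print(choose_executor_group(large, True, threshold))   # isolated
print(choose_executor_group(large, False, threshold))  # shared
```

Note that in this sketch a large query falls back to the shared executors whenever the feature is disabled for the Virtual Warehouse, which mirrors the two conditions stated above.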

Generate and download Hive diagnostic bundles

You can now generate and download diagnostic bundles containing log files for Hive Virtual Warehouses. For more information, see Downloading Hive diagnostic bundles in Data Warehouse Private Cloud.

SSL support added for MariaDB and MySQL databases

CDW can connect to SSL-enabled MariaDB and MySQL databases on the base cluster in addition to the SSL-enabled PostgreSQL database. For optimum security, the network connection between the default Database Catalog Hive MetaStore (HMS) in CDW and the relational database hosting the base cluster’s HMS must be encrypted with SSL. See How to enable SSL support for MariaDB, MySQL, and Oracle databases.

SSL support added for Oracle databases (Preview)

CDW can connect to SSL-enabled Oracle databases on the base cluster. See How to enable SSL support for MariaDB, MySQL, and Oracle databases.

Ability to configure Impala coordinator and executor pod size (Preview)

You can optimize the performance of your Impala Virtual Warehouse and resources used in an environment based on your hardware configuration by customizing the amount of resources allocated to the Impala coordinators, executors, and catalog daemons. This helps you to better leverage intra-query parallelism and achieve powerful compute clusters with fewer nodes. For more information, see Creating custom pod configurations for Impala Virtual Warehouses.

Node-level monitoring capabilities on ECS (Preview)

You can now monitor node-level metrics using Grafana. The metrics include CPU usage, memory usage, network usage, and disk IO. For more information, see Monitoring Data Warehouse service resources with Grafana dashboards.

Auto-shutdown Impala coordinators

When you create a Virtual Warehouse, you can configure Impala coordinators to automatically shut down during idle periods. You can also set a delay before the coordinator shuts down. For more information, see Configuring Impala coordinator shutdown.

Data Visualization integration in Cloudera Data Warehouse (Preview)

CDW integrates Data Visualization for building graphic representations of data, dashboards, and visual applications based on CDW data or on other data sources you connect to. Authorized users can explore data using graphics such as pie charts and histograms, and can collaborate using dashboards. BI analysts who can access your environment can use these features. See Creating a Data Visualization instance in CDW.

Impala Debug Web UIs are available in CDW Private Cloud

In CDW Private Cloud, you can now use the Impala debug web UIs, which map to equivalent debug web UIs in Cloudera Manager as follows:
CDW Debug Web UI           | Cloudera Manager Equivalent
Impala Catalog Web UI      | Catalog Server Web UI
Impala Coordinator Web UI  | Impala Daemon Web UI
Impala StateStore Web UI   | StateStore Web UI

For more information about this feature, see Using the Web UI to debug Impala Virtual Warehouses.

Hue supports Hive Hybrid Procedural SQL in CDW

You can run Hive Hybrid Procedural SQL (HPL/SQL) using the Hue query editor in CDW. For more information, see How to run a stored procedure from Hue in Cloudera Data Warehouse.

Improved read performance of ORC tables by Impala

This release includes continued improvements to Impala's read performance on ORC tables.

Security improvements

  • CVE-2021-44228 (Apache Log4j 2 vulnerability) has been addressed in CDW on CDP Private Cloud 1.4.0 by upgrading Apache Log4j 2 to version 2.17.1.
  • CDW Docker containers, applications, and modules, including Hue, now run as non-root users.