November 18, 2022
This release of Cloudera Data Warehouse (CDW) service on CDP Private Cloud Data Services introduces the new features and improvements that are described in this section.
Ability to use deterministic namespace in Kerberos principals
CDW now uses a deterministic namespace and environment IDs. The Kerberos principals for Database Catalogs and Environments use the service hostname and the deterministic namespace name based on the name of the Database Catalog or Environment.
When you specify an Environment or Database Catalog name, CDW adds a prefix to the Environment and Database Catalog names, as well as to the Kerberos principal name. For more information, see Predefined Kerberos principals in Cloudera Data Warehouse Private Cloud.
Ability to install and manage Cloudera Data Warehouse clusters using CDP CLI
You can install CDP CLI on your computer and use it to install and manage CDW clusters on CDP Private Cloud. To install CDP CLI, see CLI client setup. For the list of CDW sub-commands, see https://cloudera.github.io/cdp-dev-docs/cli-docs/dw/index.html.
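As a minimal sketch of scripting the CLI, the following Python wraps the `cdp dw list-clusters` sub-command. It assumes that CDP CLI is installed and configured on the machine, that the sub-command prints JSON, and that the response holds a `clusters` list whose entries carry `id` and `status` fields. The helper function names are hypothetical; check the sub-command reference linked above for the exact output shape.

```python
import json
import subprocess


def build_list_clusters_command(profile=None):
    """Return the argv list for `cdp dw list-clusters`."""
    command = ["cdp"]
    if profile:
        # --profile selects a named credentials profile (assumed to be configured locally).
        command += ["--profile", profile]
    command += ["dw", "list-clusters"]
    return command


def parse_cluster_ids(raw_json):
    """Extract (id, status) pairs from the CLI's JSON output.

    The `clusters`, `id`, and `status` keys are assumptions about the response shape.
    """
    payload = json.loads(raw_json)
    return [(c.get("id"), c.get("status")) for c in payload.get("clusters", [])]


def list_clusters(profile=None):
    """Run the CLI and return the parsed cluster list (requires CDP CLI on PATH)."""
    result = subprocess.run(
        build_list_clusters_command(profile),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cluster_ids(result.stdout)
```

Keeping command construction and output parsing in separate functions lets you check both without a live CDP Private Cloud deployment; only `list_clusters` actually invokes the CLI.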
Real-time graphs in Grafana display scratch and cache disk utilization
You can monitor and track scratch and cache disk utilization at the cluster level and at the Virtual Warehouse level. This enables you to size a Virtual Warehouse optimally and calculate the memory and disk requirements. For more information, see Monitoring Data Warehouse service resources with Grafana dashboards.
Hue is now the unified next-generation SQL assistant in CDP
Hue now combines the capabilities of Data Analytics Studio (DAS), such as query optimization and a query debugging framework, with the rich query editor experience of Hue, making Hue the next-generation SQL assistant in CDP. From the Job Browser page, you can search query history, view query details, the visual explain plan, and DAG information, compare two queries, and download debug bundles for troubleshooting Hive queries. To support this feature, a new service called Query Processor has been added to the CDP stack as a dependency for Hue.
A new tab called Hue query processor has been added under the section. The hue-query-processor.json and hue-event-processor.json files are no longer available under the drop-down menu. For more information, see About Hue Query Processor.
Data Analytics Studio (DAS) is deprecated
DAS is deprecated and is not installed by default. DAS will be unavailable in future releases. Cloudera encourages you to use Hue to run Hive LLAP workloads. If you need to use DAS, you can enable it from the Advanced Settings page. See Enabling Data Analytics Studio in CDW Private Cloud.
Ability to configure Impala coordinator high availability
You can run up to five Impala coordinators concurrently in an active-active configuration, with cookie-based load balancing, to resolve or mitigate query concurrency problems. To enable the active-active configuration, select the Enabled (Active-Active) option while creating a Virtual Warehouse. For more information, see Configuring Impala coordinator high availability in CDW Private Cloud.
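To illustrate the general idea behind cookie-based load balancing across active-active coordinators, here is a toy Python model. It is not CDW's implementation: the class, the cookie name, and the round-robin choice for new clients are all assumptions made for the example. The point it shows is that a client without a cookie is assigned a coordinator, and later requests carrying that cookie return to the same one.

```python
import itertools


class StickyCoordinatorBalancer:
    """Toy cookie-based balancer over a fixed set of coordinators."""

    COOKIE_NAME = "coordinator"  # hypothetical cookie name, for illustration only

    def __init__(self, coordinators):
        # The release notes state a limit of five active-active coordinators.
        if not 1 <= len(coordinators) <= 5:
            raise ValueError("expected between one and five coordinators")
        self._coordinators = list(coordinators)
        self._next = itertools.cycle(self._coordinators)

    def route(self, cookies):
        """Return (coordinator, cookies_to_set) for one request."""
        pinned = cookies.get(self.COOKIE_NAME)
        if pinned in self._coordinators:
            # The client already holds a valid cookie: keep it on the same coordinator.
            return pinned, {}
        # New client (or stale cookie): pick the next coordinator and pin it.
        chosen = next(self._next)
        return chosen, {self.COOKIE_NAME: chosen}
```

Because every coordinator accepts queries, new sessions spread across the set, while the cookie keeps each existing session on the coordinator that holds its state.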
Ability to spill Impala queries to HDFS
You can configure heavy Impala queries to write intermediate files during large sorts, joins, aggregations, or analytic function operations to a remote scratch space on HDFS. To enable this feature, you must configure the Impala daemon to use the specified locations for writing the intermediate files and then specify the HDFS URI while creating the Impala Virtual Warehouse. For more information, see Enabling Impala to spill to HDFS in CDW.
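Before you enter the HDFS URI, a quick sanity check can catch obvious mistakes. The Python sketch below uses only the standard library; the function name and the specific rules (an `hdfs` scheme, a host or nameservice, and a non-root directory path) are assumptions for illustration, not validation that CDW itself documents.

```python
from urllib.parse import urlparse


def check_hdfs_scratch_uri(uri):
    """Return a list of problems found in a candidate HDFS scratch URI.

    An empty list means the URI passed these (assumed) basic checks.
    """
    parsed = urlparse(uri)
    problems = []
    if parsed.scheme != "hdfs":
        problems.append("scheme must be hdfs")
    if not parsed.hostname:
        problems.append("missing NameNode host or nameservice")
    if not parsed.path or parsed.path == "/":
        problems.append("missing scratch directory path")
    return problems
```

For example, `check_hdfs_scratch_uri("hdfs://nn.example.com:8020/tmp/impala-scratch")` returns an empty list, while a local path such as `/local/scratch` is reported as lacking both the `hdfs` scheme and a host. The host name here is a placeholder.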
New Advanced Configurations menu for enabling and disabling deprecated and Technical Preview features
A new Advanced Configurations menu has been added to the CDW web interface, which opens an Advanced Settings page. On this page, you can enable or disable Technical Preview and deprecated features that are not installed or available out of the box when you install the Private Cloud data services. For example, you can enable DAS, which has been deprecated, or enable third-party S3 providers in Private Cloud.
Using the Refresh option to apply configuration changes
A new Refresh option has been added to the more options menu for Database Catalogs and Virtual Warehouses that helps you to apply configuration changes that you made at an environment level, from the Management Console, or from the Advanced Settings page. In most cases, this helps you to avoid deleting and recreating Database Catalogs or Virtual Warehouses. To learn more about the use cases in which you can use the Refresh option, see About the Refresh option.
Support added for AWS S3 and third-party object storage (Preview)
CDW supports using AWS S3 object storage services for storing tables. Other similar, compatible, on-premises object stores that support the S3 protocol can work as well. CDW exposes Hive and Impala tables stored on S3 as SQL tables which you can query using Hue. However, you cannot browse and import files to create tables from S3 in Hue. For more information, see Third-party object storage support for CDW Private Cloud.
This feature is in Technical Preview and not recommended for production deployments. Cloudera recommends that you try this feature in test or development environments.
Ability to change delegation username and password
From the Environment Details page, you can update the delegation username and password that CDW uses to impersonate authorization requests from Hue to the Impala engine. For more information, see Changing delegation username and password.
Ability to configure Impala coordinator and executor pod size is GA
You can optimize the performance of your Impala Virtual Warehouse and resources used in an environment, based on your hardware configuration, by customizing the amount of resources allocated to the Impala coordinators, executors, and catalog daemons. This helps you to better leverage intra-query parallelism and achieve powerful compute clusters with fewer nodes.
This feature is generally available (GA) starting 1.4.1, and you can use it in production environments.
Earlier, CDW allowed you to specify the path and size for scratch and cache space for Impala executor and coordinator pods. Starting with CDW 1.4.1, you can only specify the size for these parameters.
A new parameter, Overhead size, has been added, which allows you to specify the storage size for resources that are used by the tools run by the container. For more information, see Creating custom pod configurations for Impala Virtual Warehouses.
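When planning a custom pod configuration, it can help to add up the storage that each pod requests. The Python sketch below assumes that the per-pod local storage is simply the sum of the scratch, cache, and overhead sizes; that formula, the class and function names, and the sample numbers are illustrative assumptions, not values or behavior taken from CDW.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodStorage:
    """Storage sizes, in GiB, entered for one Impala pod type (hypothetical model)."""

    scratch_gib: int
    cache_gib: int
    overhead_gib: int

    def total_gib(self):
        # Assumption: the three sizes are independent and additive per pod.
        return self.scratch_gib + self.cache_gib + self.overhead_gib


def storage_for_executors(per_pod, executor_count):
    """Estimate the total local storage needed by a group of identical executor pods."""
    return per_pod.total_gib() * executor_count
```

For instance, `storage_for_executors(PodStorage(280, 280, 2), 3)` estimates the combined requirement for three executors with made-up sizes, which you can compare against the disk available on your nodes.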
Data Visualization integration in Cloudera Data Warehouse is GA
CDW integrates Data Visualization for building graphic representations of data, dashboards, and visual applications based on CDW data or other data sources that you connect to. Authorized users can explore data using graphics such as pie charts and histograms, and collaborate using dashboards. BI analysts who can access your environment can use these features. To get started with Cloudera Data Visualization, see Creating a Data Visualization instance in CDW.
This feature is generally available (GA) starting 1.4.1, and you can use it in production environments.
SSL support for Oracle database is GA
CDW supports Oracle database and can connect to SSL-enabled Oracle on the base cluster. For optimum security, the network connection between the default Database Catalog Hive MetaStore (HMS) in CDW and the relational database hosting the base cluster’s HMS must be encrypted with SSL. For more information, see Configuring Oracle database to use SSL for Data Warehouse.
This feature is generally available (GA) starting 1.4.1, and you can use it in production environments.
Improved read performance of ORC tables by Impala
This release includes continued improvements to Impala's read performance on ORC tables.
Ozone filesystem support added for Hive and Impala (Preview)
You can use Apache Ozone storage with CDW Private Cloud. This feature is in Technical Preview and Cloudera recommends that you try this in test or development environments. For more information, see Using Ozone storage with Cloudera Data Warehouse Private Cloud.