What's new in Cloudera Data Warehouse Private Cloud

Learn about the new features in Cloudera Data Warehouse (CDW) service on CDP Private Cloud Data Services 1.5.2.

CDW simultaneously supports LDAP and Kerberos authentication for Hive Virtual Warehouses

Earlier, you could select either LDAP or Kerberos for authenticating users to Hive Virtual Warehouses while creating a new Virtual Warehouse. Starting with 1.5.2, CDW simultaneously supports both LDAP and Kerberos authentication mechanisms. See Authenticating users in CDW Private Cloud.

Group-level access control for Hive Virtual Warehouse (Preview)

CDW enables you to allow one or more user groups to access a particular Hive Virtual Warehouse, similar to Impala. As a result, only specific users can connect to a Virtual Warehouse, from all supported channels (Hue, Beeline, JDBC, or other Business Intelligence tools). The option to specify user groups when creating a new Virtual Warehouse is disabled by default. See Enabling warehouse-level access control for Hive and Impala in CDW Private Cloud.

Ability to use quota-managed resource pools

You can now assign quota-managed resource pools for environments, Data Catalogs, Virtual Warehouses, and Data Visualization instances in CDW. To learn more about quota management in CDP Private Cloud Data Services, see Managing cluster resources using Quota Management. To enable the use of quota management for CDW entities, see Enabling quota management in CDW Private Cloud.

About Iceberg support in CDW Private Cloud Data Services 1.5.2

Apache Iceberg is generally available (GA) for Impala and Hive LLAP in CDW Private Cloud when deployed on CDP Private Cloud Base version 7.1.9 or higher. This applies to CDW LLAP and Impala, and CDP Private Cloud Base with Spark 3, Impala, NiFi, and Flink. See Apache Iceberg in Cloudera Data Platform.

Apache Iceberg is in Technical Preview for Impala and Hive LLAP in CDW Private Cloud when deployed on CDP Private Cloud Base version 7.1.7 SP2 or 7.1.8, because interoperability requirements between CDP Private Cloud Base and CDW Private Cloud are not met. Tables that are converted to Iceberg Table format can only be accessed through CDW Impala and LLAP.

Added support for collecting CDW audit logs

Audit logging is now supported for CDW. In Private Cloud, audit logs from the Control Plane service and CDW are sent to the OpenTelemetry (OTEL) collector. The audit events are not persisted. You can configure the OTEL collector to send data to external systems such as IBM Guardium by using the syslog OTEL exporter. The OTEL collector runs as a pod in the Control Plane namespace, with the following name format:
cdp-release-opentelemetry-collector-<UNIQUE-ID>

You can use the event data for troubleshooting issues. For more information, see Auditing Control Plane activity. For the list of the audit events collected for CDW, see Cloudera Data Warehouse audit events.
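
As a minimal sketch of how the name format above could be used to pick out the collector pod from a list of pod names, the following Python snippet matches names against that format. The character set allowed in <UNIQUE-ID> and the sample pod names are assumptions made for illustration; they are not taken from the CDW documentation.

```python
import re

# Name format from the release note: cdp-release-opentelemetry-collector-<UNIQUE-ID>.
# Assumption: <UNIQUE-ID> consists of lowercase letters, digits, and hyphens.
COLLECTOR_POD = re.compile(r"^cdp-release-opentelemetry-collector-[a-z0-9-]+$")


def find_collector_pods(pod_names):
    """Return the pod names that match the OTEL collector name format."""
    return [name for name in pod_names if COLLECTOR_POD.match(name)]


if __name__ == "__main__":
    # Hypothetical pod names, for illustration only.
    sample = [
        "cdp-release-opentelemetry-collector-7d9f8b6c5-x2k4q",
        "cdp-release-some-other-service-5c6d7e8f9-abcde",
    ]
    print(find_collector_pods(sample))
```

The list of pod names would typically come from your Kubernetes tooling for the Control Plane namespace; how you obtain it is outside the scope of this sketch.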

Unified timezone on the CDP Private Cloud Base and Data Services clusters

CDW logs, diagnostic bundles, and time-related SQL queries can now use the timezone specified in Cloudera Manager. On OCP, you must manually install the third-party K8TZ webhook by using its Helm chart before the CDP Private Cloud Containerized Cluster setup step. For more information, see ECS unified time zone.

Added support for using dedicated worker nodes

On a CDP Private Cloud Data Services cluster, you can taint nodes that have specialized hardware for specific workloads, such as CDW or Cloudera Machine Learning (CML). For example, CML requires higher GPU power and CDW requires more local storage (either SSD or NVMe). If you have tainted the worker nodes in Cloudera Manager, then you can enable the Use dedicated nodes for executor option while activating an environment in CDW or by editing an existing environment. CDW and other data services can then schedule the Hive and Impala executor and coordinator pods on these dedicated worker nodes tainted exclusively for CDW. Other pods, such as Data Visualization or Hive MetaStore (HMS), can be scheduled only on the undedicated nodes available within the cluster. For more information, see Scheduling executor pods on dedicated worker nodes in CDW Private Cloud.
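
The placement behavior described above can be illustrated with a toy model of the general Kubernetes taint-and-toleration rule. This is a simplified sketch, not CDW's scheduler: the taint string "dedicated=cdw", the node and pod names, and the needs_dedicated flag (which stands in for whatever mechanism pins executor pods to dedicated nodes) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    taints: frozenset = frozenset()  # empty set means an undedicated node


@dataclass(frozen=True)
class Pod:
    name: str
    tolerations: frozenset = frozenset()
    needs_dedicated: bool = False  # hypothetical flag, see lead-in


def eligible_nodes(pod, nodes):
    """Return the names of nodes on which the pod could be placed."""
    result = []
    for node in nodes:
        # A pod can land on a tainted node only if it tolerates every taint.
        if not node.taints <= pod.tolerations:
            continue
        # Pods that require dedicated hardware skip untainted nodes.
        if pod.needs_dedicated and not node.taints:
            continue
        result.append(node.name)
    return result


if __name__ == "__main__":
    nodes = [Node("worker-1", frozenset({"dedicated=cdw"})), Node("worker-2")]
    executor = Pod("impala-executor-0", frozenset({"dedicated=cdw"}), True)
    dataviz = Pod("dataviz-0")
    print(eligible_nodes(executor, nodes))
    print(eligible_nodes(dataviz, nodes))
```

In this model the executor pod is eligible only for the tainted node, and the Data Visualization pod only for the untainted one, which mirrors the split between dedicated and undedicated nodes described in this section.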

Ability to view Impala query details and query profile in Hue

You can now view Impala query details, the query plan, the execution summary, and query metrics on the new Impala Queries tab on the Job Browser page using Hue in CDW, and use this information to tune and optimize your queries. You can compare two queries and view all query details side by side. The Impala query profile and other related details are available on the new Impala tab on the Job Browser page. See Viewing Impala query details.

CDW is supported on a multiple-base-cluster deployment

You can configure one Embedded Container Service (ECS) cluster to work with multiple CDP Private Cloud Base clusters managed by separate instances of Cloudera Manager. Starting with CDP Private Cloud Data Services 1.5.2, CDW is supported in this deployment configuration. For more information, see Configuring multiple Base clusters with one ECS cluster.

Improvements

Improved pod placement policy
When scheduling HiveServer2 (HS2), executors, and coordinators in Impala and Hive Virtual Warehouses, CDW now considers rack configurations specified in Cloudera Manager. If sufficient resources are available, CDW prefers the same racks for scheduling HS2, executor, and coordinator pods. For more information, see Pod placement policy and rack awareness in CDW Private Cloud.
Improved third-party integration with CDW on Private Cloud
Cloudera has improved the tagging and labeling of CDW entities so that third-party applications can write Kubernetes injectors to add custom services to CDW. This allows add-on services built on top of CDW to work in tandem with it without modifying the CDW Docker images. To integrate with CDW, your services must provide their own Helm charts. By using node labels, these services can run as sidecar containers to the CDW pods. For the list of available labels, see List of labels for third-party integration.
CLI client drivers are packaged with CDW
Previously, the CLI client drivers, such as the JDBC and ODBC drivers for Hive and Impala, and the Beeline CLI client were hosted in a utility bucket on Amazon S3, making them difficult to access and download in private cloud environments. These client drivers are now packaged and distributed with CDW. You can download them from the Resources and Downloads tile.
Added support for unsecured LDAP for Impala
Earlier, CDW required you to use secured LDAP (LDAPS) for authenticating users accessing an Impala Virtual Warehouse. You can now use an unsecured LDAP server for authentication. CDW now uses the LDAP configurations that you have configured in the CDP Management Console.
“fe_service_threads” configuration is not copied from the base cluster to CDW
In CDP Private Cloud Data Services 1.5.1, the “fe_service_threads” configuration, which specifies the maximum number of concurrent client connections or threads allowed to serve client requests in Impala, was copied from the CDP Private Cloud Base cluster to CDW along with its set value (typically 64). This degraded performance. Starting with the 1.5.2 release, this configuration is no longer copied from the base cluster. You can set the value of the “fe_service_threads” configuration based on your requirements (the recommended value is 96 or higher). See Configuring “fe_service_threads” in CDW Private Cloud.
Support for non-TLS-enabled HMS database
Earlier, you had to manually run a script to establish unsecured (non-TLS-enabled) connections to the Hive MetaStore (HMS) database. This process has been automated in the CDP Private Cloud Data Services 1.5.2 release. CDW automatically configures a secured or unsecured connection (with or without TLS) to the HMS database according to the configurations present on the CDP Private Cloud Base cluster.
High Availability (HA)-compatible Hive delegation token store automatically set for Database Catalogs
Storage for the Kerberos delegation token is defined by the hive.cluster.delegation.token.store.class property. CDW automatically sets the value of this property to org.apache.hadoop.hive.thrift.DBTokenStore for the Database Catalog, which is always created in the High Availability mode.
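
If you want to confirm the delegation token store setting, a short Python sketch such as the following can read the property from a Hadoop-style configuration document. It assumes that you have the Hive configuration available as XML in the usual <configuration>/<property> layout; the SAMPLE text is made up for illustration, and where CDW exposes this configuration is not covered here.

```python
import xml.etree.ElementTree as ET

# Property name and value as stated in the release note.
PROPERTY = "hive.cluster.delegation.token.store.class"
EXPECTED = "org.apache.hadoop.hive.thrift.DBTokenStore"

# Illustrative configuration text, not exported from a real cluster.
SAMPLE = """<configuration>
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the value of a named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    value = get_property(SAMPLE, PROPERTY)
    print("token store is HA-compatible:", value == EXPECTED)
```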