August 30, 2021
This release (1.11) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the new features and improvements that are described in this topic.
GA support for virtual clusters powered by Apache Spark 3
- Support for virtual clusters powered by Apache Spark 3 is no longer a Technical Preview feature, and is now generally available (GA).
- The following functionalities are not currently supported:
- Deep analysis (visual profiler)
- HWC - that is, Hive managed ACID tables (Direct Reader & JDBC mode)
- Phoenix Connector
- SparkR
- Kudu
[Technical Preview] Fully private AKS cluster set up
- Fully private AKS clusters are now supported, for customers who want to restrict resources from being exposed via public IP addresses. This allows securing the Kubernetes cluster even more, an AKS API server can be created with a private IP address which is only accessible to the resources which are running inside of the Azure virtual network (VNet).
- A private AKS is deployed within customers' network and leverages CCMv2/Proxy for accessing the K8s APIs.
- Cloudera recommends using one single resource group per environment. You can accomplish this by selecting a (pre-created) resource group during CDP environment creation.
Gang scheduling enabled by default
- YuniKorn Gang scheduling policy is now enabled by default within CDE.
- For more information on Gang scheduling, see the Spark on Kubernetes – Gang Scheduling with YuniKorn Cloudera Blog post.
[Technical Preview] User-specified IAM roles
- CDE job pods can now run with a user-specified IAM role with the role credentials automatically supplied as instance credentials. This allows transparent usage of cloud SDKs or any code making use of the instance credentials provider. User roles are secured and allocated through the CDP environment IDBroker mappings.
- This feature is available as a Technical Preview. Contact your Cloudera account representative to enable access to this feature.
Spark Analysis disabled by default
-
Metric collection from Spark jobs is now disabled by default to provide the most optimal performance.
-
During development and testing, you can turn on additional Spark profiling:
- On the CDE UI:
After creating the job, go to its Configuration tab and toggle the Spark Analysis option.
For more information, see Managing jobs in Cloudera Data Engineering. - From CLI/API:
Set the following configuration parameter during job creation:
dex.safariEnabled=true
For more information, see Managing Cloudera Data Engineering jobs using the CLI and Creating a Cloudera Data Engineering job using the API respectively.
- On the CDE UI: