February 09, 2022
This release (1.14) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following changes.
Improved handling of job resources to reduce EFS utilization
- Recursively copying large or frequently used file resources can generate very high I/O throughput and exhaust cloud storage burst credits, leading to poor performance. To avoid excessive file copying, CDE on AWS now uses hard links by default.
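To illustrate why hard linking reduces I/O, here is a minimal Python sketch. It is not CDE's implementation; the helper names `link_tree` and `demo` and the file layout are hypothetical. It mirrors a directory tree by creating hard links instead of copying file contents, so the "copy" shares the same inode as the original and no file data is rewritten.

```python
import os
import shutil
import tempfile


def link_tree(src_dir, dst_dir):
    """Mirror src_dir at dst_dir, hard-linking files instead of copying bytes.

    Hypothetical helper for illustration only. Hard links require the source
    and destination to be on the same filesystem.
    """
    shutil.copytree(src_dir, dst_dir, copy_function=os.link)


def demo():
    """Build a small resource tree, link it, and check that inodes are shared."""
    root = tempfile.mkdtemp()
    try:
        src = os.path.join(root, "resource")
        os.makedirs(os.path.join(src, "lib"))
        with open(os.path.join(src, "lib", "app.jar"), "wb") as f:
            f.write(b"x" * 1024)

        dst = os.path.join(root, "job-run")
        link_tree(src, dst)

        original = os.stat(os.path.join(src, "lib", "app.jar"))
        linked = os.stat(os.path.join(dst, "lib", "app.jar"))
        # Same inode and a link count of 2 mean no file data was duplicated.
        return original.st_ino == linked.st_ino and original.st_nlink == 2
    finally:
        shutil.rmtree(root)
```

Calling `demo()` returns `True` when both paths refer to the same underlying file, which is the property that avoids the repeated read and write traffic of a recursive copy.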
[Technical Preview] Apache Iceberg support
- Apache Iceberg tables are now supported with Spark 3 virtual clusters on AWS. Use tables at petabyte scale without impacting query planning, while benefiting from efficient metadata management, snapshotting, and time-travel.
- Run multi-analytic workloads by accessing those same tables in Cloudera Data Warehouse (CDW) with Hive and Impala for BI and SQL analytics (expected in an upcoming CDW release).
[Technical Preview] Remote Shuffle Service
- You can now store Spark shuffle data on remote servers. This improves resilience in case of executor loss.
- This feature is available as a Technical Preview. Contact your Cloudera account representative to enable access to this feature.
Unified diagnostic bundle
- A single click now generates one unified bundle containing both service logs and summary status.
- The bundles are stored securely in the object storage of the environment.
- A historical list of previously generated bundles is available for access.
Guardrails to prevent submitting jobs that do not fit resource capacity
- CDE now automatically prevents execution of jobs that do not fit on the available resources.
- CDE takes into account Kubernetes and system reserved resources, daemonset utilized resources, and Spark overhead factors.
- The API returns an error of the following form: "run failed to start: requested [***TYPE AND AMOUNT OF RESOURCE***] is more than [***THE MAXIMUM AMOUNT OF AVAILABLE RESOURCES OF THAT TYPE***] allocatable per cluster node".
- You can either reduce the Spark executor and driver CPU and/or memory requirements, or deploy on a larger cluster.
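The guardrail described above can be pictured with a small Python sketch. This is an assumption-laden illustration, not CDE's actual logic: the function `check_fits`, its parameters, and the simple arithmetic (node capacity minus reserved and daemonset usage, request scaled by an overhead factor) are all hypothetical, and the error text only imitates the shape of the message quoted above.

```python
def check_fits(resource, requested, node_capacity, reserved, daemonset_used,
               overhead_factor=0.0):
    """Return the remaining headroom, or raise if the request cannot fit on a node.

    Hypothetical model for illustration:
      allocatable = node_capacity - reserved - daemonset_used
      needed      = requested * (1 + overhead_factor)
    """
    allocatable = node_capacity - reserved - daemonset_used
    needed = requested * (1 + overhead_factor)
    if needed > allocatable:
        # Message shape mirrors the release note; the exact wording is assumed.
        raise ValueError(
            f"run failed to start: requested {needed:g} {resource} is more "
            f"than {allocatable:g} allocatable per cluster node"
        )
    return allocatable - needed
```

For example, under these assumptions an 8 GiB executor with a 10% overhead factor fits on a 32 GiB node that has 2 GiB reserved and 1 GiB used by daemonsets, while a 30 GiB executor on the same node raises the error. In that case either the request must shrink or the node must grow, which matches the two remedies listed above.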
Notification email configuration can now be verified
- When configuring the optional email alerts feature [Technical Preview] during virtual cluster creation, you can now verify the SMTP settings before creating the virtual cluster.
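If you want to check SMTP settings independently before entering them, a generic check might look like the following Python sketch. It uses only the standard `smtplib` module and is unrelated to how CDE performs its own verification; the function name `verify_smtp_settings` and its parameters are hypothetical.

```python
import smtplib


def verify_smtp_settings(host, port, username=None, password=None,
                         use_tls=True, timeout=10):
    """Try to connect (and optionally authenticate) to an SMTP server.

    Returns (True, "ok") on success, or (False, reason) on failure.
    Hypothetical helper for illustration; no email is sent.
    """
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            server.ehlo()
            if use_tls:
                # Upgrade the connection to TLS before sending credentials.
                server.starttls()
                server.ehlo()
            if username is not None:
                server.login(username, password)
            # NOOP confirms the session is still healthy.
            server.noop()
        return True, "ok"
    except (OSError, smtplib.SMTPException) as exc:
        return False, f"{type(exc).__name__}: {exc}"
```

A call such as `verify_smtp_settings("smtp.example.com", 587, "user", "secret")` (placeholder values) returns a tuple whose first element indicates whether the host, port, TLS, and credentials were accepted.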
Streamlined resource creation and re-use during job creation
- You can now create a resource on the fly when creating a job. Alternatively, you can select from a list of existing resources, if any, to upload your application or DAG file. This promotes re-usability of project artifacts across jobs.
Kubernetes update
- CDE now supports Kubernetes 1.21.