February 09, 2022

This release (1.14) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following changes.

Improved handling of job resources to reduce EFS utilization

  • Recursive copying of frequently used and large file resources can result in very high I/O throughput and can exhaust cloud storage burst credits, leading to poor performance. To avoid excessive file copying, CDE now uses hard linking in AWS by default.
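
As a general illustration of why hard linking avoids the copy cost (not CDE's actual implementation), the following Python sketch stages a file by creating a hard link and falls back to a full copy only when linking is not possible. The function name `stage_resource` and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def stage_resource(src: str, dst: str) -> str:
    """Make src available at dst, preferring a hard link over a copy.

    Returns "hardlink" or "copy" so the caller can see which path was taken.
    Hypothetical helper for illustration only.
    """
    try:
        # A hard link adds a second directory entry for the same inode,
        # so no file data is read or written.
        os.link(src, dst)
        return "hardlink"
    except FileExistsError:
        # Do not silently overwrite an existing destination.
        raise
    except OSError:
        # Linking can fail, for example across file systems.
        shutil.copy2(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "app.jar")
        target = os.path.join(workdir, "run-app.jar")
        with open(source, "wb") as handle:
            handle.write(b"example payload")
        mode = stage_resource(source, target)
        print(mode, os.path.samefile(source, target))
```

When the link succeeds, both paths refer to the same inode, so staging a large resource costs a metadata update rather than a full read and write of the file.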

[Technical Preview] Apache Iceberg support

  • Apache Iceberg tables are now supported with Spark 3 virtual clusters on AWS. Use tables at petabyte scale without impacting query planning, while benefiting from efficient metadata management, snapshotting, and time-travel.
  • Run multi-analytic workloads by accessing those same tables in Cloudera Data Warehouse (CDW) with Hive and Impala for BI and SQL analytics. This capability is expected in an upcoming CDW release.

[Technical Preview] Remote Shuffle Service

  • You can now store Spark shuffle data on remote servers. This improves resilience in case of executor loss.
  • This feature is available as a Technical Preview. Contact your Cloudera account representative to enable access to this feature.

Unified diagnostic bundle

  • A single click now generates one unified bundle containing both service logs and summary status.
  • The bundles are stored securely in the object storage of the environment.
  • A historical list of previously generated bundles is available for access.

Guardrails to prevent submitting jobs that do not fit resource capacity

  • CDE now automatically prevents execution of jobs that do not fit on the available resources.
  • CDE takes into account Kubernetes and system reserved resources, daemonset utilized resources, and Spark overhead factors.
  • The API returns an error of the following form: run failed to start: requested [***TYPE AND AMOUNT OF RESOURCE***] is more than [***THE MAXIMUM AMOUNT OF AVAILABLE RESOURCES OF THAT TYPE***] allocatable per cluster node
  • You can either reduce the Spark executor and driver CPU and/or memory requirements, or deploy on a larger cluster.
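
The kind of check described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not CDE's actual logic: the `NodeCapacity` fields, the `check_fit` function, the 10% memory overhead factor, and all numbers are hypothetical, and the error text only approximates the shape of the message shown above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeCapacity:
    """Hypothetical per-node figures, in CPU cores and GiB of memory."""
    cpu: float
    memory_gib: float
    reserved_cpu: float          # Kubernetes and system reserved
    reserved_memory_gib: float
    daemonset_cpu: float         # consumed by daemonset pods
    daemonset_memory_gib: float

    @property
    def allocatable_cpu(self) -> float:
        return self.cpu - self.reserved_cpu - self.daemonset_cpu

    @property
    def allocatable_memory_gib(self) -> float:
        return (self.memory_gib - self.reserved_memory_gib
                - self.daemonset_memory_gib)


def check_fit(cpu: float, memory_gib: float, node: NodeCapacity,
              memory_overhead_factor: float = 0.10) -> List[str]:
    """Return error messages for a single Spark driver or executor pod.

    An empty list means the pod fits on one node. The overhead factor is
    an assumed value; use the one that applies to your workload.
    """
    errors = []
    needed_memory = memory_gib * (1.0 + memory_overhead_factor)
    if cpu > node.allocatable_cpu:
        errors.append(
            f"run failed to start: requested {cpu:g} CPU is more than "
            f"{node.allocatable_cpu:g} allocatable per cluster node")
    if needed_memory > node.allocatable_memory_gib:
        errors.append(
            f"run failed to start: requested {needed_memory:g} GiB memory "
            f"is more than {node.allocatable_memory_gib:g} allocatable "
            f"per cluster node")
    return errors


if __name__ == "__main__":
    node = NodeCapacity(cpu=8, memory_gib=32, reserved_cpu=0.5,
                        reserved_memory_gib=2, daemonset_cpu=0.5,
                        daemonset_memory_gib=2)
    for message in check_fit(cpu=4, memory_gib=26, node=node):
        print(message)
```

In this sketch, a request that fails the check can be resolved in the same two ways the bullets describe: lower the requested CPU or memory until the list of errors is empty, or evaluate against a larger node.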

Notification email configuration can now be verified

When configuring the optional email alerts feature [Technical Preview] during virtual cluster creation, you can now verify the SMTP settings before creating the virtual cluster.

Streamlined resource creation and re-use during job creation

You can now create a resource on the fly when creating a job. Alternatively, you can select from a list of existing resources, if any, to upload your application or DAG file. This promotes re-usability of project artifacts across jobs.

Kubernetes update

CDE now supports Kubernetes (K8s) 1.21.