Cloudera Private Cloud Data Services 1.5.5 Release Summary

Platform Cloudera Control Plane

  • Cert manager integration (ECS only)
    • Cert-manager is an open-source tool for Kubernetes that automates the provisioning, management, and renewal of TLS certificates
    • Cert manager provides customers an option to not use wildcard certificates by default and instead use their certificate issuers to provision certificates that can be consumed by Cloudera Data Services
  • Istio support (ECS only)
    • Istio integration in Cloudera’s platform enables CDE’s ACL features as well as Cloudera AI Inferencing authorization capabilities
    • Control plane uses Istio in ambient mode
    • CAII and CDE are currently using Istio in sidecar mode
  • ECS Upgrade prechecks
    • Host Health Status - checks hosts in bad health or concerning health, stopped roles on the hosts
    • Host Prerequisites Inspections - EcsHostDnsInspection, Security Software Inspection
    • Control plane health check
    • Docker registry health check
  • Quota Management for multi-base cluster
    • Manage and allocate quotas:
      • setup, at time of the service or workload
      • sets the logical boundaries (quotas)
    • Enforce quotas:
      • applies the logical boundaries
      • runtime, queues new workloads that do not fit in quota
    • Enables multiple Base clusters to interact with a single control plane
  • Full restart vs rolling restart
    • All nodes restarted in parallel or quickly as opposed to a rolling restart which restarts one node at a time while maintaining etcd quorum and API uptime
  • Certifications include 7.1.9 SP1 CHF5, RHEL 9.5, OCP 4.17, RKE2 1.30

Cloudera Data Engineering

  • Security hardened Spark runtime images
    • Spark images have been updated with more secure components like Python 3.10 and 3.11
    • Admins have the option to choose the new Spark images from the virtual cluster creation screen.
  • New UMS roles for granular access controls (ECS deployments only)
    • Service Admin and VC Admin roles have been introduced to enable delegation of administrative tasks.
    • VC User and VC View only roles have been introduced to restrict access at the VC level and support new application-level artifact ACLs.
  • Application artifact ACLs (ECS deployments only)
    • Virtual cluster artifacts including jobs, job runs, resources, sessions, and repositories are now controlled by object level access control lists.
    • Support for two levels of sharing - full access and view-only.
    • All artifacts can be configured to be private upon creation.
  • Self-service user onboarding
    • Users that will execute applications on the cluster, can on-board their kerberos credentials using the self-service workflow. Removes the need for the previous administrative utility scripts.
  • Removal of wild card certificates
    • During service and VC creation, the system will automatically set up self-signed certs, which can later be updated with custom certificates by an administrator via UI/API/CLI.
    • Third party certificate manager Venafi is also supported (ECS deployment only)

Cloudera Data Warehouse

  • Hive and Impala query history
    • The Hive query history service provides a scalable solution for storing and analyzing historical Hive query data. It captures detailed information about completed queries, such as runtime, accessed tables, errors, and metadata, and stores it in an efficient Iceberg table format.
    • Cloudera Data Warehouse provides you the option to enable logging Impala queries on an existing Virtual Warehouse or while creating a new Impala Virtual Warehouse. The information for all completed Impala queries is stored in the sys.impala_query_log system table. Information about all actively running and recently completed Impala queries is stored in the sys.impala_query_live system table. Users with appropriate permissions can query this table using SQL to monitor and optimize the Impala engine.
  • Impala on Chainguard
    • Impala images were the hardest to move to Chainguard, but now all images shipped by Cloudera Data Warehouse are based on Chainguard, significantly reducing the CVE count.
  • Runtime features from 1.10.1
    • Impala:
      • Improved Cardinality Estimation for Aggregation Queries
      • Cleanup of host-level remote scratch dir on startup and exit
      • Graceful shutdown with query cancellation
      • Programmatic query termination
    • Iceberg
      • Cloudera support for Apache Iceberg version 1.5.2
      • Reading Iceberg Puffin statistics
      • Enhancements to Iceberg data compaction
      • Impala supports the MERGE INTO statement for Iceberg tables
    • Hue SQL AI Assistant

Cloudera AI

  • Cloudera AI Inference Service(CAII) [Technical Preview]
    • Production-grade serving environment for traditional, generative AI, and Large Language Models
  • AI Studios [Technical Preview]
    • Low code tools designed to simplify the development, customization, and deployment of generative AI solutions
  • Cloudera Copilot [Technical Preview]
    • AI-powered coding assistant designed for seamless integration within JupyterLab ML Runtimes
  • Model Hub [Technical Preview]
    • Catalog of top-performing LLM and generative AI models
  • Spark in CAI Improvements
    • Workbench-level Spark defaults
    • Spark Pushdown enabled at the Project level
  • User and Team Sync enabled by default
  • Cert-manager support
  • Multiple Docker registry account support