CDP Public Cloud: October 2020 Release Summary

Cloudera is pleased to announce the latest updates to CDP Public Cloud.

Highlights

  • CDP Data Visualization is now Generally Available
  • Streaming Analytics for DataHub is now Generally Available
  • The CDP Administration and CDP Data Governance courses in the Cloudera OnDemand Library have been greatly expanded
  • Fine Grained Access Control for Cloud Storage (aka Ranger Authorization) is now in Tech Preview for both Azure Datalake Storage and Amazon S3
  • SDX Upgrades are now available as a Tech Preview
  • A number of improvements to Control Plane authorization over environment resources have been introduced as a Tech Preview

New Or Updated Capabilities

Documentation

Cloudera Data Warehouse

  • The CDP cli now supports CDW

    • This will allow administrators to automate administrative tasks such as activation, catalog/warehouse provisioning, pause, resume, etc

CDP Data Visualization

  • CDP Data Visualization is now Generally Available

    • Data Visualization enables data engineers, data scientists, business analysts, and business users to quickly and easily explore data, collaborate, act on, and communicate explainable insights across the data lifecycle.

    • Fully integrated with CDW and CML, Data Visualization enables self-service, drag-and-drop visualizations from both enterprise data and predictive models natively in CDP, enabling:

      • Fast, intelligent reporting – Rapid, out-of-the-box dashboarding and application building with built-in visual recommendations
      • Intuitive workflows, secured by SDX – Easy-to-use visual UI for fast data exploration and instant sharing anywhere without moving data or creating silos
      • Integrated data lifecycle collaboration – Accelerate insight sharing with a consistent, integrated data visualization experience across all data and business teams

Cloudera Machine Learning

  • Public / Unauthenticated Access To Applications

    • For Applications served by CML that do not contain sensitive data nor require tight controls on who can access them, users can now deploy Applications that do not require authentication into CML

    • Note that Administrators can entirely disable this functionality for an ML Workspace if needed

  • Pass User Information To Applications

    • Via the REMOTE_USER header, deployed Applications can be developed to customize the content being presented specifically to the logged in user
  • Updated Sessions Page

    • A new table design has been applied to the Sessions page, making it easier to take actions such as filtering down to running sessions and performing bulk deletion of old Sessions
  • Scale to 0 Workers in AWS

    • CML Worker Nodes in AWS deployments are now able to fully scale down to 0, where previously the S2I builder (used in building images for Experiments, Models, etc) was preventing this from occurring
  • Kubernetes Upgrades via Management Console

    • Workspace Upgrades executed from the Management Console now include the ability to upgrade the K8s version of the ML Workspace; this is done as required by the ML Workspace version and does not require any additional steps from the Administrator
  • Integration with RAZ for ADLS-Gen2

    • With the introduction of RAZ in CDP-7.2.1, access to ADLS-Gen2 from CML can be authorized using Ranger policies
  • Subnet Details on ML Workspace Details Page

    • After creation of an ML Workspace, Administrators can now view the subnets in use by the workspace on the ML Workspace Details Page

Cloudera Data Hub

  • Streaming Analytics for Data Hub GA

    • Users can now launch fully secured streaming analytics clusters to run Apache Flink jobs on Public Cloud

    • Run Streaming SQL queries against data from Streams Messaging (Kafka) clusters in Data Hub.

  • Consistent Hostnames for Data Hub clusters

    • All hostnames are deterministic and consistent when Data Hub clusters of the same name are deleted and recreated, providing stable endpoints for applications
  • [Tech Preview] DataHub Autoscaling is now available on Azure

    • Data Hub clusters running YARN can autoscale compute resources based on schedules or load. This feature is now available under entitlement for both AWS and Azure

SDX

  • [Tech Preview] Improvements on Fine-grained Access Control for ADLS cloud storage via the Ranger Authorization Service (RAZ)

    • RAZ for ADLS is now available for tech preview in all Azure regions. The Microsoft Azure team has deployed a prerequisite service to all regions across the globe. Previously tech previews were limited to Azure environments hosted in the East US, West US, Canada Central, Canada East, Hong Kong and Singapore regions.

    • RAZ for ALDS environments can now be created via the UX for the Registration wizard. Users no longer need to use the CLI and use custom image catalogs.

  • [Tech Preview] External interoperation simplified by RAZ for ADLS's integration with Azure Active Directory's Identities

    • Simplified security for enabling CDP to consume and publish data to services outside of CDP.

    • Customers using Azure Active Directory as their IdP can enable this to have CDP automatically respect ADLS file ownership and confer ownership to files that are respected by Azure services outside of CDP. For example, if a user writes a file to ADLS outside of CDP, that user could automatically access the file within CDP. Similarly a file written to ADLS in CDP can be consumed by the corresponding user outside of CDP. Without this feature enabled, all CDP files are owned by a single superuser like identity and would require extra group mapping or ownership changes to enable access.

  • [Tech Preview] Fine-grained Access control for S3 cloud storage via RAZ.

    • Reduce security risk, improve compliance and eliminate unintended data deletion with the addition of the Ranger Authentication Service (RAZ) for AWS S3. CDP on AWS users gain fine-grained access control policy and auditing capabilities on their file system accesses. Rather than handling access controls to cloud storage at the group level, individual users now can have individual home directories that are protected from other users. This unlocks a host of benefits:

      • Spark users can now work against files in their own directories
      • Classification tags on file system directories can be propagated through data lineage
      • More complete migration for legacy HDP users with HDFS Ranger policies directly ported to CDP in AWS.
      • Legacy CDH users now have the comparable file access audit capabilities to those Cloudera Navigator provided.
    • This Tech Preview is only available for Data Hub clusters

  • [Tech Preview] On Azure, the external postgres database can be accessed via a Private Link Endpoint

    • Reduce risk, simplify the network architecture and eliminate data exposure to the public internet.
  • [Tech Preview] Data Lake upgrades are now available

    • Environment upgrades from Cloudera Runtime 7.1 to 7.2 are now supported, as well as patch upgrades of both Cloudera Runtime and Cloudera manager, and OS-only upgrades

Cloudera Data Catalog

  • Streamlined cross-environment search.

    • When users perform a search in the context of a specific environment, the # of results for the corresponding search in other environments is now presented.

    • This enables users to find data in other environments and easily navigate to them without having to change environment and re-perform the search.

Management Console & Control Plane

  • CLI/API updates now include a changelog

    • A new version of the Management Console API & cdpcli is released, all changes are captured in the changelog
  • [Tech Preview] Expanded Set of Control Plane Roles for Improved Authorization

    • A number of new roles have been introduced to provide finer grained authorization over management console resources such as DataHub clusters, cloud credentials, cluster templates, cluster definitions, image catalogs and recipes

    • This will reduce the dependency on PowerUser role with the introduction of:

      • Account role EnvironmentCreator permits creation of environments and shared resources (cloud credentials, cluster definitions, recipes, image catalog and cluster templates)
      • Environment role DataHubCreator to allow creation of data hub clusters
    • Authorization at an individual data hub level is possible with DataHubAdmin and DataHubUser roles

    • An individual or group can be assigned as Owner of a CDP resource

Full release documentation including release notes for each service is available here (Service Name → Release Notes → What’s New).