CDP Public Cloud: September 2021 Release Summary
DataFlow
The following new features and enhancements are available:
- CLI support has been added for creating flow deployments and flow definition management. DataFlow CLI is available in the Beta CDP CLI.
- Added the S3 to S3 ReadyFlow leveraging S3 bucket notifications.
- DataFlow now supports Cloudera RAZ (fine-grained access control) for object store access.
- The Dashboard, Catalog, and Environments pages now persist search queries and applied filters for a better user experience.
- When deploying a flow definition, you can now select whether the NiFi flow should be started automatically.
The following documentation updates were made:
- ReadyFlow documentation
- CLI documentation
- Updated ReadyFlows for RAZ integration
- NiFi Endpoints Support Matrix
Data Warehouse
Retaining PostgreSQL backups in the cluster
When you create a Cloudera Data Warehouse cluster using the CDP CLI create-cluster command, any PostgreSQL backup retention period you set on your Cloud provider side is observed by CDP. For example, if you configure backupRetentionDays in Azure or BackupRetentionPeriod in AWS, and then create a cluster using the CDP CLI, the cluster will retain the PostgreSQL backups according to your configuration.
Operational Database
[OS upgrade] The upgrade-database command supports a new option to upgrade only the operating system
COD provides a new option –os-upgrade-only for the upgrade-database command, which you can use to upgrade only the operating system. Running the upgrade-database command with this option does not have an effect on the CDP runtime version running on the given cluster.
[Multi-AZ] - Need the ability to deploy DataHub/OpDB across multiple AZs to ensure HA in COD
COD introduces a technical preview version of Multi-AZ deployment capability with limited functionality.
Management Console
New authorization model
CDP introduces a new authorization model. The following table summarizes new, changed, and deprecated roles. The roles that are not mentioned in this table are unchanged.
Account roles
- New account-level role EnvironmentCreator can be assigned on the account level.
- The EnvironmentAdmin and EnvironmentUser roles have been deprecated in June 2020 and have been removed from the official documentation.
Resource roles
- New environment resource roles DataSteward and DataHubCreator can be assigned on the scope of a specific environment.
- New Data Hub resource role DataHubAdmin (Technical Preview) can be assigned on the scope of a specific Data Hub.
- New shared resource role SharedResouceUser can be assigned on the scope of a specific shared resource (cluster template, credential, image catalog, or recipe). However, note that unlike other shared resources, proxies can only be managed by a PowerUser.
- New Owner resource role, applicable to environments, Data Hubs, shared resources, and classic clusters, grants all permissions required to manage the resource in CDP including the ability to delete it, but does not grant any cluster-level access. The user creating the resource automatically gets the Owner role on that resource.
Steps for assigning roles
- The steps for assigning account roles and managing access to environments are unchanged.
- The steps for managing access to Data Hubs, shared resources (cluster templates, credentials, image catalogs, and recipes), and classic clusters are similar to the steps for managing access to environments: You can use the “Manage access” option from the resource details page.
Updated documentation
- For updated information about all built-in roles in CDP, refer to Understanding account-level roles and resource roles.
- For updated instructions for how to manage access to resources, refer to Assigning a resource role to a user and Assigning a resource role to a group.
- Other new and updated documentation:
Dots are now supported in group names
The list of supported characters in group names was extended to include dots. See updated documentation:
Improved AWS cloud storage setup documentation
AWS cloud storage setup documentation has been improved to include exact steps for creating the required S3 bucket, IAM policies, and IAM roles. See Minimal setup for cloud storage and Onboarding CDP users and groups for cloud storage.
Runtime 7.2.11
Runtime 7.2.11 is now available and can be used for registering an environment with a 7.2.11 Data Lake and creating Data Hub clusters. See Cloudera Runtime.
Specifying multiple existing AWS security groups
When using your existing security groups for registering an AWS environment in CDP via CDP CLI, you can provide a comma-separated list of multiple security groups for both Knox (securityGroupIdForKnox) and Default (defaultSecurityGroupId). This is a CLI-only feature.
Specifying multiple GCP shared subnets
When using an existing shared VPC for registering a GCP environment in CDP via CDP web interface or CLI, you can specify multiple shared subnets.
Support for Bahrain (me-south-1) AWS region
Registering an environment and provisioning Data Hubs is now supported in the Bahrain (me-south-1) AWS region. See Supported AWS regions.
Updated outbound network access destinations
If you are using Machine Learning, Data Engineering, or DataFlow CDP services and have restricted egress access, starting on September 7, 2021, you need to add the following new endpoints to your egress rules:
*.s3.{eu-west-1, us-west-2, ap-southeast-1}.amazonaws.com
s3-r-w.{eu-west-1, us-west-2, ap-southeast-1}.amazonaws.com
*.execute-api.{eu-west-1, us-west-2, ap-southeast-1}.amazonaws.com
The region selected should be the region that is geographically closest to where the environment is deployed.
Customers operating in outbound restricted networks will be unable to download docker images, which will impact creating new clusters. Existing environments deployed in outbound restricted networks may experience operational issues, including limited ability to start, scale and repair the experience clusters.
For more information, see:
- Outbound network access destinations for AWS
- Outbound network access destinations for Azure
- Outbound network access destinations for GCP
New GCP permissions for provisioning credential
The list of permissions for the provisioning credential’s service account has been updated to include new permission required for load balancing between HA components of the Data Lake. If you are running CDP in GCP, you should update the provisioning credential’s service account to include either the Compute Load Balancer Admin
(roles/compute.loadBalancerAdmin
) IAM role or the following granular permissions:
compute.addresses.create
compute.addresses.delete
compute.addresses.get
compute.addresses.use
compute.instanceGroups.create
compute.instanceGroups.delete
compute.instanceGroups.get
compute.instanceGroups.list
compute.instanceGroups.update
compute.instanceGroups.use
compute.forwardingRules.create
compute.forwardingRules.delete
compute.forwardingRules.get
compute.forwardingRules.list
compute.forwardingRules.setLabels
compute.forwardingRules.update
compute.forwardingRules.use
compute.regionBackendServices.create
compute.regionBackendServices.delete
compute.regionBackendServices.get
compute.regionBackendServices.list
compute.regionBackendServices.update
compute.regionBackendServices.use
compute.regionHealthChecks.create
compute.regionHealthChecks.delete
compute.regionHealthChecks.get
compute.regionHealthChecks.list
compute.regionHealthChecks.update
compute.regionHealthChecks.use
See updated Permissions for the provisioning credential’s service account.
Operatonal Database
[OS upgrade] New upgrade-database
option to upgrade only the operating system
COD provides a new option --os-upgrade-only
for the upgrade-database
command which can be used to upgrade only the operating system. Running the upgrade-database
command with this option does not have an effect on the CDP Runtime version running on the given cluster.
[Multi-AZ] Deploying DataHub/OpDB across multiple AZs to ensure HA in COD
COD introduces a preview version of multi-AZ deployment capability with limited functionality.
Replication Manager
Enhancements to HBase replication policies
The following enhancements are available for an HBase replication policy:
- You do not need to create a schema similar to the source cluster on the destination cluster.
- Replication Manager performs the first-time setup configuration steps which includes HBase service restart on both the clusters automatically.
- Optionally, you can also create or use an existing HBase replication machine user during policy creation and then validate the existing username with the UMS. The username and password is automatically synchronized to the destination cluster’s environment (and to the source’s as well if the source is Cloudera Operational Database (COD)).
Other
The cdpctl
cloud resource validation tool for Azure (Preview)
The cdpctl
cloud resource validation tool is now available for Azure, allowing Azure cloud administrators to verify the compatibility of existing cloud setup with CDP prior to handing over to the CDP administrator. See cdpctl).