CDP Public Cloud glossary

CDP Cloud documentation uses terminology related to enterprise data cloud and cloud computing.

CDP (Cloudera Data Platform) Public Cloud - CDP Public Cloud is a cloud service platform that consists of a number of services. It enables administrators to deploy CDP service resources and allows end users to process and analyze data by using these resources.

CDP CLI - Provides a command-line interface to access and manage CDP services and resources.

CDP web console - The web interface for accessing and manage CDP services and resources.

Cloudera Observability (data service) - A CDP data service used by database and workload administrators to troubleshoot, analyze and optimize workloads in order to improve performance and/or cost.

Cloudera Runtime - The open source software distribution within CDP that is maintained, supported, versioned, and packaged by Cloudera. Cloudera Runtime combines the best of CDP and HDP. Cloudera Runtime 7.0.0 is the first version.

Cluster - Also known as compute cluster, workload cluster, or Data Hub cluster. The cluster created by using the Data Hub service for running workloads. A cluster makes it possible to run one or more Cloudera Runtime components on some number of VMs and is associated with exactly one data lake.

Cluster definition - A reusable cluster template in JSON format that can be used for creating multiple Data Hub clusters with identical cloud provider settings. Data Hub includes a few built-in cluster definitions and allows you to save your own cluster definitions. A cluster definition is not synonymous with a blueprint, which primarily defines Cloudera Runtime services.

Cluster Repair - A feature in CDP Management Console that enables you to select specific nodes within a node group for a repair operation. This feature reduces the downtime incurred when only a subset of the nodes are unhealthy.

Cluster template - A reusable cluster template in JSON format that can be used for creating multiple Data Hub clusters with identical Cloudera Runtime settings. It primarily defines the list of Cloudera Runtime services included and how their components are distributed on different host groups. Data Hub includes a few built-in blueprints and allows you to save your own blueprints. A blueprint is not synonymous with a cluster definition, which primarily defines cloud provider settings.

Control Plane - A Cloudera operated cloud service that includes services like Management Console, Cloudera Observability, Replication Manager and Data Catalog. These services interact with your account in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud to provision and manage compute infrastructure that you can use to manage the lifecycle of data stored in your cloud account. In addition, the Control Plane can interface with your on-premises and Private Cloud infrastructure to support hybrid cloud deployments.

Credential - Allows an administrator to configure access from CDP to a cloud provider account so that CDP can communicate with that account and provision resources within it. There is one credential per environment.

Data Catalog (data service) - A CDP data service used by data stewards to browse, search, and tag the content of a data lake, create and manage authorization policies, identify what data a user has accessed, and access the lineage of a particular data set.

DataFlow (data service) - A CDP data service that enables you to import and deploy your data flow definitions efficiently, securely, and at scale.

Data Lake - A single logical store of data that provides a mechanism for storing, accessing, organizing, securing, and managing that data.

Data Lake cluster - A special cluster type that implements the Cloudera Runtime services (such as HMS, Ranger, Atlas, and so on) necessary to implement a data lake that further provides connectivity to a particular cloud storage service such as S3 or ADLS.

Data Hub (service) - A CDP service that administrators use to create and manage clusters powered by Cloudera Runtime.

Data Warehouse (data service) - A CDP data service for creating and managing self-service data warehouses for teams of data analysts.

Data warehouse - The output of the Data Warehouse service. Users access data warehouses via standard business intelligence tools such as JDBC or Tableau

Environment - A logical environment defined with a specific virtual network and region in a customer’s cloud provider account. One can refer to a "CDP environment" once a cloud provider virtual network, cloud storage, and other cloud provider artifacts present in a customer's AWS, Azure, or GCP account have been registered in CDP. CDP service components such as Data Hub clusters, Data Warehouse clusters, and so on, run in an environment.

Image catalog - Defines a set of images that can be used for provisioning Data Hub cluster. Data Hub includes a built-in image catalog with a set of built-in base and prewarmed images and allows you to register your own image catalog.

Machine Learning (data service) - A CDP data service that administrators use to create and manage Machine Learning workspaces and that allows data scientists to do their machine learning.

Machine Learning workspace - The output of the Machine Learning service. Each workspace corresponds to a single cluster that can be accessed by end users.

Management Console (data service) - A CDP data service that allows an administrator to manage environments, users, and services; and download and configure the CLI.

Operational Database (data service) - A CDP data service that administrators use to create and manage scale-out, autonomous database powered by Apache HBase and Apache Phoenix.

Recipe - A reusable script that can be used to perform a specific task on a specific resource.

Replication Manager (data service) - A CDP data service used by administrators and data stewards to move, copy, backup, replicate, and restore data in or between data lakes.

Service - A defined subset of CDP functionality that enables a CDP user to solve a specific problem related to their data lake (process, analyze, predict, and so on). Example services: Data Hub, Data Warehouse, Machine Learning.

Shared resources - A set of resources such as cloud credentials, recipes (custom scripts), and other that can be reused across multiple environments.