Glossary

Cloudera Data Platform documentation uses terminology related to enterprise data cloud and cloud computing.

CDP (Cloudera Data Platform) - CDP is a cloud service platform that consists of a number of services. It enables administrators to deploy CDP service resources and allows end users to process and analyze data by using these resources.

CDP CLI - Provides a command-line interface to access and manage CDP services and resources. The CDP CLI client can be downloaded from the CDP web console.

CDP web console - The web interface for accessing and manage CDP services and resources.

Cloudera Runtime - The open source software distribution within CDP that is maintained, supported, versioned, and packaged by Cloudera. Cloudera Runtime combines the best of CDP and HDP. Cloudera Runtime 7.0.0 is the first version.

Cluster - Also known as compute cluster, workload cluster, or Data Hub cluster. The cluster created by using the Data Hub service for running workloads. A cluster makes it possible to run one or more Cloudera Runtime components on some number of VMs and is associated with exactly one data lake.

Cluster definition - A reusable cluster template in JSON format that can be used for creating multiple Data Hub clusters with identical cloud provider settings. Data Hub includes a few built-in cluster definitions and allows you to save your own cluster definitions. A cluster definition is not synonymous with a blueprint, which primarily defines Cloudera Runtime services.

Cluster template - A reusable cluster template in JSON format that can be used for creating multiple Data Hub clusters with identical Cloudera Runtime settings. It primarily defines the list of Cloudera Runtime services included and how their components are distributed on different host groups. Data Hub includes a few built-in blueprints and allows you to save your own blueprints. A blueprint is not synonymous with a cluster definition, which primarily defines cloud provider settings.

Credential - Allows an administrator to configure access from CDP to a cloud provider account so that CDP can communicate with that account and provision resources within it. There is one credential per environment.

Data Catalog (service) - A CDP service used by data stewards to browse, search, and tag the content of a data lake, create and manage authorization policies, identify what data a user has accessed, and access the lineage of a particular data set.

Data Lake - A single logical store of data that provides a mechanism for storing, accessing, organizing, securing, and managing that data.

Data Lake cluster - A special cluster type that implements the Cloudera Runtime services (such as HMS, Ranger, Atlas, and so on) necessary to implement a data lake that further provides connectivity to a particular cloud storage service such as S3 or ADLS.

Data Hub (service) - A CDP service that administrators use to create and manage clusters powered by Cloudera Runtime.

Data Warehouse (service) - A CDP service for creating and managing self-service data warehouses for teams of data analysts.

Data warehouse - The output of the Data Warehouse service. Users access data warehouses via standard business intelligence tools such as JDBC or Tableau.

Environment- A logical environment defined with a specific virtual network and region on a customer’s cloud provider account. CDP service components such as Data Hub clusters, Data warehouses, and so on, run in an environment.

Image catalog - Defines a set of images that can be used for provisioning Data Hub cluster. Data Hub includes a built-in image catalog with a set of built-in base and prewarmed images and allows you to register your own image catalog.

Machine Learning (service) - A CDP service that administrators use to create and manage Machine Learning workspaces and that allows data scientists to do their machine learning.

Machine Learning workspace - The output of the Machine Learning service. Each workspace corresponds to a single cluster that can be accessed by end users.

Management Console (service) - A CDP service that allows an administrator to manage environments, users, and services; and download and configure the CLI.

Recipe - A reusable script that can be used to perform a specific task on a specific resource.

Replication manager (service) - A CDP service used by administrators and data stewards to move, copy, backup, replicate, and restore data in or between data lakes.

Service - A defined subset of CDP functionality that enables a CDP user to solve a specific problem related to their data lake (process, analyze, predict, and so on). Example services: Data Hub, Data Warehouse, Machine Learning.

Shared resources - A set of resources such as cloud credentials, recipes (custom scripts), and other that can be reused across multiple environments.

Workload Manager (service) - A CDP service used by database and workload administrators to troubleshoot, analyze and optimize workloads in order to improve performance and/or cost.